UPC Bar Codes - a mini research project by Katherine Vo and Tyler Dauwalder - Team UPC

Introduction:

Although we rarely give them much attention, UPC bar codes are around us all the time. Just about every package you see has a UPC bar code printed on it, and nearly every item you purchase from department stores, grocery stores, etc. has one on it somewhere. But have you ever wondered where these codes come from and what they mean? In this mini-project, we will answer these questions by explaining how the string 'CSE370' could be represented in a UPC bar code.


What's in a bar code?

Bar codes are machine-readable symbols made of patterns of black and white bars, or in some cases checkerboard-like grids. The different styles of bar codes are called symbologies, and UPC is one of many such symbologies.

UPC stands for Universal Product Code. Formally adopted by the grocery industry on April 3, 1973 as the standard bar code symbology for product marking, the UPC was one of the first bar code symbologies to gain wide acceptance.


How do UPC bar codes work?

A UPC bar code is a machine-scannable encoding of a 10-digit decimal number, preceded by a 1-digit decimal "number system character" and followed by a 1-digit decimal checksum. The "number system character" specifies the type of bar code as one of the following:

0: standard UPC number
1: (reserved)
2: items of varying weight, such as produce
3: pharmaceuticals
4: custom (in-house) UPC number
5: coupons
6: standard UPC number
7: standard UPC number
8: (reserved)
9: (reserved)

The checksum digit is used to help verify that the code was properly scanned. Taking the first 11 digits of the code (the 1-digit "number system character" followed by the 10-digit item code), the checksum is computed with the following algorithm:

code = 11-digit code
odds = sum of the digits in odd positions of code (counting from 1)
evens = sum of the digits in even positions of code (counting from 1)
total = (3 * odds) + evens
checksum = (10 - (total mod 10)) mod 10
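
The algorithm above can be sketched in Python as follows. This is only a minimal illustration, not part of the original project: the function name `upc_checksum` is our own invention, and the final `% 10` is an assumption added so that a total ending in 0 gives a checksum of 0 instead of 10.

```python
def upc_checksum(code: str) -> int:
    """Return the checksum digit for an 11-digit code (hypothetical helper)."""
    if len(code) != 11 or not code.isdigit():
        raise ValueError("expected exactly 11 decimal digits")
    digits = [int(ch) for ch in code]
    odds = sum(digits[0::2])   # positions 1, 3, 5, ... (counting from 1)
    evens = sum(digits[1::2])  # positions 2, 4, 6, ...
    total = 3 * odds + evens
    # Assumption: wrap a result of 10 back to 0 so the checksum stays one digit.
    return (10 - total % 10) % 10


# The 11-digit code derived for 'CSE370' later in this document.
print(upc_checksum("42351434645"))  # prints 5
```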

The bar code itself is composed of alternating black and white bars of varying size. The bars used to create the bar code come in four relative sizes, the largest being four times as wide as the smallest. Each digit in the 12-digit number is encoded with a unique four-bar sequence of total width 7:

0:  3 - 2 - 1 - 1
1:  2 - 2 - 2 - 1
2:  2 - 1 - 2 - 2
3:  1 - 4 - 1 - 1
4:  1 - 1 - 3 - 2
5:  1 - 2 - 3 - 1
6:  1 - 1 - 1 - 4
7:  1 - 3 - 1 - 2
8:  1 - 2 - 1 - 3
9:  3 - 1 - 1 - 2

The UPC bar code begins at the left with a "1 - 1 - 1" start code (black - white - black). Following the start code are the first 6 digits of the code, each composed of four bars (white - black - white - black) as defined by the table above. In the middle of the code is a "1 - 1 - 1 - 1 - 1" midpoint code (white - black - white - black - white). Then come the remaining 6 digits of the code, each also composed of four bars (this time black - white - black - white). The end of the bar code is then signalled with a "1 - 1 - 1" stop code (black - white - black). A well-formed UPC bar code might look something like:

Example UPC bar code
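
The layout described above (start code, six digits, midpoint code, six digits, stop code) could be sketched in Python roughly like this. The names `DIGIT_WIDTHS` and `upc_bar_widths` are hypothetical, and the sample number is just an arbitrary 12-digit code chosen for illustration. The sketch only lists bar widths; it assumes the colors simply alternate, starting with black.

```python
# Bar widths for each digit, copied from the table above.
DIGIT_WIDTHS = {
    "0": (3, 2, 1, 1), "1": (2, 2, 2, 1), "2": (2, 1, 2, 2),
    "3": (1, 4, 1, 1), "4": (1, 1, 3, 2), "5": (1, 2, 3, 1),
    "6": (1, 1, 1, 4), "7": (1, 3, 1, 2), "8": (1, 2, 1, 3),
    "9": (3, 1, 1, 2),
}
START = STOP = (1, 1, 1)
MIDDLE = (1, 1, 1, 1, 1)


def upc_bar_widths(code: str) -> list:
    """Return the bar widths, left to right, for a 12-digit code."""
    if len(code) != 12 or not code.isdigit():
        raise ValueError("expected exactly 12 decimal digits")
    widths = list(START)
    for ch in code[:6]:            # first six digits
        widths.extend(DIGIT_WIDTHS[ch])
    widths.extend(MIDDLE)
    for ch in code[6:]:            # remaining six digits
        widths.extend(DIGIT_WIDTHS[ch])
    widths.extend(STOP)
    return widths


widths = upc_bar_widths("012345678905")  # arbitrary sample code
print(len(widths), sum(widths))          # prints: 59 95
```

Every code produces 59 bars with a combined width of 95 units: 3 + (6 * 7) + 5 + (6 * 7) + 3.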


Encoding 'CSE370' in a UPC bar code:

The largest 10-digit number our bar code can store is 9,999,999,999 (base 10). That number requires 34 bits to store in binary form (1001010100000010111110001111111111 in base 2). Not every 34-bit number fits in 10 decimal digits, but the largest 33-bit number, 8,589,934,591, does. Therefore, any number that fits into 33 bits or fewer can be encoded with a UPC bar code.
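
As a quick sanity check of these figures, here is a small Python sketch (the constant name `MAX_ITEM_CODE` is ours, purely for illustration):

```python
MAX_ITEM_CODE = 9_999_999_999  # largest 10-digit decimal number

print(MAX_ITEM_CODE.bit_length())   # bits needed for the largest item code: 34
print(format(MAX_ITEM_CODE, "b"))   # its binary representation
print(2**33 - 1 <= MAX_ITEM_CODE)   # True: every 33-bit value fits in 10 digits
```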

To encode our class description string 'CSE370', we'll use an encoding scheme intended to be general enough to describe any class offered at the University of Washington. We'll encode our string in three sections:

1. Departmental Prefix: e.g. 'CSE', 'PHYS', 'CHEM'
The prefix is stored as four alphabetic characters, each encoded as a 5-bit value defined as:

00000  = space
00001  = (undefined)
   ...
00101  = (undefined)
00110  = A
00111  = B
01000  = C
   ...
11101  = X
11110  = Y
11111  = Z

A prefix shorter than four characters is left-aligned and padded on the right with spaces, e.g. 'CSE '.

   
2. Class Number: e.g. '370', '540', '142'
The class number is encoded as a 10-bit unsigned integer. Ten bits can hold values from 0 to 1023, which covers every class number x s.t. 999 >= x >= 000.

   
3. Leftover bits
The prefix (20 bits) and the class number (10 bits) use 30 of our 33 bits, so three are left unused. We'll just put them in the three least significant positions (i.e. following the other two sections). For no particularly good reason, we'll require these three bits to always be '101'.


Finally, we line up all our bits into one big 33-bit binary integer and convert it to decimal. If the resulting decimal number has fewer than 10 digits, we right-align it in 10 digits and pad to the left with zeros (e.g. '12345' >> '0000012345'). That number is then encoded into the bar code.

Following this encoding scheme, our string 'CSE370' would be encoded into a 10-digit decimal number like this:

  C     S     E   (space)    370     (leftover)
01000 11000 01010  00000  0101110010    101
  >>
2351434645
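
The packing steps can be sketched in Python as below. This is a rough illustration under our own assumptions rather than a reference implementation: the function name `encode_class` is hypothetical, lowercase input is upper-cased for convenience, and anything other than A-Z or a space is rejected.

```python
def encode_class(prefix: str, number: int) -> str:
    """Pack a class name into a 10-digit item code (hypothetical helper)."""
    prefix = prefix.upper()
    if not 1 <= len(prefix) <= 4:
        raise ValueError("prefix must be 1 to 4 characters")
    if not 0 <= number <= 999:
        raise ValueError("class number must be between 0 and 999")
    value = 0
    for ch in prefix.ljust(4):               # left-align, pad right with spaces
        if ch == " ":
            char_code = 0                    # 00000 = space
        elif "A" <= ch <= "Z":
            char_code = ord(ch) - ord("A") + 6   # A = 00110 ... Z = 11111
        else:
            raise ValueError(f"unsupported character: {ch!r}")
        value = (value << 5) | char_code
    value = (value << 10) | number           # 10-bit class number
    value = (value << 3) | 0b101             # fixed leftover bits
    return f"{value:010d}"                   # pad to the left with zeros


print(encode_class("CSE", 370))  # prints 2351434645
```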

Since we're using a custom encoding that no one else is going to know how to interpret, our "number system character" has to be a 4. Thus our 11-digit code is:

42351434645

Next, we have to compute our checksum digit. Following the algorithm outlined above:

code = 42351434645
odds = 4 + 3 + 1 + 3 + 6 + 5 = 22
evens = 2 + 5 + 4 + 4 + 4 = 19
total = (3 * 22) + 19 = 85
checksum = 10 - (85 mod 10) = 5

Thus, our checksum digit is 5, and our final 12-digit code is:

423514346455
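
A scanner could use the same arithmetic in reverse to reject a misread code: recompute the checksum from the first 11 digits and compare it with the 12th. A minimal Python sketch, with `is_valid_upc` as a hypothetical name of our own and the assumption that a computed value of 10 wraps to 0:

```python
def is_valid_upc(code: str) -> bool:
    """Check a 12-digit code against its final checksum digit."""
    if len(code) != 12 or not code.isdigit():
        return False
    digits = [int(ch) for ch in code]
    odds = sum(digits[0:11:2])    # positions 1, 3, ..., 11
    evens = sum(digits[1:11:2])   # positions 2, 4, ..., 10
    # Assumption: a computed value of 10 wraps around to 0.
    expected = (10 - (3 * odds + evens) % 10) % 10
    return digits[11] == expected


print(is_valid_upc("423514346455"))  # True: the code derived above
print(is_valid_upc("423514346465"))  # False: 11th digit misread as 6
```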

Now, following the table listed above, we encode our twelve-digit number into bar widths:

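(start):   1 - 1 - 1
4:  1 - 1 - 3 - 2
2:  2 - 1 - 2 - 2
3:  1 - 4 - 1 - 1
5:  1 - 2 - 3 - 1
1:  2 - 2 - 2 - 1
4:  1 - 1 - 3 - 2
(middle):  1 - 1 - 1 - 1 - 1
3:  1 - 4 - 1 - 1
4:  1 - 1 - 3 - 2
6:  1 - 1 - 1 - 4
4:  1 - 1 - 3 - 2
5:  1 - 2 - 3 - 1
5:  1 - 2 - 3 - 1
(stop):    1 - 1 - 1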
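
To see roughly what these widths look like as bars, the sequence can be expanded into unit-width columns. The Python sketch below is illustrative only: `render_modules` is a hypothetical helper, and '#' and '.' are arbitrary stand-ins for black and white.

```python
# Bar widths for the full 'CSE370' code, copied from the listing above.
WIDTHS = [
    1, 1, 1,                                   # start
    1, 1, 3, 2,  2, 1, 2, 2,  1, 4, 1, 1,      # 4 2 3
    1, 2, 3, 1,  2, 2, 2, 1,  1, 1, 3, 2,      # 5 1 4
    1, 1, 1, 1, 1,                             # middle
    1, 4, 1, 1,  1, 1, 3, 2,  1, 1, 1, 4,      # 3 4 6
    1, 1, 3, 2,  1, 2, 3, 1,  1, 2, 3, 1,      # 4 5 5
    1, 1, 1,                                   # stop
]


def render_modules(widths):
    """Expand alternating bar widths into a string of unit-width columns.

    '#' marks black and '.' marks white; the first bar is black.
    """
    return "".join(
        ("#" if index % 2 == 0 else ".") * width
        for index, width in enumerate(widths)
    )


modules = render_modules(WIDTHS)
print(len(modules))  # prints 95
print(modules)
```

Because the bars strictly alternate in color from the start code through the stop code, the color of each bar follows from its position in the list alone.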

And finally, after putting together all the bars, we have:

'CSE 370' bar code

References: