CSE370 Mini Research Project

Physical Level Representation of String CSE370 on a CD-ROM

Marko Medenica

Aaron Day

Natalia Burina

Preliminaries

To understand how a string is represented on a CD-ROM, we should first recall how strings are represented in a machine. A string is an array of type char, or of type wchar_t for Unicode. Most computers today use 8-bit bytes to represent characters with ASCII integer codes. For example, in ASCII (American Standard Code for Information Interchange), the code for 'A' is 65 (decimal). Since characters are combined into strings, there are usually three choices for representing a string:

the first position of the string is reserved to give the length of a string
an accompanying variable has the length of the string (as in a structure) or
the last position of a string is indicated by a character used to mark the end of a string.
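The three layouts listed above can be sketched in a few lines of Python. This is only an illustration: the variable names are hypothetical, and the one-byte length prefix is an assumption (real formats differ in how wide the length field is).

```python
# Three common ways to store the text "CSE370" (illustrative layouts only).
text = "CSE370".encode("ascii")

# 1. Length-prefixed: the first position holds the length (one byte assumed here).
length_prefixed = bytes([len(text)]) + text

# 2. Separate length field kept alongside the characters, as in a structure.
with_length_field = {"length": len(text), "chars": text}

# 3. Terminated: a byte with value 0 marks the end of the string, as in C.
null_terminated = text + b"\x00"

print(list(length_prefixed))
print(with_length_field)
print(list(null_terminated))
```

All three carry the same six characters; they differ only in where the length information lives.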

We assume the third representation (as in C), in which the string ends with a byte whose value is 0 (NUL in ASCII). Below we show the ASCII integer codes for our target string 'CSE370' along with their binary values. The 14-bit representation is the EFM (Eight-to-Fourteen Modulation) code word used on a CD-ROM.

Character	ASCII code (decimal)	ASCII code (8-bit binary)	EFM code word (14 bits)
C	67	01000011	10001000100100
S	83	01010011	00100000100100
E	69	01000101	00000000100100
3	51	00110011	10000100010000
7	55	00110111	00100010001000
0	48	00110000	01001001000100
null	0	00000000	01001000100000
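The decimal and binary columns of the table can be reproduced with a short Python sketch. The function name ascii_rows is hypothetical, and the EFM column is not computed here because it comes from a lookup table rather than from arithmetic.

```python
def ascii_rows(text):
    """Return (label, decimal code, 8-bit binary string) for each byte of a C-style string."""
    data = text.encode("ascii") + b"\x00"  # append the terminating zero byte
    rows = []
    for byte in data:
        label = chr(byte) if byte != 0 else "null"
        rows.append((label, byte, format(byte, "08b")))
    return rows

for label, code, bits in ascii_rows("CSE370"):
    print(f"{label}\t{code}\t{bits}")
```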

Physical Makeup

A CD-ROM stores information encoded in a plastic-encased spiral track. This information is read optically by a non-contact head while the disk spins above it. To keep a constant linear velocity, the disk must spin faster while information near the center is being read and slower while information near the outer circumference is being read. Near the center the disk typically spins at about 500 RPM, while at the outer edge it spins at roughly 200 RPM.
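The relation between constant linear velocity and rotation rate can be checked with a small calculation. The numbers below are assumptions, not values from the text: a scanning velocity of 1.3 m/s and a data area reaching from about 25 mm to about 58 mm from the center, which are typical figures for single-speed playback.

```python
import math

def rpm_at_radius(linear_velocity_m_s, radius_m):
    """Rotation rate (revolutions per minute) that gives the requested linear velocity."""
    circumference_m = 2 * math.pi * radius_m
    return linear_velocity_m_s / circumference_m * 60

# Assumed values (not from the text): 1.3 m/s track velocity,
# data area from about 25 mm to about 58 mm from the center.
V = 1.3
inner_rpm = rpm_at_radius(V, 0.025)
outer_rpm = rpm_at_radius(V, 0.058)
print(round(inner_rpm), round(outer_rpm))
```

Under these assumptions the rate comes out near 500 RPM at the inner radius and a little over 200 RPM at the outer radius, because the same length of track per second covers a smaller fraction of a turn as the radius grows.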

Figure 1: The data spiral on a CD-ROM is nearly three miles long, encoding zeros and ones along the path as small flats and pits.

Data is stored in shallow depressions called pits and in the spaces between pits, called land. Both are formed in a reflective layer. To read the information, a low-power laser beam is reflected from the spiral track back to the head. The small difference in the distance the beam travels when reflected from a pit rather than from land causes the reflected signal to differ in intensity because of interference effects. This difference in intensity indicates whether the beam was reflected from a pit or from land. A photodetector on the head then converts the reflected beam to a radio-frequency electrical signal.

To convert the signal from the photodetector into binary data, a 0 bit is taken to be a stretch of constant land or constant pit, while a 1 bit is represented by a transition, either from pit to land or from land to pit.
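The rule can be sketched as a pair of small functions. This is a minimal illustration, assuming that an edge in either direction counts as a transition and that the track starts on land; the function names are hypothetical.

```python
def channel_bits_to_surface(bits, start="land"):
    """Map a channel bit string to the surface state (pit or land) under each bit.

    A 1 flips the state (an edge); a 0 keeps the previous state.
    """
    state = start
    surface = []
    for bit in bits:
        if bit == "1":
            state = "pit" if state == "land" else "land"
        surface.append(state)
    return surface

def surface_to_channel_bits(surface, start="land"):
    """Inverse mapping: emit 1 wherever the surface state changes, otherwise 0."""
    bits = []
    previous = start
    for state in surface:
        bits.append("1" if state != previous else "0")
        previous = state
    return "".join(bits)

efm_c = "10001000100100"  # 14-bit code word for 'C' from the table in Preliminaries
surface = channel_bits_to_surface(efm_c)
pattern = "".join("P" if s == "pit" else "L" for s in surface)
print(pattern)
print(surface_to_channel_bits(surface) == efm_c)
```

Starting on pit instead of land gives the mirror-image surface but decodes to the same bits, since only the edges carry information.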

Encoding

Encoding on a CD-ROM requires relatively complex strategies, since a single error may cause a serious failure in software. CD-ROM encoding is governed by the standard ISO/IEC 10149 (the Yellow Book).

The encoding involves the following steps:

error detection and correction
interleaving
change of the physical form of the data (EFM coding)
second layer of error detection and correction

The lands (flats) and pits on a CD-ROM do not correspond directly to the ones and zeros of the data. This is a consequence of the way the laser detects ones and zeros on the disk. Data is encoded, placed on the CD-ROM, and then decoded by the machine to recover the original data. When the original data is placed on a CD-ROM, every 8 bits of data are converted to 14 bits using EFM (Eight-to-Fourteen Modulation). EFM is constructed so that any two consecutive 1 bits are separated by a minimum of two 0s and a maximum of ten 0s. This provides adequate modulation of the signal intensity (a sufficient density of pit/land edges) for the tracking mechanism to work properly; a longer run of 0s would degrade tracking.
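The run-length rule can be checked against the seven code words listed in the Preliminaries table. The sketch below is only a check, not an EFM encoder: the code words are copied from that table, the function names are hypothetical, and each code word is examined in isolation (how adjacent code words join is not covered here).

```python
# Code words copied from the table in the Preliminaries section.
EFM_CODEWORDS = {
    "C": "10001000100100",
    "S": "00100000100100",
    "E": "00000000100100",
    "3": "10000100010000",
    "7": "00100010001000",
    "0": "01001001000100",
    "null": "01001000100000",
}

def zero_runs_between_ones(codeword):
    """Lengths of the runs of 0s that lie between two consecutive 1s."""
    inner = codeword.strip("0")  # drop leading and trailing 0s
    return [len(run) for run in inner.split("1")[1:-1]]

def satisfies_efm_rule(codeword, min_zeros=2, max_zeros=10):
    """True if 1s are at least min_zeros apart and no run of 0s exceeds max_zeros."""
    runs = zero_runs_between_ones(codeword)
    longest_zero_run = max(len(run) for run in codeword.split("1"))
    return all(r >= min_zeros for r in runs) and longest_zero_run <= max_zeros

for label, codeword in EFM_CODEWORDS.items():
    print(label, zero_runs_between_ones(codeword), satisfies_efm_rule(codeword))

# The raw 8-bit ASCII pattern for 'C' has two adjacent 1s, so it fails the rule.
print(satisfies_efm_rule("01000011"))
```

The last line shows why the raw bytes cannot be written to the disk directly: ordinary binary data contains adjacent 1s, which would require edges closer together than the rule allows.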

Figure 2: The fundamental data structure on the CD-ROM

Error detection and correction codes are essential in producing a CD-ROM. There are thousands of such codes, most of which rely on additional bits (parity bits) that carry the information needed to detect and correct errors in the data.
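The idea of extra bits that expose an error can be shown with the simplest scheme, a single even-parity bit. This is a toy illustration with hypothetical function names, not the code used on a CD-ROM; one parity bit can only detect an odd number of flipped bits and cannot correct anything, whereas the codes used on disks are far stronger.

```python
def add_even_parity(bits):
    """Append one parity bit so that the total number of 1s is even."""
    parity = bits.count("1") % 2
    return bits + str(parity)

def parity_ok(word):
    """True if the word (data bits plus parity bit) has an even number of 1s."""
    return word.count("1") % 2 == 0

def flip_bit(word, index):
    """Return a copy of word with the bit at the given position inverted."""
    flipped = "1" if word[index] == "0" else "0"
    return word[:index] + flipped + word[index + 1:]

word = add_even_parity("01000011")  # ASCII 'C'
corrupted = flip_bit(word, 3)       # simulate a single read error

print(word, parity_ok(word))
print(corrupted, parity_ok(corrupted))
```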

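Interleaving reorders the data so that bytes adjacent on the disk are not adjacent in the incoming file. A scratch or other localized defect then damages bytes that are widely separated in the original data, where the error correction codes can handle them more easily.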
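A simple block interleaver illustrates the effect. This is a sketch under stated assumptions: the scheme, the depth of 4, and the function names are chosen for illustration and are not the interleaving actually specified for CD-ROM.

```python
def interleave(data, depth):
    """Reorder data so that symbols `depth` apart in the file become neighbors on disk."""
    assert len(data) % depth == 0, "length must be a multiple of depth"
    return "".join(data[offset::depth] for offset in range(depth))

def deinterleave(data, depth):
    """Undo interleave(): put every symbol back at its original position."""
    group_size = len(data) // depth
    out = [None] * len(data)
    for k, symbol in enumerate(data):
        offset, j = divmod(k, group_size)
        out[j * depth + offset] = symbol
    return "".join(out)

original = "ABCDEFGHIJKL"
on_disk = interleave(original, 4)

# Simulate a scratch that destroys three neighboring symbols on the disk.
damaged = on_disk[:3] + "???" + on_disk[6:]
recovered = deinterleave(damaged, 4)

print(on_disk)
print(damaged)
print(recovered)
```

After deinterleaving, the three damaged symbols are no longer next to each other, so each small stretch of the file contains at most one error.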

Bibliography

Kuhn, Kelin J. Audio and CD/ROM notes for EE 498

Patterson and Hennessy, Computer Organization and Design: The Hardware/Software Interface.

Pohlmann, K. The Compact Disc Handbook. A-R Editions, 1992.

Press, Barry and Marcia. PC Upgrade and Repair Bible, Third Edition. IDG Books Worldwide, Inc., 2000.

Eight-to-Fourteen Modulation Conversion Table -- http://www.physics.udel.edu/wwwusers/watson/scen103/efm.html

Revised: 11/02/01.