Memory, Data Encodings, and HLL Program Variables

Memory

Memory contains bits (binary digits).

We identify operands (data to be operated on by the processor) by giving memory addresses - memory is an array.

The unit of addressing is the byte - memory is "byte addressable". An address is the index of a byte.

As well as the address, we need to specify the number of bits in the operand:

A byte is 8 (consecutive) bits.
A halfword is 16 bits.
A word is 32 bits.

It is common for processors to require operand alignment - the address of an operand must be divisible by the operand's length. For example, word operands must be at addresses that are multiples of 4 (because a word is 4 bytes long).
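The alignment rule is just a divisibility test on the address. A minimal sketch in C, where the function name is_aligned is made up for illustration and the address is treated as a plain number rather than a real pointer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if `addr` is a legal address for an operand that is `size`
   bytes long, on a processor that requires operand alignment.
   (is_aligned is a hypothetical helper, not a library function.) */
bool is_aligned(uint32_t addr, uint32_t size)
{
    assert(size != 0);          /* a zero-length operand makes no sense */
    return addr % size == 0;    /* address must be a multiple of the length */
}
```

With this definition, address 0x1000 is aligned for a word, while 0x1002 is aligned for a halfword but not for a word.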

Processors may be either big-endian or little-endian, which indicates the byte order - is the byte at the address the high-order or the low-order byte?
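To make the two byte orders concrete, the sketch below assembles a word from four consecutive bytes both ways. The function names are assumptions made for this example. b[0] stands for the byte at the operand's address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big-endian: the byte at the address (b[0]) is the high-order byte. */
uint32_t load_word_big_endian(const uint8_t *b)
{
    assert(b != NULL);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Little-endian: the byte at the address (b[0]) is the low-order byte. */
uint32_t load_word_little_endian(const uint8_t *b)
{
    assert(b != NULL);
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}
```

If memory holds the bytes 0x12, 0x34, 0x56, 0x78 at increasing addresses, the first function yields 0x12345678 and the second yields 0x78563412.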

Data Encodings

A string of N bits has 2^N different possible values.

N	2^N	Slang
8	256	NA
10	1024	1K
20	~1,000,000	1M
30	~1,000,000,000	1G
32	~4,000,000,000	4G

We often use hexadecimal notation ("hex") to write down long bit strings. Each hex digit represents 4 bits. The hex digits are 0, 1, ..., 9, A, B, C, D, E, and F. Thus 0xFF is a string of eight 1's, 0x10 is 00010000, and 0xfedcba98 is 11111110110111001011101010011000.
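The hex-to-bits correspondence can be checked mechanically. This is a small sketch, with to_bit_string a hypothetical helper name, that writes out the bits of a value most significant bit first:

```c
#include <assert.h>
#include <stdint.h>

/* Write the low `nbits` bits of `value` into `out` as '0'/'1' characters,
   most significant bit first. `out` must have room for nbits + 1 chars.
   (to_bit_string is a made-up name for this example.) */
void to_bit_string(uint32_t value, unsigned nbits, char *out)
{
    assert(nbits >= 1 && nbits <= 32);
    for (unsigned i = 0; i < nbits; i++) {
        unsigned bit = (value >> (nbits - 1 - i)) & 1u;
        out[i] = bit ? '1' : '0';
    }
    out[nbits] = '\0';
}
```

Calling it with 0x10 and 8 bits produces 00010000, and with 0xfedcba98 and 32 bits it produces the 32-bit string given above.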

A data encoding is a mapping from bit strings to values in the type of the encoding.

The processor "knows about" a few data encodings. Others are conventions used by the software running on the processor.

Processor Known Encodings

Bit Strings

The processor can copy them from one address to another. It can also perform logical (bit) operations on them, e.g., AND and XOR. (The result is the bit-wise result of the operation on bits in corresponding positions in the two operands.) It can also test for equality, and can shift them left or right.
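C exposes these bit-string operations directly as operators. The wrappers below are only a sketch (the function names are invented here) and use unsigned 32-bit values so that no sign interpretation is involved:

```c
#include <assert.h>
#include <stdint.h>

/* Bit-wise operations: each result bit depends only on the bits in the
   same position of the two operands. */
uint32_t bits_and(uint32_t a, uint32_t b) { return a & b; }
uint32_t bits_xor(uint32_t a, uint32_t b) { return a ^ b; }

/* Shifts. Shifting a 32-bit value by 32 or more is undefined in C,
   so the shift amount is checked first. */
uint32_t bits_shift_left(uint32_t a, unsigned n)
{
    assert(n < 32);
    return a << n;
}

uint32_t bits_shift_right(uint32_t a, unsigned n)
{
    assert(n < 32);
    return a >> n;
}
```

For example, 0xF0F0 AND 0xFF00 is 0xF000, 0xF0F0 XOR 0xFF00 is 0x0FF0, and XORing a bit string with itself gives all zeros, which is one way to test equality.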

Signed Integers

2's complement representation. For example, the signed byte corresponding to -2 is 11111110.

N bits can represent integers from -(2^(N-1)) to 2^(N-1)-1. (For example, 8 bits -> -128 to 127, 32 bits -> -2,147,483,648 to 2,147,483,647.)
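The range formula and the -2 example can be sketched in C as follows. The function names are assumptions for this example, and the code relies on the C rule that converting a negative value to an unsigned type wraps modulo 2^N, which yields the 2's complement bit pattern:

```c
#include <assert.h>
#include <stdint.h>

/* Smallest value of an N-bit 2's complement integer: -(2^(N-1)). */
int64_t signed_min(unsigned n)
{
    assert(n >= 1 && n <= 63);
    return -((int64_t)1 << (n - 1));
}

/* Largest value of an N-bit 2's complement integer: 2^(N-1) - 1. */
int64_t signed_max(unsigned n)
{
    assert(n >= 1 && n <= 63);
    return ((int64_t)1 << (n - 1)) - 1;
}

/* The 8-bit pattern that encodes the signed byte `v`. */
uint8_t byte_pattern(int8_t v)
{
    return (uint8_t)v;
}
```

Here byte_pattern(-2) is 0xFE, that is, 11111110, and signed_min(8) and signed_max(8) give -128 and 127.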

Operations are arithmetic (add, subtract, etc.), comparison (less-than, equal, and greater-than), and sign-extending shifts. Overflow may occur on arithmetic instructions.

Unsigned Integers

N bits can represent the integers from 0 to 2^N - 1. (For example, 32 bits -> 0 to 4,294,967,295.)

Operations are arithmetic, comparison, and zero-extending shifts. While the result of, say, adding one can cause the value to go from very large to zero, the processor does not indicate that overflow has occurred.
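C's unsigned types behave the same way: arithmetic wraps around silently. A brief sketch, with hypothetical function names, of the largest N-bit unsigned value (2^N - 1) and of the wrap from very large to zero:

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of an N-bit unsigned integer: 2^N - 1. */
uint64_t unsigned_max(unsigned n)
{
    assert(n >= 1 && n <= 63);
    return ((uint64_t)1 << n) - 1;
}

/* Adds one to a 32-bit unsigned value. In C the result is defined to wrap
   modulo 2^32, and nothing signals that the wrap happened. */
uint32_t add_one(uint32_t x)
{
    return x + 1u;
}
```

So unsigned_max(32) is 4,294,967,295, and add_one applied to that value returns 0.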

Addresses (Pointers)

Pointers are simply memory addresses, encoded as unsigned integers.

Floating Point

range

precision

Divide available bits into three fields:

Use	Single precision	Double precision
Sign	1	1
Exponent	8	11
Significand	23	52

Value is (-1)^S * 2^E * F.

Normalize the number so that the significand is 1.xxxxx, then don't store the bit corresponding to the leading 1. (Why?)

Bias the encoding of the exponent so that its smallest value is represented by the bit string 00...0, and successively higher values are obtained by adding 1 to it as an unsigned integer. (Why?)
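The three single-precision fields can be pulled apart with shifts and masks. The sketch below assumes that float is the 32-bit IEEE 754 single-precision format, which is true on most current machines but is not guaranteed by C. The struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct float_fields {
    unsigned sign;         /* 1 bit */
    unsigned exponent;     /* 8 bits, stored in biased form */
    uint32_t significand;  /* 23 bits, leading 1 not stored */
};

struct float_fields split_float(float f)
{
    uint32_t bits;
    struct float_fields out;

    assert(sizeof f == sizeof bits);   /* assumption: float is 32 bits */
    memcpy(&bits, &f, sizeof bits);    /* copy the raw bit string */

    out.sign        = bits >> 31;
    out.exponent    = (bits >> 23) & 0xFFu;
    out.significand = bits & 0x7FFFFFu;
    return out;
}
```

Under that assumption the exponent bias is 127, so 1.0 splits into sign 0, stored exponent 127, and significand 0, while -2.5 (which is -1.01 binary times 2^1) splits into sign 1, stored exponent 128, and significand 0x200000.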

Encodings by Convention

Characters

ASCII	Hex	Symbol
48	30	0
49	31	1
50	32	2
51	33	3
52	34	4
53	35	5
54	36	6
55	37	7
56	38	8
57	39	9
58	3A	:
59	3B	;
60	3C	<
61	3D	=
62	3E	>
63	3F	?

http://www.ascii.cl/

1. ASCII 32 (0x20) = Space
2. ASCII 48 (0x30) = '0'.  The decimal digits are consecutive codes.
3. ASCII 65 (0x41) = 'A'.  The uppercase letters are consecutive codes.
4. Lowercase codes are uppercase codes + 32 (0x20)
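Because the codes are consecutive, conversions reduce to simple arithmetic on character codes. A minimal sketch, assuming the character set is ASCII (the helper names are made up here, and real programs would normally use the functions in ctype.h):

```c
#include <assert.h>

/* Numeric value of a decimal digit character: '0' -> 0, ..., '9' -> 9.
   Works because the digit codes are consecutive. */
int digit_value(char c)
{
    assert(c >= '0' && c <= '9');
    return c - '0';
}

/* Convert a lowercase letter to uppercase by subtracting 32 (0x20).
   Any other character is returned unchanged. Assumes ASCII. */
char to_upper_ascii(char c)
{
    if (c >= 'a' && c <= 'z')
        return (char)(c - 32);
    return c;
}
```

For example, digit_value('7') is 7 and to_upper_ascii('q') is 'Q'.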

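Unicode ("wide characters" to Microsoft) was originally a 16-bit encoding, allowing representation of a much larger set of characters. It has since grown beyond 16 bits and is now usually stored using an encoding such as UTF-8 or UTF-16. (See www.unicode.org.)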

Character Strings

Other languages implement strings in other ways. For instance, Pascal strings are an 8-bit integer indicating the string length followed by consecutive 8-bit characters.
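A length-prefixed string of that kind might be modeled in C roughly as follows. The type name pstring, the fixed 255-character buffer, and the constructor are all assumptions made for this sketch, not a description of any particular Pascal implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One length byte followed by the characters. An 8-bit length limits
   the string to 255 characters. (pstring is a hypothetical name.) */
struct pstring {
    uint8_t length;
    char    chars[255];
};

/* Build a length-prefixed string from `len` characters starting at `s`. */
struct pstring pstring_from(const char *s, size_t len)
{
    struct pstring p;

    assert(s != NULL && len <= 255);
    p.length = (uint8_t)len;
    memset(p.chars, 0, sizeof p.chars);   /* clear unused bytes */
    memcpy(p.chars, s, len);
    return p;
}
```

With this layout the length is available without scanning the characters, at the cost of a fixed upper bound on string length.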

Arrays

Consecutive representations of the elements of the array. For example, (in C) int A[100] would be 400 bytes long - 100 4-byte signed integers. If the address of A were 0x1000, A[0] would be in bytes 0x1000 - 0x1003 and A[4] would be in bytes 0x1010 - 0x1013.
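The address arithmetic behind that example is base address plus index times element size. A short sketch, with invented function names and with addresses treated as plain 32-bit numbers:

```c
#include <assert.h>
#include <stdint.h>

/* Address of the first byte of element `index` of an array that starts
   at `base` and whose elements are `elem_size` bytes each. */
uint32_t element_address(uint32_t base, uint32_t elem_size, uint32_t index)
{
    assert(elem_size > 0);
    return base + index * elem_size;
}

/* Address of the last byte of the same element. */
uint32_t element_last_byte(uint32_t base, uint32_t elem_size, uint32_t index)
{
    return element_address(base, elem_size, index) + elem_size - 1;
}
```

For a base of 0x1000 and 4-byte elements, element 4 occupies bytes 0x1010 through 0x1013, matching the A[4] example.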

Structures

struct {
    int   num;
    char  c;
    int   total;
};
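Where each field lands in memory can be inspected with offsetof. The sketch below gives the structure a tag (example, a name chosen only for illustration). The exact offsets depend on the compiler and processor. On machines that require alignment, the compiler typically inserts unused padding bytes after c so that total starts at a multiple of its size.

```c
#include <assert.h>
#include <stddef.h>

struct example {      /* hypothetical tag, added so the type can be named */
    int   num;
    char  c;
    int   total;
};

size_t offset_of_num(void)   { return offsetof(struct example, num); }
size_t offset_of_c(void)     { return offsetof(struct example, c); }
size_t offset_of_total(void) { return offsetof(struct example, total); }

/* Number of unused bytes the compiler placed between c and total. */
size_t padding_after_c(void)
{
    size_t end_of_c = offsetof(struct example, c) + sizeof(char);

    assert(offsetof(struct example, total) >= end_of_c);
    return offsetof(struct example, total) - end_of_c;
}
```

On a typical machine with 4-byte int, one would expect offsets 0, 4, and 8, three bytes of padding, and a total size of 12 bytes, but this is an expectation about common compilers rather than something C guarantees.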

Objects

Objects are structures...

Pixels

There are many pixel encodings. The simplest is 24-bit RGB, composed of three unsigned 8-bit integers representing the red, green, and blue intensities of the pixel. For instance, red would be 0xff0000 and white would be 0xffffff. 16-bit RGB uses 5/6/5 bit unsigned integers for the three color values. 8-bit representations often use a color table.
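Packing the color fields into a single value is a matter of shifting and ORing. In this sketch the function names are made up, and placing red in the high-order bits of the 16-bit form is an assumption (it mirrors the 24-bit examples above, but other field orders exist).

```c
#include <assert.h>
#include <stdint.h>

/* 24-bit RGB: 8 bits each of red, green, blue, with red in the
   high-order byte. */
uint32_t pack_rgb24(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* 16-bit RGB: 5 bits red, 6 bits green, 5 bits blue. The arguments must
   already fit in their fields. Red-high field order is an assumption. */
uint16_t pack_rgb565(unsigned r5, unsigned g6, unsigned b5)
{
    assert(r5 < 32 && g6 < 64 && b5 < 32);
    return (uint16_t)((r5 << 11) | (g6 << 5) | b5);
}
```

pack_rgb24(0xff, 0, 0) gives 0xff0000 (red) and pack_rgb24(0xff, 0xff, 0xff) gives 0xffffff (white). In the 16-bit form under the stated assumption, full red is 0xF800.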

HLL Program Variables

Each variable in the program is allocated memory locations to hold its current value. The number of bytes allocated depends on the variable type (e.g., one byte for char and four for int or unsigned int).

HLL statements modifying variables are translated into machine instructions that modify the memory locations the variables occupy. For instance, if int x has been allocated bytes 0x1000-0x1003, the statement x++; will be translated into machine instructions that add one to the 32 bits in that memory location, interpreting them as a signed integer.
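The translation of x++ can be imitated in C by spelling out the individual steps. This is only a sketch: the actual instruction sequence depends on the processor and compiler, and the load/add/store split shown here is an assumption typical of machines that do arithmetic only in registers. The function name is invented.

```c
#include <assert.h>
#include <stddef.h>

/* What x++ amounts to, given the address where x lives. */
void increment_in_memory(int *address_of_x)
{
    assert(address_of_x != NULL);

    int reg = *address_of_x;   /* load the 32 bits from memory */
    reg = reg + 1;             /* add one, treating them as a signed integer */
    *address_of_x = reg;       /* store the result back to the same bytes */
}
```

Calling increment_in_memory(&x) has the same effect on x as the statement x++;.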

zahorjan@cs.washington.edu