CSE 351 - Spring 2010 - Section 5

Floating Point Conversion

Recall that the number line looks like the following. We only show the positive side, but the negative side is symmetric.

+Infinity
==================================
Biggest Positive Normalized Number	N_max
+Normalized Numbers
Smallest Positive Normalized Number	N_min
==================================
Biggest Positive Denormalized Number	D_max
+Denormalized Numbers	D
Smallest Positive Denormalized Number	D_min
==================================
Zero

The first step to converting floating point representations is to determine the dividing line between each kind of number above. That is, what are the values for N_max, N_min, D_max, D_min?

In the table above, D is the total number of denormalized values. In a sense, the denormalized values divide up the smallest normalized value (N_min) into D evenly spaced slices. We will use D later to calculate D_max and D_min.

For our desired floating point format, let's call the number of exponent bits k and fractional bits m. Then the binary representation of a floating point number looks like:

sign bit	exponent bits	fractional bits
s	k	m

To be concrete for the examples below, let's take s=1, k=3, m=5.

Normalized Values

Recall the following for normalized values:

Normalized Exponent values = 2**k - 2, excluding all 0's (denormalized value) and all 1's (infinity and NaN) 2**3 - 2 = 6 So encoded exponent values range from 1 to 6. Normalized Exponent Bias = 2**(k-1) - 1, subtracted from the encoded exponent to get the real exponent 2**(3-1) - 1 = 3 So actual exponent values (encoded - bias) range from -2 to 4. Fractional part has an implied 1.

N_max: The Biggest Normalized Number

To find N_max, we find the largest possible exponent value and the largest possible fractional value. With their forces combined, they form the largest possible normalized value.

Maximum, non-infinite exponent = all 1's except the last bit.
110 = 6
6 - (bias of 3) = 3

Maximum fractional part = all 1's
11111 = 31
Plus implied one = 1 and 31/32

N_max = (1 and 31/32) * (2**3) = 15.75

So if you are trying to convert a number greater than 15.75, then you will need to round it to positive infinity instead.

N_min: The Smallest Normalized Number

To find N_min, we find the smallest possible exponent value and the smallest possible fractional value (which is all 0's, plus the implied 1).

Minimum, normalized exponent = all 0's except 1 in the least significant bit
001 = 1
1 - (bias of 3) = -2

Minium fractional part = all 0's
00000 = 0
Plus implied one = 1

N_min = (1) * (2**-2) = 0.25

So if you are trying to convert a number that is between 0 and 0.25, you will need to make it a denormalized value (see next section!)

Denormalized Values

Denormalized values are all evenly-spaced. You can think of them as evenly dividing N_min, the smallest normalized number, into D = 2**m equal slices. How big are these slices?

Size of a denormalized slice: N_min / 2**m
= (0.25) / (2**5)
= (1/4) / 32
= 1 / 512

Remember that there is no implied 1 for denormalized values. If there were, we would be encoding the same value as some normalized value, which is inefficient and confusing.

So now this slice tells us the spacing between each denormalized number. It also tells us the value of D_max and D_min, which are separated from N_min and 0, respectively, by the size of this slice.

In our calculation below, we multiply N_min by the number 1, written as (32/32) to make subtracting fractions easier. This comes from D = 2**5 = 32.

D_max = N_min - (1/D) = (32/32)*(1/4) - (1/512) = (32/512) - (1/512) = 31/512

D_min = (1/D) = (1/512)

Procedure Call and Stack Frame

Recall the following simple program from several sections ago:

#include <stdio.h> int func(int a, int b) { return a+b; } int main(void) { int d = func(22,33); printf("%d\n", d); return 0; }

The func function disassembles to the following code.

0x00001faa	<func+0>:	push %ebp	Push the old ebp onto the stack
0x00001fab	<func+1>:	mov %esp,%ebp	Change the ebp to point to the current top of stack.
0x00001fad	<func+3>:	sub $0x8,%esp	Allocate some temporary scratch space (never used)
0x00001fb0	<func+6>:	mov 0xc(%ebp),%eax	Move the second argument into %eax
0x00001fb3	<func+9>:	add 0x8(%ebp),%eax	Add the first argument to the second argument, still in %eax
0x00001fb6	<func+12>:	leave	Restore %esp to %ebp, and %ebp to old value from stack
0x00001fb7	<func+13>:	ret	Pop return address from stack and return there.

Stack at func Beginning

When we first enter the function func, before we execute instruction <func_0>, the stack and ebp/esp look like below:

%ebp = 0xbffff338 %esp = 0xbffff30c

0xbffff30c:	0x00001fd9	Current $esp and return address to main
0xbffff310:	0x00000016	First argument, 0x16 = 22
0xbffff314:	0x00000021	Second argument, 0x21 = 33

Stack Right After <func_0>

Right after we execute instruction <func_0>, we've just pushed the current %ebp onto the stack, and things now look like this:

%ebp = 0xbffff338 %esp = 0xbffff308

0xbffff308:	0xbffff338	Old %ebp
0xbffff30c:	0x00001fd9	return address to main
0xbffff310:	0x00000016	First argument, 0x16 = 22
0xbffff314:	0x00000021	Second argument, 0x21 = 33

Stack Right After <func_1>

Right after we execute instruction <func_1>, we've moved the base pointer %ebp to point to the current stack top. But the stack contents haven't changed since the last instruction.

%ebp = 0xbffff308 %esp = 0xbffff308

Stack Right After <func_2>

Right after we execute instruction <func_2>, we've subtracted 8 bytes from the stack pointer %esp for some temporary scratch space. These are initialized to zero, and we never actually use them.

%ebp = 0xbffff308 %esp = 0xbffff300

0xbffff300:	0x00000000	Temporary scratch space
0xbffff304:	0x00000000	Temporary scratch space
0xbffff308:	0xbffff338	Old %ebp
0xbffff30c:	0x00001fd9	return address to main
0xbffff310:	0x00000016	First argument, 0x16 = 22
0xbffff314:	0x00000021	Second argument, 0x21 = 33

Interlude: <func_3> and <func_4>

The <func_3> instruction moves the second argument in %eax. The mov address references %ebp + 12, which in this case is 0xbffff308 + 0xc = 0xbffff314. From the stack in the previous section, we move the contents at that address (0x21) into %eax.

The <func_4> instruction adds the first argument to the contents of %eax (0x21 from above). The add address references %ebp + 8, which in this case is 0xbffff308 + 0x8 = 0xbffff310. From the stack in the previous section, we add the contents at that address (0x16) to the value in %eax and store the total again in %eax.

Neither of these instructions changes the stack, so we won't redisplay that here. Okay, now back to our normally scheduled, stack-changing programming.

Stack Right After <func_5>

Right after we execute instruction <func_3>, we've executed a leave instruction which returns the stack pointer %esp to %ebp, and all pops the old %ebp from the stack into %ebp. So now %ebp and %esp are the same as they were at the beginning of the function, before we executed <func_0>. Here's what it looks like:

%ebp = 0xbffff338 %esp = 0xbffff30c

0xbffff30c:	0x00001fd9	return address to main
0xbffff310:	0x00000016	First argument, 0x16 = 22
0xbffff314:	0x00000021	Second argument, 0x21 = 33

Stack Right After <func_6>, and ending the function call.

Right after we execute instruction <func_3>, we've popped the old return address off the stack and returned to the main function. As a result, the stack changes back to the way it was before we left the main function to call func in the first place.

%ebp = 0xbffff338 %esp = 0xbffff310

0xbffff310:	0x00000016	First argument, 0x16 = 22
0xbffff314:	0x00000021	Second argument, 0x21 = 33

Struct Packing

I spent a disproportionate amount of time on unions and struct packing in section. Rest assured, they are a small part of the course material, and nothing to stress about. However, here are some notes and a demo, just for completeness.

How does C allocate memory for members of a struct that have different sizes? For example, a char is 1 byte in size, but a double is 8 bytes. Let's write a program to find out.

#include <stdio.h> struct type_a { char a; int b; double c; char d; char e; }; union union_a { struct type_a a; char bytes[16]; }; int main(void) { int i; union union_a u; for (i = 0; i < sizeof(union union_a); i++) { u.bytes[i] = 0x00; printf("union_a[%u] = %hx\n", i, u.bytes[i]); } puts("\n"); u.a.a = 'z'; u.a.d = 'A'; u.a.e = 'B'; u.a.b = 123456789; u.a.c = 3.141592653956; for (i = 0; i < sizeof(union union_a); i++) { printf("union_a[%u] = %hx\n", i, u.bytes[i]); } puts("\n"); printf("sizeof(union union_a): %d\n", sizeof(union union_a)); return 0; }

From the output of this program, we can tell that C aligns ints to the nearest 4 byte boundary, but that it packs chars to fit within the same 4-byte word. All the space in between (padding) is just ignored and wasted space.

Here is how the struct is packed in memory, where each line represents a 4-byte word.

a (char)
b (int)
c (double, upper bits)
c (double, lower bits)
d (char)	e (char)

union_a[0] = 7a union_a[1] = 0 union_a[2] = 0 union_a[3] = 0 union_a[4] = 15 union_a[5] = ffcd union_a[6] = 5b union_a[7] = 7 union_a[8] = 49 union_a[9] = ffc2 union_a[10] = 50 union_a[11] = 54 union_a[12] = fffb union_a[13] = 21 union_a[14] = 9 union_a[15] = 40 union_a[16] = 41 union_a[17] = 42 union_a[18] = 0 union_a[19] = 0 sizeof(union union_a): 20