CSE 351 - Spring 2010 - Section 5

Floating Point Conversion

Recall that the number line looks like the following. We only show the positive side, but the negative side is symmetric.

+Infinity
==================================
Biggest Positive Normalized NumberNmax

+Normalized Numbers
Smallest Positive Normalized NumberNmin
==================================
Biggest Positive Denormalized NumberDmax

+Denormalized Numbers
D
Smallest Positive Denormalized NumberDmin
==================================
Zero

The first step to converting floating point representations is to determine the dividing line between each kind of number above. That is, what are the values for Nmax, Nmin, Dmax, Dmin?

In the table above, D is the total number of denormalized values. In a sense, the denormalized values divide up the smallest normalized value (Nmin) into D evenly spaced slices. We will use D later to calculate Dmax and Dmin.

For our desired floating point format, let's call the number of exponent bits k and fractional bits m. Then the binary representation of a floating point number looks like:

sign bitexponent bitsfractional bits
skm

To be concrete for the examples below, let's take s=1, k=3, m=5.

Normalized Values

Recall the following for normalized values:

Normalized Exponent values = 2**k - 2, excluding all 0's (denormalized value) and all 1's (infinity and NaN)
2**3 - 2 = 6
So encoded exponent values range from 1 to 6.
Normalized Exponent Bias = 2**(k-1) - 1, subtracted from the encoded exponent to get the real exponent
2**(3-1) - 1 = 3
So actual exponent values (encoded - bias) range from -2 to 4.
Fractional part has an implied 1.

Nmax: The Biggest Normalized Number

To find Nmax, we find the largest possible exponent value and the largest possible fractional value. With their forces combined, they form the largest possible normalized value.

Maximum, non-infinite exponent = all 1's except the last bit.
110 = 6
6 - (bias of 3) = 3

Maximum fractional part = all 1's
11111 = 31
Plus implied one = 1 and 31/32

Nmax = (1 and 31/32) * (2**3) = 15.75

So if you are trying to convert a number greater than 15.75, then you will need to round it to positive infinity instead.

Nmin: The Smallest Normalized Number

To find Nmin, we find the smallest possible exponent value and the smallest possible fractional value (which is all 0's, plus the implied 1).

Minimum, normalized exponent = all 0's except 1 in the least significant bit
001 = 1
1 - (bias of 3) = -2

Minium fractional part = all 0's
00000 = 0
Plus implied one = 1

Nmin = (1) * (2**-2) = 0.25

So if you are trying to convert a number that is between 0 and 0.25, you will need to make it a denormalized value (see next section!)

Denormalized Values

Denormalized values are all evenly-spaced. You can think of them as evenly dividing Nmin, the smallest normalized number, into D = 2**m equal slices. How big are these slices?

Size of a denormalized slice: Nmin / 2**m
= (0.25) / (2**5)
= (1/4) / 32
= 1 / 512

Remember that there is no implied 1 for denormalized values. If there were, we would be encoding the same value as some normalized value, which is inefficient and confusing.

So now this slice tells us the spacing between each denormalized number. It also tells us the value of Dmax and Dmin, which are separated from Nmin and 0, respectively, by the size of this slice.

In our calculation below, we multiply Nmin by the number 1, written as (32/32) to make subtracting fractions easier. This comes from D = 2**5 = 32.
Dmax = Nmin - (1/D) = (32/32)*(1/4) - (1/512) = (32/512) - (1/512) = 31/512
Dmin = (1/D) = (1/512)

Procedure Call and Stack Frame

Recall the following simple program from several sections ago:

#include <stdio.h>

int func(int a, int b) {
return a+b;
}

int main(void) {
int d = func(22,33);
printf("%d\n", d);
return 0;
}

The func function disassembles to the following code.

0x00001faa <func+0>: push %ebp Push the old ebp onto the stack
0x00001fab <func+1>: mov %esp,%ebpChange the ebp to point to the current top of stack.
0x00001fad <func+3>: sub $0x8,%espAllocate some temporary scratch space (never used)
0x00001fb0 <func+6>: mov 0xc(%ebp),%eaxMove the second argument into %eax
0x00001fb3 <func+9>: add 0x8(%ebp),%eaxAdd the first argument to the second argument, still in %eax
0x00001fb6 <func+12>: leave Restore %esp to %ebp, and %ebp to old value from stack
0x00001fb7 <func+13>: ret Pop return address from stack and return there.

Stack at func Beginning

When we first enter the function func, before we execute instruction <func_0>, the stack and ebp/esp look like below:

%ebp = 0xbffff338
%esp = 0xbffff30c
0xbffff30c:0x00001fd9Current $esp and return address to main
0xbffff310:0x00000016First argument, 0x16 = 22
0xbffff314:0x00000021Second argument, 0x21 = 33

Stack Right After <func_0>

Right after we execute instruction <func_0>, we've just pushed the current %ebp onto the stack, and things now look like this:

%ebp = 0xbffff338
%esp = 0xbffff308
0xbffff308:0xbffff338Old %ebp
0xbffff30c:0x00001fd9return address to main
0xbffff310:0x00000016First argument, 0x16 = 22
0xbffff314:0x00000021Second argument, 0x21 = 33

Stack Right After <func_1>

Right after we execute instruction <func_1>, we've moved the base pointer %ebp to point to the current stack top. But the stack contents haven't changed since the last instruction.

%ebp = 0xbffff308
%esp = 0xbffff308

Stack Right After <func_2>

Right after we execute instruction <func_2>, we've subtracted 8 bytes from the stack pointer %esp for some temporary scratch space. These are initialized to zero, and we never actually use them.

%ebp = 0xbffff308
%esp = 0xbffff300
0xbffff300:0x00000000Temporary scratch space
0xbffff304:0x00000000Temporary scratch space
0xbffff308:0xbffff338Old %ebp
0xbffff30c:0x00001fd9return address to main
0xbffff310:0x00000016First argument, 0x16 = 22
0xbffff314:0x00000021Second argument, 0x21 = 33

Interlude: <func_3> and <func_4>

The <func_3> instruction moves the second argument in %eax. The mov address references %ebp + 12, which in this case is 0xbffff308 + 0xc = 0xbffff314. From the stack in the previous section, we move the contents at that address (0x21) into %eax.

The <func_4> instruction adds the first argument to the contents of %eax (0x21 from above). The add address references %ebp + 8, which in this case is 0xbffff308 + 0x8 = 0xbffff310. From the stack in the previous section, we add the contents at that address (0x16) to the value in %eax and store the total again in %eax.

Neither of these instructions changes the stack, so we won't redisplay that here. Okay, now back to our normally scheduled, stack-changing programming.

Stack Right After <func_5>

Right after we execute instruction <func_3>, we've executed a leave instruction which returns the stack pointer %esp to %ebp, and all pops the old %ebp from the stack into %ebp. So now %ebp and %esp are the same as they were at the beginning of the function, before we executed <func_0>. Here's what it looks like:

%ebp = 0xbffff338
%esp = 0xbffff30c
0xbffff30c:0x00001fd9return address to main
0xbffff310:0x00000016First argument, 0x16 = 22
0xbffff314:0x00000021Second argument, 0x21 = 33

Stack Right After <func_6>, and ending the function call.

Right after we execute instruction <func_3>, we've popped the old return address off the stack and returned to the main function. As a result, the stack changes back to the way it was before we left the main function to call func in the first place.

%ebp = 0xbffff338
%esp = 0xbffff310
0xbffff310:0x00000016First argument, 0x16 = 22
0xbffff314:0x00000021Second argument, 0x21 = 33

Struct Packing

I spent a disproportionate amount of time on unions and struct packing in section. Rest assured, they are a small part of the course material, and nothing to stress about. However, here are some notes and a demo, just for completeness.

How does C allocate memory for members of a struct that have different sizes? For example, a char is 1 byte in size, but a double is 8 bytes. Let's write a program to find out.

#include <stdio.h>

struct type_a {
char a;
int b;
double c;
char d;
char e;
};

union union_a {
struct type_a a;
char bytes[16];
};

int main(void) {
int i;
union union_a u;
for (i = 0; i < sizeof(union union_a); i++) {
u.bytes[i] = 0x00;
printf("union_a[%u] = %hx\n", i, u.bytes[i]);
}
puts("\n");
u.a.a = 'z';
u.a.d = 'A';
u.a.e = 'B';
u.a.b = 123456789;
u.a.c = 3.141592653956;

for (i = 0; i < sizeof(union union_a); i++) {
printf("union_a[%u] = %hx\n", i, u.bytes[i]);
}
puts("\n");

printf("sizeof(union union_a): %d\n", sizeof(union union_a));
return 0;
}

From the output of this program, we can tell that C aligns ints to the nearest 4 byte boundary, but that it packs chars to fit within the same 4-byte word. All the space in between (padding) is just ignored and wasted space.

Here is how the struct is packed in memory, where each line represents a 4-byte word.

a (char)             
b (int)
c (double, upper bits)
c (double, lower bits)
d (char)e (char)  
union_a[0] = 7a
union_a[1] = 0
union_a[2] = 0
union_a[3] = 0
union_a[4] = 15
union_a[5] = ffcd
union_a[6] = 5b
union_a[7] = 7
union_a[8] = 49
union_a[9] = ffc2
union_a[10] = 50
union_a[11] = 54
union_a[12] = fffb
union_a[13] = 21
union_a[14] = 9
union_a[15] = 40
union_a[16] = 41
union_a[17] = 42
union_a[18] = 0
union_a[19] = 0

sizeof(union union_a): 20