University of Washington Department of Computer Science & Engineering
 CSE 467 Lab #4

Files
 lame.zip (250 kB)
 float.c (small)
 pportc.zip (250 kB)
 PPort.zip (2.3 MB)
 floatMult.c (small)
    This lab specification is subject to change. The problem presented and the high-level requirements will not change.

This lab is due in class on Monday, February 19

Purpose

You have played around with data processing, and it is now time to actually use the hardware in a program. We will be using LAME, the MP3 encoder, and will focus on a particular loop involving floating point multiplications. You will implement part of this loop in hardware and test it on the larger FPGA systems.

The Problem

This is the piece of code that you will be working with. It is found in the file newmdct.c, at the end of the function window_subband. It is a doubly nested loop that could be completely implemented in HW. We have not yet decided how much of this routine will be in HW and how much in SW; probably just the inner multiplications, but maybe the inner sums too.
    ...
    wp = &mm[0][0];
    for (i = 15; i >= 0; --i) 
    {
        int j;
        FLOAT8 s0 = s; /* mm[i][0] is always 1 */
        FLOAT8 s1 = t * *wp++;
        for (j = 14; j >= 0; j--) 
        {
            s0 += *wp++ * *in++;
            s1 += *wp++ * *in++;
        }
        in -= 30;
        d[i     ] = s0 + s1;
        d[31 - i] = s0 - s1;
    }
    ...
There are several factors to consider before you implement this in HW. In particular, FLOAT8s are 64-bit double precision numbers, and you will most likely not need this degree of precision.

Step One (Week One)

  • Determine the bit width and numerical representation that you need to obtain good "telephone" quality compression from LAME; the 64-bit floating point implementation in LAME produces CD quality compression. Here are some suggestions for doing this (all in SW):
    1. Modify the LAME project to compile statistics on the floating point numbers that are used in a default execution (compress your favorite .wav file). In particular, compute histograms depicting the distribution of values. The histograms should help you understand how many bits you need and when it's safe to round off.
    2. Modify this routine to use only signed 16-bit integers (int) for the operands and the intermediate results. When you convert, and when you do the math, make sure not to overflow; use "saturating" math. Convert your results back to FLOAT8 and listen to the audio quality. The easiest way to do this is to write mult(), add(), and convert() routines that do the saturating arithmetic (see the sketch after this list).
    3. If the 16-bit representation isn't satisfactory, invent your own floating point representation and mult(), add(), and convert() routines, trying various bit widths for the mantissa and exponent (see notes below) until you achieve reasonable quality when you run LAME.
  • Write a Verilog module for just the pipelined floating point multiplier (not the loop controller). Use shift-and-add for the mantissa multiplication. This does not have to be tested for the first week, but I recommend doing so in Verilogger Pro or the Xilinx simulator. It should look a lot like your C implementation, except for the pipelining.
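Here is a minimal sketch of the kind of saturating 16-bit routines suggested in item 2 above. The fixed-point scaling (SCALE_SHIFT), the helper names, and the assumption that FLOAT8 is a double are illustrative choices, not part of LAME; pick a scale based on your histograms.

    /*
     * Saturating 16-bit helpers (a sketch).  SCALE_SHIFT, the names mult16(),
     * add16(), to_int16(), and to_float8() are illustrative -- choose a scale
     * from your histograms.
     */
    #include <limits.h>

    #define SCALE_SHIFT 15                 /* treat values as Q15 fixed point */

    typedef short int16;

    /* Clamp a wider intermediate result into the 16-bit range. */
    static int16 saturate(long x)
    {
        if (x > SHRT_MAX) return SHRT_MAX;
        if (x < SHRT_MIN) return SHRT_MIN;
        return (int16) x;
    }

    /* Saturating add: do the math in a wider type, then clamp. */
    static int16 add16(int16 a, int16 b)
    {
        return saturate((long) a + (long) b);
    }

    /* Saturating multiply: rescale the product back down, then clamp. */
    static int16 mult16(int16 a, int16 b)
    {
        return saturate(((long) a * (long) b) >> SCALE_SHIFT);
    }

    /* Convert from FLOAT8 (assumed to be double) and back. */
    static int16 to_int16(double x)
    {
        double scaled = x * (double) (1L << SCALE_SHIFT);
        if (scaled > SHRT_MAX) return SHRT_MAX;
        if (scaled < SHRT_MIN) return SHRT_MIN;
        return (int16) scaled;
    }

    static double to_float8(int16 x)
    {
        return (double) x / (double) (1L << SCALE_SHIFT);
    }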
Step Two (Week Two)

Once you are comfortable with your design in SW, you should try implementing it in hardware. We will provide you with a framework for downloading data to the card using the new larger FPGAs. You can modify the size and width of memory, as well as how the data is packed in there. Your processor does not have to be one Verilog module. We haven't decided yet how much of the routine to do in hardware, but it might be the entire INNER LOOP. We might give you a pipelined floating point adder to use for that.

Test your HW by modifying the window_subband routine so that it packs a bunch of your custom floats into an array and sends them off to the hardware. Also, have your software receive the floats from the hardware and integrate the results into the code (to make things work). Your modified routine should compare the HW results to the software results from Step One; they should be exactly the same.
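For concreteness, here is a rough sketch of what such a comparison harness might look like. The packed 16-bit format, the helper myfloat_pack(), the software reference sw_multiply(), and the transfer routines ppSendWords()/ppReceiveWords() are all hypothetical stand-ins for whatever your representation and the provided framework actually define.

    /* All of the names below are hypothetical placeholders. */
    #include <assert.h>

    #define NUM_OPERANDS 30   /* e.g. one inner loop's worth of operands */

    extern unsigned short myfloat_pack(double x);                  /* your format   */
    extern unsigned short sw_multiply(unsigned short a,
                                      unsigned short b);           /* SW reference  */
    extern void ppSendWords(const unsigned short *buf, int n);     /* to the FPGA   */
    extern void ppReceiveWords(unsigned short *buf, int n);        /* from the FPGA */

    void check_inner_loop(const double *a, const double *b)
    {
        unsigned short packed[2 * NUM_OPERANDS];
        unsigned short hw_result[NUM_OPERANDS];
        int k;

        /* Pack the operand pairs into the custom format. */
        for (k = 0; k < NUM_OPERANDS; k++) {
            packed[2 * k]     = myfloat_pack(a[k]);
            packed[2 * k + 1] = myfloat_pack(b[k]);
        }

        /* Ship them to the board and read the products back. */
        ppSendWords(packed, 2 * NUM_OPERANDS);
        ppReceiveWords(hw_result, NUM_OPERANDS);

        /* The HW bit patterns must match the Step One software results exactly. */
        for (k = 0; k < NUM_OPERANDS; k++)
            assert(hw_result[k] == sw_multiply(packed[2 * k], packed[2 * k + 1]));
    }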

As a final bit of happiness, compress your favorite .wav file to an MP3 and play it, basking in the successful fusion of hardware and software.

Big Note!

This lab is being conducted on a different piece of hardware from all previous labs. There are only three (3) of these boards, so people will need to share. Almost all of your work (design, verification) can be done without the board; only the final fusion of software and hardware requires the board. Do not monopolize the hardware! If you do, I will be called on to beat upon you. We have not set up a particularly clever time sharing system, but I assume we will have people rotate in and out of machines.

Turn-In Requirements

This lab is again a two-week lab. We will have a checkpoint due on Monday, the 12th of February. For this checkpoint, you will need to turn in:
  • Design and implementation work surrounding your software floating point representation. This includes a discussion of why you chose particular parameters for your representation (based on your histograms, presumably), as well as the modifications you made to the LAME project. This should include a printout of your histograms (Excel is fine) and your math routines.
  • A preliminary Verilog module that performs the pipelined floating point multiply for your specification. You do not need to test this module, but you must have something done. By the time you come into your second lab, you should be (at least) starting the verification and debugging phase of the project.
The final turn-in will require the following additional material:
  • Design work and documentation for your HW implementation 
  • Printout of the relevant code regions from newmdct.c
  • Discussion of the trade-offs involved in this implementation project. In particular:
    • How did you determine what parameters to use for expSize and mantSize?
    • How does the complexity of your implementation vary based on expSize and mantSize? Would it be easier for you to increase expSize or mantSize in terms of CLBs and in terms of processing time? (Make a theoretical argument. You do not need to implement your design multiple times).
    • How could intimacy with the code (the meaning of the values being multiplied, for example) have helped you in your implementation?

Floating Point Tutorial

Floating point numbers are so called because they can vary their exponent (allowing the binary point to "float"). The general layout (and the one we will use) is

Sign:     1 bit
Exponent: |Exponent| bits
Mantissa: |Mantissa| bits
The number that you are representing is
Number = (-1)^Sign * 2^(Exponent - |MaxExponent|/2) * 1.Mantissa
Note that the mantissa is in binary. 1.011 = 1 + 1/4 + 1/8.
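For concreteness, a small worked example: suppose |Exponent| = 4 and |Mantissa| = 3, and take the offset (bias) to be 2^(|Exponent| - 1) = 8 (this particular offset is an assumption; use whatever offset your representation defines). Then the bit pattern Sign = 1, Exponent = 1010 (decimal 10), Mantissa = 011 encodes (-1)^1 * 2^(10 - 8) * 1.011 = -1 * 4 * 1.375 = -5.5.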

Here is some code that should print the floating point representation given the size of the exponent and mantissa. Poking around in this code should help you understand the general floating point representation. You can also use this code to convert to and from floating point representations. Note that you still have to implement multiplication (do not do this by converting to doubles and multiplying!).

When multiplying floating point numbers, note that the product of two floating point numbers is

(-1)^Sign1 * (-1)^Sign2 * 2^(Exponent1 - |MaxExponent|/2) * 2^(Exponent2 - |MaxExponent|/2) * 1.Mantissa1 * 1.Mantissa2

which is equal to

(-1)^(Sign1 + Sign2) * 2^(Exponent1 + Exponent2 - |MaxExponent|) * (1.Mantissa1 * 1.Mantissa2)

which is a floating point number with

Sign:     Sign1 XOR Sign2
Exponent: Exponent1 + Exponent2 - |MaxExponent|/2   (the stored, offset exponent)
Mantissa: 1.Mantissa1 * 1.Mantissa2

Unfortunately, it is not this simple. You must take these results and ensure that they are encoded properly. The mantissa must be normalized to be (1.something), and the exponent must be converted back to its offset form. You may need to keep more information about the intermediate result around than you finally record!
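To make the multiply step concrete, here is a rough C sketch that mirrors what the Verilog module has to do, using shift-and-add for the mantissa product. EXP_SIZE, MANT_SIZE, the bias of 2^(EXP_SIZE - 1), and the decision to truncate rather than round (and to skip exponent overflow checks) are all assumptions; substitute the parameters of your own representation.

    #define EXP_SIZE  5
    #define MANT_SIZE 6
    #define BIAS      (1u << (EXP_SIZE - 1))

    typedef struct {
        unsigned sign;   /* 1 bit */
        unsigned exp;    /* EXP_SIZE bits, stored with an offset of BIAS */
        unsigned mant;   /* MANT_SIZE bits, hidden leading 1 not stored */
    } myfloat;

    myfloat myfloat_mult(myfloat a, myfloat b)
    {
        myfloat r;
        unsigned long m1 = (1ul << MANT_SIZE) | a.mant;    /* 1.Mantissa1 */
        unsigned long m2 = (1ul << MANT_SIZE) | b.mant;    /* 1.Mantissa2 */
        unsigned long product = 0;
        int exp = (int) a.exp + (int) b.exp - (int) BIAS;  /* restore the offset */

        /* Shift-and-add mantissa multiplication, as in the hardware. */
        while (m2 != 0) {
            if (m2 & 1)
                product += m1;
            m1 <<= 1;
            m2 >>= 1;
        }

        /* product is (1.Mantissa1 * 1.Mantissa2) scaled by 2^(2*MANT_SIZE),
           a value in [1, 4); shift right once if it landed in [2, 4). */
        if (product & (1ul << (2 * MANT_SIZE + 1))) {
            product >>= 1;
            exp += 1;
        }

        r.sign = a.sign ^ b.sign;
        r.exp  = (unsigned) exp;  /* no overflow/underflow check in this sketch */
        r.mant = (unsigned) ((product >> MANT_SIZE) & ((1ul << MANT_SIZE) - 1));
        return r;
    }

In the Verilog version, one natural way to pipeline this is to retire one (or a few) partial products per stage, carrying the running sum, the exponent, and the sign down the pipeline.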

