Extra Credit Lab: Caching

Assigned: 11/29/2006
Due: 12/8/2006

Description

This lab is an optional hardware lab which you may choose to complete in order to regain some of the points that you may have lost on the midterm exam. The requirements for this lab are slightly different from those of other labs, particularly with regard to your choice of partner. For this lab, you may work with a partner, but you must have scored within 10 points of your partner's score on the midterm. The assignments for those working with partners and those working alone will also vary slightly.

Warning: This lab will be significantly less structured than previous labs. Start early.

Phase 0: Administration

For this lab, you will be provided with a board consisting of instruction and data caches, various I/O components, a memory system, and a place for your processor. You will take your processor from lab 4, place it on this board, and make whatever modifications are needed for your processor to access memory through the caches.

Download the eclab.zip file and use ActiveHDL's Restore Design function to get the design for this lab.
Download the eclab_cores.zip file and unpack this to the same folder as lib378cores.

Remember to transfer not just your processor from the previous lab but also any other files that your processor depends on, such as the branch, hazard, and forwarding units.

Phase 1: Designing the Data Cache

This lab introduces a new memory system, which you will interface with the provided processor via an instruction cache (provided) and a data cache, which you will construct. The new memory system uses the block RAMs on the board and can thus store significantly more data, but it is slightly more complicated to interface with because it does not provide asynchronous reads and may require more than one cycle to retrieve or store data. This is a benefit in disguise, as it more closely resembles a real system, where memory accesses take more than one cycle.

The new memory system uses a request protocol: your cache issues requests to read from or write to main memory. If you want to read data from memory, set a Read request. At some point later, the memory will respond with valid data. Similarly, writing data back to memory is a matter of setting a Write request.

Here's a cursory overview of the ports your Data Cache provides for interfacing with the memory system, and the uses for them:

From Memory/IO:

DCacheDataFromMem (128 bits): Data returned from the memory system after a read request. NOTE: Only valid when DCacheValid is high.

DCacheValid (1 bit): This signal goes high once a Read/Write request is complete. You should read in DCacheDataFromMem during this cycle.

BypassDataFromMem (32 bits): This is data returned from I/O devices. Make sure you aren't caching it.

To Memory:

DCacheAddrToMem (28 bits): The address that you want to access in the memory.

DCacheDataToMem (128 bits): The data that you want to write to the memory. It is only read in during the clock cycle when you have DCacheWriteRequestToMem asserted.

DCacheReadRequestToMem (1 bit): Set this signal high for one clock cycle to signify that you want to perform a read from memory. Then wait for DCacheValid to go high.

DCacheWriteRequestToMem (1 bit): Set this signal high for one clock cycle to signify that you want to perform a write to memory. Then wait for DCacheValid to go high.

Requesting a Read/Write to the memory looks something like this:

Note that the signal names in the diagram aren't quite the same as the port names listed above. This was a write to the data cache that resulted in a cache miss. The Stall signal goes high immediately, and then ReadRequest is asserted for one clock cycle. A few cycles pass (the number CAN change), and then Valid is asserted. The data from memory is written into the correct cache line during that cycle, which means the cache line is now valid and the processor can unstall.
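The miss sequence described above can be sketched as a small state machine. The following C model is only an illustration of that sequence, not the required design: the state names, the struct, and the function names are made up for this sketch, and a real data cache would also need states for writing a modified line back before the read.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names; the lab does not prescribe them. */
typedef enum { ST_IDLE, ST_REQUEST, ST_WAIT } MissState;

typedef struct {
    bool stall;        /* stall signal to the processor */
    bool read_request; /* models DCacheReadRequestToMem */
    bool fill_line;    /* write DCacheDataFromMem into the line this cycle */
} MissOutputs;

/* Outputs for the current cycle. `miss` is true while the processor's access
   misses in the cache; `valid` models DCacheValid from the memory system. */
MissOutputs miss_outputs(MissState s, bool miss, bool valid) {
    MissOutputs o = { false, false, false };
    switch (s) {
    case ST_IDLE:    o.stall = miss; break;                        /* stall immediately */
    case ST_REQUEST: o.stall = true; o.read_request = true; break; /* one-cycle request */
    case ST_WAIT:    o.stall = true; o.fill_line = valid; break;   /* fill when valid */
    }
    return o;
}

/* State for the next clock cycle. */
MissState miss_next_state(MissState s, bool miss, bool valid) {
    switch (s) {
    case ST_IDLE:    return miss ? ST_REQUEST : ST_IDLE;
    case ST_REQUEST: return ST_WAIT;                 /* request lasts exactly one cycle */
    case ST_WAIT:    return valid ? ST_IDLE : ST_WAIT; /* wait any number of cycles */
    }
    assert(!"unknown state");
    return ST_IDLE;
}
```

Once the line has been filled, the same access hits on the next cycle, so `miss` is false in ST_IDLE and the stall drops without any extra logic.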

Another key aspect of this lab is making the I/O devices cooperate with the cache. As you have seen with your programs in the previous lab, every I/O access should actually poll the device rather than rely on a previously stored value, so that I/O accesses return accurate results. To make this possible, your cache must allow I/O read and write operations to bypass the standard caching system and be sent directly to the I/O devices on the board.

IMPORTANT NOTE: In order to make detecting a bypass easier, the VGA controller has been moved to have its character plane start at 0x8000C000 and its color plane start at 0x8000E000. Now, all I/O devices are located at addresses starting at 0x80000000 and above. This will necessitate some changes to board.h which will be mentioned later.
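Because every I/O device now lives at 0x80000000 or above, the bypass decision can come down to a single address bit. A minimal sketch in C, assuming 32-bit addresses; the macro and function names are invented for this example and are not taken from board.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Addresses are the ones given in the lab text; the names are hypothetical. */
#define IO_BASE         0x80000000u
#define VGA_CHAR_PLANE  0x8000C000u
#define VGA_COLOR_PLANE 0x8000E000u

/* Both VGA planes must fall inside the uncached I/O region. */
static_assert(VGA_CHAR_PLANE >= IO_BASE && VGA_COLOR_PLANE >= IO_BASE,
              "VGA planes are expected to be in the I/O region");

/* An access bypasses the cache exactly when the top address bit is set. */
bool is_io_address(uint32_t addr) {
    return (addr >> 31) != 0;
}
```

In the cache itself, the equivalent check would simply look at the most significant bit of the address coming from the processor.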

The third major consideration of this lab will be making your processor interface correctly with the data and instruction caches. Since memory accesses can now take more than one cycle to complete, you will need to incorporate a new stalling mechanism into your processor to handle stall signals coming from the instruction and data caches while they are reading/writing main memory. During these stalls, you should bring the entire processor to a halt. This is different from bubbling: you will NOT insert a nop into the pipeline at the location of the stall. Instead, you will simply preserve the values in each pipeline register until the memory operation is completed.

Here is a list of things that you should keep in mind while designing your cache:

Take some time to figure out how many bits you will need in the cache for the various fields, especially the tag.
The primary difference between the data cache and the instruction cache is that the data cache will need to store data back to memory if the data in the cache has been modified. This requires that you keep track of which lines must be written back to memory.
The cache is best implemented as a state machine. Think about when you will be stalling the processor and the states that you will need to make sure that data is both read from and written to the memory.
You will have to consider all possible scenarios when reading and writing with the cache. For example, what will your cache do if the processor signals a write to a memory location that is not currently cached?
Your cache should only stall if there is actually a cache miss. If there is a hit in the cache, your processor should continue operating in the way that it has been operating up to this point, with no additional delays.
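As a starting point for the first item in the list, here is a sketch of how an address could be split into tag, index, and offset. It assumes 32-bit byte addresses and 4-byte words, so a 4-word line covers 16 bytes and needs 4 offset bits; these widths and all names below are assumptions for illustration. Note that 32 - 4 = 28, which matches the width of DCacheAddrToMem, but whether that port actually carries a line address is something to confirm against the provided board files.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_BITS   32u /* assumed byte-address width */
#define OFFSET_BITS 4u  /* 4 words x 4 bytes = 16-byte line (assumed word size) */

typedef struct { uint32_t tag, index, offset; } AddrFields;

/* log2 of a power of two, e.g. 16 -> 4 */
static unsigned log2_pow2(uint32_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);
    unsigned bits = 0;
    while (n > 1) { n >>= 1; bits++; }
    return bits;
}

/* Tag width for a cache with `num_sets` index positions. */
unsigned tag_bits(uint32_t num_sets) {
    return ADDR_BITS - OFFSET_BITS - log2_pow2(num_sets);
}

/* Split a byte address into its three fields. */
AddrFields split_address(uint32_t addr, uint32_t num_sets) {
    unsigned index_bits = log2_pow2(num_sets);
    AddrFields f;
    f.offset = addr & ((1u << OFFSET_BITS) - 1u);
    f.index  = (addr >> OFFSET_BITS) & (num_sets - 1u);
    f.tag    = addr >> (OFFSET_BITS + index_bits);
    return f;
}
```

Under these assumptions, a direct mapped cache with 16 lines has a 4-bit index and a 24-bit tag, while a cache indexed by 8 positions has a 3-bit index and a 25-bit tag.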

If you are working alone, proceed to Phase 1A for instructions on the cache that you will be constructing. If you are working with a partner, proceed to Phase 1B.

Phase 1A: Designing the Data Cache (Working Alone)

If you are working alone on this lab, you will be constructing a direct mapped cache with 4-word lines. Your cache should store at least 16 lines of data. This implementation is a relatively simple one: your cache will read data in and evict it based purely on the address. If a collision occurs, you will evict the existing line (taking care to write it back if necessary) and replace it with the new data from memory.
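The behavior described above can be modeled in software before committing to a hardware design. The C sketch below is one possible model, not the contents of DataCache.v: it assumes 4-byte words, fetches the line on a write miss (as in the miss sequence described in Phase 1), and uses a tiny array as a stand-in for main memory. All type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_LINES      16u
#define WORDS_PER_LINE 4u
#define MEM_LINES      256u /* tiny stand-in for main memory, in 16-byte lines */

typedef struct {
    bool valid, dirty;
    uint32_t tag;
    uint32_t data[WORDS_PER_LINE];
} Line;

typedef struct {
    Line lines[NUM_LINES];
    uint32_t mem[MEM_LINES][WORDS_PER_LINE];
    unsigned writebacks; /* number of dirty lines written back on eviction */
} Cache;

/* Make the line holding `addr` present, evicting the old occupant if needed.
   Addresses wrap modulo the size of the stand-in memory. */
static Line *ensure_line(Cache *c, uint32_t addr) {
    uint32_t line_addr = addr >> 4;          /* drop the 4 offset bits */
    uint32_t index = line_addr % NUM_LINES;
    uint32_t tag = line_addr / NUM_LINES;
    Line *l = &c->lines[index];
    if (l->valid && l->tag == tag)
        return l;                            /* hit: nothing to do */
    if (l->valid && l->dirty) {              /* collision with a modified line */
        uint32_t old_line_addr = l->tag * NUM_LINES + index;
        memcpy(c->mem[old_line_addr % MEM_LINES], l->data, sizeof l->data);
        c->writebacks++;
    }
    memcpy(l->data, c->mem[line_addr % MEM_LINES], sizeof l->data);
    l->valid = true;
    l->dirty = false;
    l->tag = tag;
    return l;
}

uint32_t cache_read(Cache *c, uint32_t addr) {
    return ensure_line(c, addr)->data[(addr >> 2) & 3u];
}

void cache_write(Cache *c, uint32_t addr, uint32_t value) {
    Line *l = ensure_line(c, addr);          /* fetch the line on a write miss */
    l->data[(addr >> 2) & 3u] = value;
    l->dirty = true;                         /* must be written back on eviction */
}
```

For example, writing to byte address 0x010 and then to 0x110 maps both accesses to line index 1, so the second write evicts the first line and forces a write-back.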

To implement your cache, fill in the DataCache.v file provided with the board. Once you have completed this, continue on to Phase 1C.

Phase 1B: Designing the Data Cache (Group of 2)

If you are working with a partner on this lab, you will be constructing a 2-way set associative cache with 4-word lines. Your cache should contain at least 16 lines divided into two ways of 8 lines each (that is, 8 sets of 2 lines). This is slightly more complicated than writing a direct mapped cache: you will have to add logic to check both ways of the cache, and you will have to decide which way to evict from when a collision requires that data be evicted.
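The two new pieces of logic, checking both halves of the cache and picking a victim, can be sketched as follows. This C model tracks only tags and status bits, and it uses least-recently-used replacement as one possible policy; the lab leaves the choice of policy to you. Here a "way" is one of the two halves of the cache and a "set" is the pair of lines sharing an index. All names, and the 4-byte word size behind the shift by 4, are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 8u
#define NUM_WAYS 2u

typedef struct { bool valid, dirty; uint32_t tag; } TagEntry;

typedef struct {
    TagEntry way[NUM_WAYS];
    unsigned lru; /* way used least recently, i.e. the next victim */
} Set;

/* Which of the 8 sets a byte address maps to (16-byte lines assumed). */
unsigned set_index(uint32_t addr) {
    return (addr >> 4) % NUM_SETS;
}

/* Compare the tag against both ways. Returns the hitting way, or -1 on a miss. */
int lookup(const Set *s, uint32_t tag) {
    for (unsigned w = 0; w < NUM_WAYS; w++)
        if (s->way[w].valid && s->way[w].tag == tag)
            return (int)w;
    return -1;
}

/* Prefer an empty way; otherwise evict the least recently used one. */
unsigned choose_victim(const Set *s) {
    for (unsigned w = 0; w < NUM_WAYS; w++)
        if (!s->way[w].valid)
            return w;
    return s->lru;
}

/* Perform one access to a set. Returns true on a hit. On a miss the new tag
   replaces the victim, and *needs_writeback reports whether the victim held
   modified data that must go back to memory first. */
bool access_set(Set *s, uint32_t tag, bool is_write, bool *needs_writeback) {
    *needs_writeback = false;
    int w = lookup(s, tag);
    bool hit = (w >= 0);
    if (!hit) {
        w = (int)choose_victim(s);
        *needs_writeback = s->way[w].valid && s->way[w].dirty;
        s->way[w].valid = true;
        s->way[w].dirty = false;
        s->way[w].tag = tag;
    }
    if (is_write)
        s->way[w].dirty = true;
    s->lru = 1u - (unsigned)w; /* the other way is now least recently used */
    return hit;
}
```

With only two ways, a single bit per set is enough to record which way was used least recently.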

To implement your cache, fill in the DataCache.v file provided with the board. Once you have completed this, continue on to Phase 1C.

Phase 1C: Modifying the processor

Now that you have data and instruction caches on the board, you must modify the processor to operate with these caches. The key feature that you will need to implement in your processor design is the ability to stall when a memory operation is in progress. Keep in mind that a stall is NOT a bubble. You will not insert nops into the pipeline to effect a stall; rather, you will hold the state of each pipeline stage. Nothing in the pipeline should change while the stall is in effect, and the pipeline resumes after the stall ends as though there had never been a stall.
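The difference between the two can be made concrete with a toy model. In the C sketch below, each pipeline register is reduced to the instruction word it carries, the nop encoding of 0 is an assumption made only for illustration, and the function names are hypothetical. The bubble function is included purely for contrast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NOP 0x00000000u /* assumed nop encoding, for illustration only */

/* One clock edge. regs[0] is the first pipeline register, regs[n-1] the last.
   During a cache stall nothing moves and nothing is inserted. */
void clock_edge(uint32_t regs[], size_t n, uint32_t fetched, bool cache_stall) {
    assert(n > 0);
    if (cache_stall)
        return;                       /* hold every pipeline register */
    for (size_t i = n - 1; i > 0; i--)
        regs[i] = regs[i - 1];        /* normal advance */
    regs[0] = fetched;
}

/* For contrast, a bubble at register k: earlier registers hold, later ones
   advance, and a nop is pushed into regs[k]. */
void clock_edge_bubble(uint32_t regs[], size_t n, size_t k) {
    assert(k < n);
    for (size_t i = n - 1; i > k; i--)
        regs[i] = regs[i - 1];
    regs[k] = NOP;
}
```

In hardware, holding a register typically corresponds to gating its write enable with the stall signal, while a bubble corresponds to clearing it.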

Test Fixture

A test fixture that you can use to test your processor and data cache is available here. Add the files to your design, then add your processor to the test fixture board. Run the simulation for about 180us and see if the lights on the board have been blinking. If they have, your processor and cache should be functional. When synthesizing, do NOT include any files from this archive.

Phase 2: Implementing your design

Once you have modified your processor as necessary and have completed your caches, you will need to implement your design and put it on the board. The steps for this are going to be the same as with previous labs, so please refer to them for more detailed instructions.

First, synthesize your design. Use the top level design board.bde. Make sure that you have included lib378 in the libraries for synthesis and that you have the compile parameter SYNTHESIS set. Also, before starting, confirm that you are synthesizing for a Virtex2P vp30ff896.

Once synthesis has completed, implement your design using the Xilinx ISE as you have done in previous labs. Use the provided eclab.ucf file to specify the pin connections. When the bit file has been generated, you can put it on the board and transfer programs to it via the bootloader as before.

Phase 3: Programming for your new board

Implementing the data cache provides you with one key benefit when programming for your new processor design: you now have a lot more memory accessible to you. Since this design is capable of accessing the block RAMs on the board, there is a much larger range of memory available that you can use for your programs. However, some slight changes will have to be made to the board.h file that you are using, as the memory system on this board assigns the VGA controller to a different range of addresses.

Download the new board.h and replace the existing one

Your programming assignment for this lab is to get the game that you have been working on to run on your processor. This game should be Pong at a minimum, and you are free to do anything more complicated or awesome that will run on the boards. There will be a contest for the best game at the end of the class and the winner will receive some kind of prize.

Checkoff

When you are done, show your board running your program to Mark or a TA. If one of us is not around, email and schedule a time that we can check you off for this lab.


Computer Science & Engineering University of Washington Box 352350 Seattle, WA 98195-2350 (206) 543-1695 voice, (206) 543-2969 FAX