CSE 378 Project 3

Introduction

In this assignment, you'll modify and use a cache simulator to experiment with various cache design parameters. We'll provide you with an almost-functioning cache simulator written in C++. The simulator models a simple write-through cache, with no write-buffer. Your job will be to integrate a write-buffer as well as implement a write-back policy. This assignment requires more thinking than coding. In particular, you shouldn't find yourself writing large volumes of code.

Cache Review

We'll be investigating write-through and write-back caches. Remember, a write-through policy always updates the cache as well as memory on a write-hit, while a write-back policy just updates the cache by marking the written word as dirty. Write-back caches only update memory when a block is replaced, and then only the words in the block that are "dirty." Write-through and write-back are two policies for deciding what to do during a write-hit. There are a variety of policies for dealing with write-misses, but the one we'll use is called write-around, which simply writes the data to memory, without modifying the cache.
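The difference between the two write-hit policies can be sketched in a few lines of C++. The Line struct, the counter, and the function names below are illustrative assumptions, not part of the simulator:

```cpp
#include <cstdint>

// Minimal single-line model contrasting the two write-hit policies.
// A counter stands in for actual memory traffic.
struct Line {
    uint32_t tag = 0;
    bool valid = false;
    bool dirty = false;
};

// Write-through: every write-hit updates memory as well as the cache.
inline void writeHitThrough(Line& line, int& memWrites) {
    (void)line;        // the cache is updated; only the memory write is counted
    ++memWrites;
}

// Write-back: a write-hit just marks the line dirty;
// memory is updated later, when the line is replaced.
inline void writeHitBack(Line& line) {
    line.dirty = true;
}

// On replacement, a write-back cache writes memory only if the line is dirty.
inline void evict(Line& line, int& memWrites) {
    if (line.dirty) {
        ++memWrites;
        line.dirty = false;
    }
    line.valid = false;
}
```

Note how the write-back cache defers (and can eliminate) memory writes: repeated writes to the same block cost at most one write-back at eviction time.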

An important optimization to caches is called a write-buffer. Write-buffers are used to hide memory latency from the CPU. In particular, when we want to write memory (store), we just give the address and data to the write-buffer and continue processing. The write-buffer completes the write while letting us get on with other useful work. Write-buffers are typically small (fewer than 10 items) and can be very effective in hiding the memory latency of writes, as long as the writes do not occur in bursts. Write-buffers complicate both reads and writes, however. For example, during a read-miss, the write-buffer will often be busy communicating with memory, which means that the read must wait for the write-buffer to complete its current write before the memory read can be initiated.
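A write-buffer along these lines can be modeled as a small FIFO of pending addresses. The class name, the 4-entry capacity, and the 10-cycle memory latency below are assumptions made for illustration, not the simulator's actual interface:

```cpp
#include <queue>
#include <cstdint>
#include <cstddef>

const int MEM_LATENCY = 10;   // assumed cycles for one memory write
const std::size_t CAPACITY = 4; // buffers typically hold fewer than 10 items

class WriteBuffer {
    std::queue<uint32_t> pending; // addresses waiting to be written
    int busy = 0;                 // cycles left on the write in progress
public:
    // Advance one cycle; retire a write when its latency has elapsed.
    void tick() {
        if (busy > 0 && --busy == 0) pending.pop();
        if (busy == 0 && !pending.empty()) busy = MEM_LATENCY;
    }
    // Enqueue a store; returns how many cycles the CPU stalled for space.
    int write(uint32_t addr) {
        int stalls = 0;
        while (pending.size() >= CAPACITY) { tick(); ++stalls; }
        pending.push(addr);
        if (busy == 0) busy = MEM_LATENCY;
        return stalls;
    }
    // Drain completely, as a read-miss must do before it can use the bus;
    // returns how many cycles that took.
    int relinquishBus() {
        int stalls = 0;
        while (busy > 0 || !pending.empty()) { tick(); ++stalls; }
        return stalls;
    }
};
```

As long as stores trickle in slower than MEM_LATENCY cycles apart, write() returns 0 and the CPU never stalls on a store; a burst of stores fills the buffer and the stalls reappear.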

The Simulator

The simulator models a very simple cache architecture, taking as input a memory trace. A memory trace is simply a sequence of addresses which are referenced by a program. Our memory traces look like this:
<type>  <address> ... <type> <address> <eof>
<type> is a decimal numeral, where 0 means instruction fetch, 1 means data load, and 2 means data store. <address> is the hex address that is being accessed. Note that our caches do not have any data in them. Because we are just simulating the "performance" of caches (counting read/write hits/misses and cache stall cycles) we don't actually need to worry about the particular data we store in the cache.
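A record in this format can be pulled apart with a single sscanf call. The parseReference helper below is hypothetical, not part of the provided simulator:

```cpp
#include <cstdio>

// Hypothetical parser for one trace record in the format above:
// a decimal type (0 = fetch, 1 = load, 2 = store) and a hex address.
// Returns true if both fields were read successfully.
bool parseReference(const char* text, int& type, unsigned& addr) {
    return std::sscanf(text, "%d %x", &type, &addr) == 2;
}
```

The real simulator reads the same two fields from standard input until end-of-file.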

The simulator reads type-address pairs from standard input, so you could imagine typing them in by hand, but that would be tedious. We've provided you with a set of real-world memory traces (from a running Lisp interpreter) of varying sizes for you to experiment with. On UNIX, you'd run the simulator as follows:

% cachesim < l10000
Where cachesim is the name of the simulator and l10000 is the name of the memory trace file. Note the use of the < input redirection operator, which makes the program take its input from a file rather than from the keyboard. When the simulator encounters the end-of-file, it reports statistics such as cache misses, hits, and the number of cycles the CPU would have needed to stall waiting for memory.

Solution path

  1. Print out the code and read it over. (Re)read the relevant sections of the textbook.

  2. Preferably, you'll do your development on one of the departmental UNIX machines. If you do it elsewhere, we won't be able to provide you with first-class assistance. Also, developing on a UNIX box will give you some valuable experience with that environment.
  3. The version of the code we provide models a write-through cache without a write-buffer. The first step is to get the cache working, which you'll do by filling in the details of the two functions TAG and INDEX, which, given an address, return the tag and the cache index of that address. You can calculate the tag and index by shifting and masking, or by various combinations of divides and mods.
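As a sketch of the shift-and-mask approach, assuming a 16-byte LINESIZE and a 4 KB direct-mapped cache (the constants in the provided code may differ):

```cpp
#include <cstdint>

// Assumed geometry; sizes are in bytes and must be powers of two.
const uint32_t LINESIZE  = 16;                    // bytes per block
const uint32_t CACHESIZE = 4096;                  // total cache size
const uint32_t NUMLINES  = CACHESIZE / LINESIZE;  // 256 blocks

// Number of bits needed to address x positions (x a power of two).
inline uint32_t log2u(uint32_t x) {
    uint32_t n = 0;
    while (x > 1) { x >>= 1; ++n; }
    return n;
}

// Drop the block-offset bits, then mask off the index bits.
inline uint32_t INDEX(uint32_t addr) {
    return (addr >> log2u(LINESIZE)) & (NUMLINES - 1);
}

// The tag is everything above the offset and index bits.
inline uint32_t TAG(uint32_t addr) {
    return addr >> (log2u(LINESIZE) + log2u(NUMLINES));
}
```

Equivalently, with divides and mods: INDEX(a) is (a / LINESIZE) % NUMLINES, and TAG(a) is a / (LINESIZE * NUMLINES).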

  4. Experiment with the program a bit at this point. Try changing the block size (LINESIZE) and/or total cache size to see how this impacts the performance of the cache.

  5. Next, you'll want to get the write-buffer integrated. Fortunately, we've provided you with the code that models the write-buffer, so all you need to do is add the appropriate calls in the right places. In particular, you'll want to look at the CacheBlock functions read and write.

    On a read hit, we simply give the write-buffer one cycle. On a read miss, we need to ask the buffer to relinquish the bus (because it may be busy writing), making sure to count how many cycles we wait. Once the write-buffer has relinquished the bus, we can fetch the item from memory, again making sure to count how many cycles this takes.

    On a write hit or miss, we can simply offload the item we wish to write to the write-buffer. The buffer may be full, so we may need to wait a few cycles until it has space available. Again, we want to remember how many cycles we needed to wait, if any. Since the write-buffer is now taking care of writing the item, we no longer need to pay the memory access penalty.
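The cycle accounting described in the two paragraphs above might be sketched as follows. The Buffer stand-in and its method names (giveCycle, relinquishBus, enqueue), along with the 10-cycle miss penalty, are all hypothetical:

```cpp
// Simplified stand-in for the provided write-buffer; it tracks only
// how many cycles of memory work remain outstanding.
struct Buffer {
    int drainCycles = 0;   // cycles left until the buffer is empty
    void giveCycle() { if (drainCycles > 0) --drainCycles; }
    int relinquishBus() {  // drain fully; return cycles the CPU stalled
        int stalls = drainCycles;
        drainCycles = 0;
        return stalls;
    }
    int enqueue() {        // accept a store; assume space is available
        drainCycles += 10;
        return 0;          // would return stall cycles if the buffer were full
    }
};

const int MISS_PENALTY = 10;  // assumed cycles for one memory access

// Read path: a hit just gives the buffer a cycle; a miss waits for the
// buffer to relinquish the bus, then pays the memory penalty.
int readCycles(bool hit, Buffer& wb) {
    if (hit) { wb.giveCycle(); return 1; }
    return 1 + wb.relinquishBus() + MISS_PENALTY;
}

// Write path: hit or miss, hand the store to the buffer; no memory
// penalty is paid by the CPU itself.
int writeCycles(Buffer& wb) {
    return 1 + wb.enqueue();
}
```

The key point is that only the read-miss path ever pays MISS_PENALTY directly; stores cost one cycle unless the buffer is backed up.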

  6. At this point, you'll want to gather some numbers. Please stay tuned for specific details. Before continuing, please make a backup of your current simulator, because the next changes will cause your code to diverge somewhat.

  7. Next, modify the code to implement a write-back (rather than write-through) policy. First, you'll want to add a representation of the dirty bits, one per cache block; a small array works well. Then you'll want to modify the read and write methods of CacheBlock: on a read miss, a dirty victim block must be written back to memory before the new block is fetched, and on a write hit, the block is simply marked dirty instead of writing memory.
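A sketch of the resulting behavior for a single block, with hypothetical names and an assumed 10-cycle miss penalty (the provided CacheBlock interface may differ):

```cpp
#include <cstdint>

const int MISS_PENALTY = 10;  // assumed cycles for one memory access

struct WBBlock {
    uint32_t tag = 0;
    bool valid = false;
    bool dirty = false;

    // Read miss: if the victim block is dirty, write it back to memory
    // before fetching the new block; a hit costs one cycle.
    int read(uint32_t newTag) {
        if (valid && tag == newTag) return 1;          // hit
        int cycles = 1 + MISS_PENALTY;                 // fetch new block
        if (valid && dirty) cycles += MISS_PENALTY;    // write back victim
        tag = newTag; valid = true; dirty = false;
        return cycles;
    }

    // Write hit: just set the dirty bit; no memory traffic.
    // Write miss (write-around): write memory, leave the cache alone.
    int write(uint32_t newTag) {
        if (valid && tag == newTag) { dirty = true; return 1; }
        return 1 + MISS_PENALTY;
    }
};
```

Compare this with the write-through version: here a write hit costs one cycle with no memory access, but a read miss can cost an extra write-back.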

  8. Finally, gather some more numbers to compare write-through against write-back. Again, stay tuned for details.

The Files

Please look here to find the relevant files for this homework (source code and memory traces).

Turnin

Specific details will be announced later. We're going to ask you to run some numbers and then draw conclusions about what the best cache design is. We'll ask you for a short writeup and some graphs. We may allow you to do the data-gathering, analysis, and writeup with a partner.

Extra Credit

Yes, there will be plenty of options; stay tuned for a list of juicy extra credit plums.
dugan@cs.washington.edu