CSE 351: The Hardware/Software Interface, Winter 2011
  CSE Home   About Us   Search   Contact Info 
 
Course Home
  Home
Administation
  Overview
  Course email
  Anonymous feedback
  View feedback
 
Assignment Utilities
  Home Virtual Machines
  Homework Turnin
  Class GoPost Forum
  Gradebook
 
Most Everything
  Schedule
    Homework 5- Inferring Cache Geometries
Out: Tuesday February 15
Due: Tuesday February 22, 11:59PM
Turnin: Online

Overview

This is an assignment that lets you verify that you understand the basic mechanics of main memory caches. It involves some programming, but that task is straightforward once you decide what to do. The real value is in deciding what to do.

You can work on this assignment either individually or in a team of two.

What to Do

Your goal is to deduce the main memory cache geometry of a microprocessor, that is, the size of the cache, its associativity, and its block size. To do that, you could write a program that runs on that processor. Your program touches memory in a pattern you have specially constructed. For instance, you might allocate a very large chunk of memory, then walk through it reading every 33rd word (by treating it as an integer array and fetching every 33rd element). That's a silly example, of course, but the point is that you can control the order and location of reads.

It's possible to measure things like total cache misses on most real processors. How to do that depends on the language used, and the operating system you're running on. Additionally, real caches have features we haven't talked about, and those features make interpreting the miss results much harder than it might seem. For those reasons, we won't run on a real processor. Instead, we'll use a simulator.

The interface to the simulator is very simple. There are only two functions you need to use:

  • bool_t access_cache(addr_t address);
    You pass in an address (an addr_t, which is an unsigned 64-bit integer) and the function returns TRUE if a cache hit has occurred and FALSE if a miss has occurred. Either way, the block containing the address is inserted into the cache.
  • void flush_cache(void);
    This causes the cache to be emptied. (It takes no arguments.)

This means you don't have to allocate a big hunk of memory, because you can just hand the simulator addresses you want to (pretend to) touch. You also don't have to figure out how to access the cache hit/miss counters, because the simulator tells you 'hit' or 'miss' on every access.

What you need to do is write a program that uses these calls to infer the cache geometry, based on observations of hits and misses caused by the reference pattern you create (through the sequence of addresses passed into access_cache).

Running Your Code

We supply a number of customized compilations of the cache simulator. We refer to these as "mystery caches" because we're not telling you what geometries they implement. Each is distributed as a .o file that you must link against. (We provide a makefile to simply that task.) Each .o has had a specific cache geometry compiled into it - it knows what cache geometry to simulate without your having to supply any inputs to it. This scheme makes it possible for us to hide the geometry. For testing purposes, we reveal the geometries of a few instances of the simulator (also distributed as .o files), so that you can verify you're getting the right results for those.

Because we're distributing .o files, you must run on the system for which they were compiled. That system is Linux.

Writing and Building Your Code

We supply skeleton code, cache-test.c, that you complete. The skeleton knows how to initialize the cache simulator (using a previously unmentioned interface, cache_init()). cache-test.c provides a completed main() that has the essential control flow and prints out results (in a standard format we may rely on in grading, so don't change it). You complete three subroutines that the mainline depends on. The order in which the subroutines are invoked is a suggestion. You're free to change that order, though, if you find another order easier.

The caches whose geometries you will eventually discover are in files with names like cache0.o. To run using that file's simulator, you say:

make TEST_CACHE=cache0.o
That creates an executable (named cache-test), and then runs it. (If you want to invoke the program by hand, type ./cache-test after you've done the make.)

Files with names like cache_64c_2a_16b.o are test files. That one, for instance, is a 64 KB capacity, 2-way set-associative cache with 16B blocks. The test files are there to help you check your work.

The Operation of the Cache Simulator

You can assume that the mystery caches have sizes that are powers of 2. You cannot assume anything further about the cache parameters except what you can infer from the cache size. The mystery caches are all pretty realistic in their geometries, so use this fact to sanity check your results.

All caches use LRU replacement: when there is more than one choice of block to replace, the one that hasn't been referenced for the longest is chosen. A block that has never been referenced is considered older than any that has been (i.e., empty blocks are used before replacement can occur).

Obtaining the Code

All the files you need are in hw5.tar.gz. To extract the files from this archive, use the command:

tar xzf hw5.tar.gz

and the files will be extracted into subdirectory hw5.

Turnin

After you have completed the the 3 functions in cache-test.c (which have /* YOUR CODE GOES HERE */ comments in them), use them to determine the geometries of the three mystery caches.

In a comment at the top of your modified cache-test.c:

  • List your name and the name of your teammate, if you have one.
  • List the geometries of the three mystery caches.

Finally, submit that (one) file, cache-test.c.

Optional 5 Minutes Work

There is nothing to turn in for this, but you might find it interesting enough to be worth the few minutes it takes.

cachegrind is a tool that estimates the number of cache misses experienced by a running program. It is already installed on the CSE lab virtual machine (and presumably also on the CSE lab workstations, although I haven't tested that). We can use cachegrind to estimate the miss rates for the two loop nesting orders of the code used in homework 0.

Compile the code in the i-j loop order, to produce executable a.out. Now invoke:

valgrind --tool=cachegrind ./a.out
The output will provide information on miss rates. That execution also produces a file with name cachegrind.out.xxx, for some xxx. The command
cg_annotate cachegrind.out.xxx
will print the cache organization, as well as a more detailed breakdown of the misses.

Now switch the order of the i-j loops, and repeat.


Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to zahorjan at cs.washington.edu]