|
|
|
|
Homework 5- Inferring Cache Geometries
Out: Tuesday February 15
Due: Tuesday February 22, 11:59PM
Turnin: Online
Overview
This is an assignment that lets you verify that you understand the basic
mechanics of main memory caches. It involves some programming, but that task
is straightforward once you decide what to do. The real value is in deciding
what to do.
You can work on this assignment either individually or in a team of two.
What to Do
Your goal is to deduce the main memory cache geometry of a microprocessor, that
is, the size of the cache, its associativity, and its block size.
To do that, you could write a program that runs on that processor. Your
program touches memory in a pattern you have specially constructed.
For instance, you might
allocate a very large chunk of memory, then walk through it reading every
33rd word (by treating it as an integer array and fetching every 33rd element).
That's a silly example, of course, but the point is that you can control
the order and location of reads.
It's possible to measure things like total cache misses on most real
processors. How to do that depends on the language used, and the operating
system you're running on. Additionally, real caches have features we haven't
talked about, and those features make interpreting the miss results much harder than it
might seem. For those reasons, we won't run on a real processor. Instead,
we'll use a simulator.
The interface to the simulator is very simple. There
are only two functions you need to use:
bool_t access_cache(addr_t address);
You pass in an address (an addr_t , which is an unsigned 64-bit integer) and
the function returns TRUE if a cache hit has occurred and FALSE if a miss has
occurred. Either way, the block containing the address is inserted into the
cache.
void flush_cache(void);
This causes the cache to be emptied. (It takes no arguments.)
This means you don't have to allocate a big hunk of memory, because you can
just hand the simulator addresses you want to (pretend to) touch.
You also don't have to figure out how to access the cache hit/miss counters,
because the simulator tells you 'hit' or 'miss' on every access.
What you need to do is write a program that uses these calls to infer
the cache geometry, based on observations of hits and misses caused by the
reference pattern you create (through the sequence of addresses passed into
access_cache ).
Running Your Code
We supply a number of customized compilations of the cache simulator.
We refer to these as "mystery caches" because we're not telling you what
geometries they implement.
Each is distributed as a .o file that you must link against. (We provide a makefile
to simply that task.) Each .o has had a specific cache geometry compiled into
it - it knows what cache geometry to simulate without your having to supply
any inputs to it. This scheme makes it possible for us to hide the geometry.
For testing purposes, we reveal the geometries of a few
instances of the simulator (also distributed as .o files),
so that you can verify you're getting the right results for those.
Because we're distributing .o files, you must run on the system for which
they were compiled. That system is Linux.
Writing and Building Your Code
We supply skeleton code, cache-test.c, that you complete.
The skeleton knows how to initialize the cache simulator (using a previously
unmentioned interface, cache_init() ).
cache-test.c provides a completed main() that has the essential
control flow and prints out results (in a standard format we may rely on in
grading, so don't change it).
You complete three subroutines that the mainline depends on.
The order in which the subroutines are invoked is a suggestion.
You're free to change that order, though, if you find another order easier.
The caches whose geometries you will eventually discover are in files
with names like cache0.o. To run using that file's simulator,
you say:
make TEST_CACHE=cache0.o
That creates an executable (named cache-test ), and then
runs it. (If you want to invoke the program by hand, type
./cache-test after you've done the make .)
Files with names like cache_64c_2a_16b.o are test files.
That one, for instance, is a 64 KB capacity, 2-way
set-associative cache with 16B blocks. The test files
are there to help you check your work.
The Operation of the Cache Simulator
You can assume that the mystery caches have sizes that are powers of 2.
You cannot assume anything further about the cache parameters except what
you can infer from the cache size.
The mystery caches are all pretty realistic in
their geometries, so use this fact to sanity check your results.
All caches use LRU replacement: when there is more than one choice of
block to replace, the one that hasn't been referenced for the longest
is chosen.
A block that has never been referenced is considered older than any that has
been (i.e., empty blocks are used before replacement can occur).
Obtaining the Code
All the files you
need are in hw5.tar.gz. To extract the files from this
archive, use the command:
tar xzf hw5.tar.gz
and the files will be extracted into subdirectory
hw5 .
Turnin
After you have completed the the 3 functions in cache-test.c
(which have
/* YOUR CODE GOES HERE */ comments in them),
use them to determine the geometries of the three mystery caches.
In a comment at the top of your modified cache-test.c :
- List your name and the name of your teammate, if you have one.
- List the geometries of the three mystery caches.
Finally, submit that (one) file, cache-test.c .
Optional 5 Minutes Work
There is nothing to turn in for this, but you might find it interesting enough to be worth
the few minutes it takes.
cachegrind
is a tool that estimates the number of cache misses experienced by a running program. It is already
installed on the CSE lab virtual machine (and presumably also on the CSE lab workstations, although I
haven't tested that).
We can use cachegrind to estimate the miss rates for the two loop nesting orders of the code
used in homework 0.
Compile the code in the i-j loop order, to produce executable a.out. Now invoke:
valgrind --tool=cachegrind ./a.out
The output will provide information on miss rates. That execution also produces a file with
name cachegrind.out.xxx, for some xxx. The command
cg_annotate cachegrind.out.xxx
will print the cache organization, as well as a more detailed breakdown of the misses.
Now switch the order of the i-j loops, and repeat.
|