Steam-powered Turing Machine University of Washington Computer Science & Engineering
 CSE 378 Fall 2006
  CSE Home     CSE 378 Fall 2006  About Us    Search    Contact Info 

 Home
Administrative
 Syllabus
 Office Hours
 Mailing List
Assignments
 Reading
 Homework
 Labs
 Exams
Resources
 Lecture Slides
 Handouts
 Wiki
 MIPS Resources
 AHDL Resources
Anonymous Feedback
 Feedback Form
   

Lab 5 - Writing a Cache Simulator

Assigned: 11/29/2006
Due: 12/8/2006 at 5 pm

Objective

The objective of this lab is to write a memory system simulator in order to explore the effects of using different caching schemes. We will be most concerned with observing the hit to miss ratio of various implementations for caches as well as the number of writebacks to memory that the cache performs. This lab will be done completely in software, and you will be using Java to program your various cache implementations.

Implementation

The memory system simulates the one that would be found on realistic processors. Specifically, it contains:
  • A Data Cache
  • An Instruction Cache
  • A TLB for Instruction addresses
  • A TLB for Data addresses
  • A PageTable object that handles paging

Cache Designs

For this lab, you will consider 3 different cache organizations, but only one implementation. Before talking about these organization options, here is a vocabulary review and a list of associated classes.

Components

Block

A single entry in a cache. At its most basic, has a valid bit, and a tag. In write-back caches (like ours) there is also a dirty bit used to signal that the block has been modified

Associated Classes

Line

A collection of blocks that are addressed by a single index. For our implementation, the line will be responsible for checking the tags of its blocks and determining which block to evict.

The replacement algorithm that will be used is the Least Recently Used (LRU) algorithm, which always evicts the least recently used entry associated with a particular line. In order to faciliate the use of the LRU algorithm, you must find some way to keep track of which block in a line has been accessed most recently.

Associated Classes

Organizations

Direct mapped cache

In a direct mapped cache, each location in memory maps to a single block in the cache. If a collision occurs, the existing entry in the cache is evicted and replaced with the block containing the requested location from memory. In order to get the mapping of addresses in memory to the cache, a hash function is used (a very simple one for purposes of this assignment) to determine the correct line in the cache.

Implementation: n lines, 1 Set/Line, Blocks of size k

Set associative cache

In a set associative cache, each address in memory still maps to a single Line in the cache, but each cache line now contains as many blocks as there are sets. For example, in a two way set associative cache, a memory address that maps to line 2 in the cache can be placed in block 0 of line 2 in the cache, but if that block is currently occupied, it can be placed block 1 of line 2. Such an approach helps us reduce conflict misses caused by accessing memory in strides.

Implementation: n lines, m Sets/Line, Blocks of size k

Fully associative cache

In a fully associative cache, each address in memory can map to any block in the cache. This is the optimal solution for conflict misses because no conflicts can occur until the entire cache if occupied. The problem is that every block must be checked which means a large number of comparators. For this reason, fully associative caches are seen rarely, but occasionally there are small fully-associative TLBs.

Implementation: 1 line, n Sets/Line, Blocks of size k

Translation Lookaside Buffer (TLB)

The TLB is used to speed up translations between virtual memory addresses and actual physical addresses in main memory. In a sense, it is another cache. Whenever you try access something in main memory, you will need to look up the physical address associated with the virtual address and use that physical address to perform the operation requested. If this address is present in the TLB, you just use that and write to disk, but if it is not, you will need to retrieve the address from disk, cache it in the TLB, and then perform the memory operation. This will obviously incur a time penalty and we want to minize this occurence as much as possible.

For this assignment, the TLB will function a little differently as we do not have an OS present to deal with page faults. If your TLB is properly implemented, you can call the provided translate function in the memory system and it will take care of getting the actual translation and provide you with the physical page that you require.

For this lab, you will implement a fully associative TLB with 32 entries. You will record the number of hits and misses involved with your TLB and their effect on the running time of the trace that you are provided. There will be one TLB for instructions and one for data.

Implementation Strategy

Download Lab5.zip here and extract it.

Getting it working

The code includes a number of methods that throw "Implement Me!" exceptions. We suggest that you use the following ordering.
  1. AbstractLine
  2. CacheLine
  3. TLBLine
  4. AbstractCache
  5. Cache
  6. TLB
  7. MemorySystem

Framework

Trace Generator

There is one trace included with the java files in lab5.zip. To make this instructional, there is a modified version of SPIM that will generate traces for arbitrary assembly code. Get it here.
Do not use any I/O from board.h
Generating a trace file is simply a matter of using one of the following commands:

$ make APP=[your file] trace

or

$ make [your file].trc

Simulator

Using the newly generated trace file is a matter of invoking the class cse378.MemorySystemTester with the trace file name as the first argument. This example assumes that the current directory is lab5:
$ java -classpath classes/ cse378.MemorySystemTester prob1.trc

Quantifying Cache Performance

In order to measure performance, we need some kind of metric to determine which cache is more efficient. For purposes of this assignment, we will use the following system:
  • A hit in the cache takes 1 cycle to resolve
  • A read from memory as a result of a miss takes 50 cycles to resolve
  • A write to memory as a result of an eviction takes 50 cycles to resolve
  • A hit in the TLB to look up an address takes 1 cycle to resolve
  • A miss in the TLB takes 100 cycles to resolve if the PageTable access is a Hit
  • A miss in the PageTable takes 2000 cycles to resolve
You must generate the following statistics from your simulation:
  • The total number of cycles required to run through a given trace for each type of cache
  • The number of cache hits, misses, and writebacks involved with each trace
  • The number of TLB hits, misses, and writebacks required for a given trace
  • The number of PageTable misses and writebacks required for each trace.
The Cache,TLB, and PageTable have fields hits, cleanMisses, and dirtyMisses that should be used to keep track of the hits and misses. Similarly, each of the components has a Status that should be updated according to the type of access. In MemorySystem the fetch, load, and store methods should return a LinkedList containing the various Status records. In MemorySystemTester these lists can be processed to compute the overall statistics for a given run.

Optimizing Programs for the Cache Designs

Now that you have 3 different organizations for your cache, you will try to write programs that are optimized towards each organization. For this section, your caches must contain 1024 words. Write three programs and run them through the cache configurations using the trace generator. Each of the programs should be geared towards being optimized for one particular organization, and should perform better on that particular organization than on the others. For example, if you are writing a program optimized for the direct mapped style of organization, it should be faster on a direct mapped cache than the two way set associative caches.

Turnin

When you have completed your cache simulator, zip up all source files and your programs and use "turnin -p lab5 -c cse378 'your file'" on attu to submit your project for grading.


CSE logo Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to Course Staff]