Supplement to the hw6 main assignment.
Introduction
This is a very open-ended assignment: find the best cache design within budget for the benchmark suite we're using. To help bound it a bit, this page describes a writeup of a minimal but sufficient set of experiments to satisfy the assignment completely.
You should be able to simulate all the alternatives listed here with a single piece of code, changing only some symbolic constants that define the cache parameters of interest (total capacity, associativity, and line size). If you write your code with that in mind, it might save you a bit of time. (Looking at one or more choices beyond those listed here is gravy, worth good karma at least. They probably require you to write additional C++ code, however.)
The remainder of this page is a suggested paper organization. The wording, when given, is an example of the kind of content that might go in the section, and is not intended to be taken verbatim.
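As a rough illustration of the "symbolic constants" idea, the sketch below derives the rest of the cache geometry from the three parameters of interest. All names here (kCapacityWords, CacheGeometry, splitAddress, and so on) are hypothetical and not part of the assignment's code. It also assumes word addresses and power-of-two parameter values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical symbolic constants: edit these three to select a configuration.
// The values shown are the baseline (32 words, direct-mapped, one-word lines).
constexpr std::uint32_t kCapacityWords = 32;  // total data capacity, in words
constexpr std::uint32_t kAssociativity = 1;   // ways per set; 1 = direct-mapped
constexpr std::uint32_t kLineWords     = 1;   // words per cache line

constexpr bool isPowerOfTwo(std::uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Floor of log2(v) for v >= 1; recursive so it remains a valid C++11 constexpr.
constexpr std::uint32_t log2u(std::uint32_t v) { return v <= 1 ? 0 : 1 + log2u(v >> 1); }

// Everything else about the cache shape follows from the three parameters.
struct CacheGeometry {
    std::uint32_t numLines;    // capacity / line size
    std::uint32_t numSets;     // lines / associativity
    std::uint32_t offsetBits;  // address bits selecting a word within a line
    std::uint32_t indexBits;   // address bits selecting a set
};

constexpr CacheGeometry makeGeometry(std::uint32_t capacityWords,
                                     std::uint32_t ways,
                                     std::uint32_t lineWords) {
    return CacheGeometry{capacityWords / lineWords,
                         capacityWords / lineWords / ways,
                         log2u(lineWords),
                         log2u(capacityWords / lineWords / ways)};
}

// Assumption of this sketch: all three parameters are powers of two.
static_assert(isPowerOfTwo(kCapacityWords) && isPowerOfTwo(kAssociativity) &&
              isPowerOfTwo(kLineWords), "cache parameters must be powers of two");
static_assert(kLineWords * kAssociativity <= kCapacityWords,
              "one set must fit within the total capacity");

constexpr CacheGeometry kGeometry =
    makeGeometry(kCapacityWords, kAssociativity, kLineWords);

struct AddressParts {
    std::uint32_t tag;
    std::uint32_t index;
    std::uint32_t offset;
};

// Split a word address into tag / set index / word offset for a given geometry.
inline AddressParts splitAddress(const CacheGeometry& g, std::uint32_t wordAddr) {
    assert(g.numSets > 0 && isPowerOfTwo(g.numSets));
    AddressParts p;
    p.offset = wordAddr & ((1u << g.offsetBits) - 1u);
    p.index  = (wordAddr >> g.offsetBits) & (g.numSets - 1u);
    p.tag    = wordAddr >> (g.offsetBits + g.indexBits);
    return p;
}
```

With this layout, moving from the baseline to, say, a 64-word, 2-way cache with 4-word lines only means changing the three constants. The lookup code that consumes CacheGeometry and AddressParts would stay the same.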
The Paper
Executive Summary
We examined the impact of altering x, y, and z on system performance, using the benchmark suite given by the standard benchImage image. We found the combination of x', y', and z' was best, improving processor performance by a factor of N.n relative to the baseline cache (32 words, direct-mapped, one-word lines). Overall, x'' seemed to be the most important factor in performance, followed by y'' and z''.
Baseline Performance
Execution of benchImage with the baseline cache yielded the following statistics:
  Total number of cycles    xxxxxxx
  Instructions executed     xxxxxxx
  Read Stall Cycles         xxxxxxx
  Write Stall Cycles        xxxxxxx
  CPI                       xxxxxxx
In more detail, the read and write hit/miss rates were:
            Read                  Write
            Hits      Misses      Hits      Misses
  ICache    xxxxxx%   xxxxxx%     ---       ---
  DCache    xxxxxx%   xxxxxx%     xxxxxx%   xxxxx%
Additionally, xxxxx cycles were spent on flush instructions. If these flushes were not required, CPI would be reduced by yyy%.
Effect of Cache Capacity
To examine capacity, we started with the baseline direct-mapped caches and altered the number of lines. Figure 1 shows CPI as a function of capacity for the ICache, and Figure 2 for the DCache. We find that... Based on this, we decided that a combination of x lines for the ICache and y lines for the DCache was best, given our budget constraints.
Effect of Associativity
Having fixed the capacities of the caches, we then varied associativity between x and y. Figures z and w show... We conclude that...
Effect of Line Size
We varied line size between x and y. Figures z and w show... We conclude that...
Factors Not Examined
The time budget for our study did not allow us to look at factors beyond capacity, associativity, and line size. However, based on our observations of the performance of the cache, we note that xxx (e.g., stalls due to writes) seems to be the largest factor determining performance. For that reason, if further work were to be done, we suggest looking at yyyy and zzzz, which address this issue. For the same reason, it does not appear fruitful to pursue wwww or .... because we do not believe they can significantly improve performance until xxx is addressed.
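A claim about which factor dominates can be backed by splitting CPI into per-source contributions, using the same counters reported in the baseline statistics. The sketch below is one possible way to do that arithmetic. The RunStats struct and function names are hypothetical. It assumes that flush cycles are included in the total cycle count and that removing flushes would leave the instruction count unchanged. If your simulator counts differently, adjust accordingly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the counters listed in the baseline table.
struct RunStats {
    std::uint64_t totalCycles;
    std::uint64_t instructions;
    std::uint64_t readStallCycles;
    std::uint64_t writeStallCycles;
    std::uint64_t flushCycles;  // assumed to be included in totalCycles
};

// Cycles per instruction for the whole run.
inline double cpi(const RunStats& s) {
    assert(s.instructions > 0);
    return static_cast<double>(s.totalCycles) / static_cast<double>(s.instructions);
}

// Portion of the CPI attributable to one source of cycles (e.g., write stalls).
inline double cpiContribution(const RunStats& s, std::uint64_t cycles) {
    assert(s.instructions > 0);
    return static_cast<double>(cycles) / static_cast<double>(s.instructions);
}

// Percent CPI reduction if the flush cycles disappeared.
// Assumption: the instruction count stays fixed, so the CPI reduction equals
// the fraction of total cycles spent on flushes.
inline double flushCpiReductionPercent(const RunStats& s) {
    assert(s.totalCycles > 0 && s.flushCycles <= s.totalCycles);
    return 100.0 * static_cast<double>(s.flushCycles) /
           static_cast<double>(s.totalCycles);
}

// Speedup of a candidate configuration over the baseline on the same benchmark,
// i.e., the "factor of N.n" in the executive summary.
inline double speedup(const RunStats& baseline, const RunStats& candidate) {
    assert(candidate.totalCycles > 0);
    return static_cast<double>(baseline.totalCycles) /
           static_cast<double>(candidate.totalCycles);
}
```

Comparing cpiContribution for read stalls and write stalls against the overall cpi gives a quick check on which source to name as xxx.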