HW6 - Task 6: Writeup

Supplement to the hw6 main assignment.

Introduction

This is a very open-ended assignment: find the best cache design within budget for the benchmark suite we're using. To help bound it a bit, this page outlines a writeup covering a minimal but sufficient set of experiments to satisfy the assignment completely.

You should be able to simulate all the alternatives listed here with a single piece of code, changing only some symbolic constants that define the cache parameters of interest (total capacity, associativity, and line size). If you write your code with that in mind it might save you a bit of time. (Looking at one or more other choices beyond those listed here is gravy, worth good karma, at least. They probably require you to write additional C++ code, however.)
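As an illustration of the "single piece of code" idea, here is a minimal sketch of a cache model driven entirely by the three parameters named above. Everything in it is an assumption made for illustration, not part of the assignment's code: the names CacheConfig and Cache, the use of word addresses, and the LRU replacement policy. It only counts hits and misses and does not model stall cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical parameter bundle; the names are illustrative only.
struct CacheConfig {
    std::uint32_t capacityWords;  // total capacity, in words
    std::uint32_t associativity;  // ways per set (1 = direct-mapped)
    std::uint32_t lineWords;      // words per line
};

class Cache {
public:
    explicit Cache(const CacheConfig& cfg)
        : cfg_(cfg),
          numSets_(setsFor(cfg)),
          ways_(static_cast<std::size_t>(numSets_) * cfg.associativity) {}

    // Simulate one access to a word address (assumed addressing unit).
    // Returns true on a hit; on a miss, fills the LRU (or an empty) way.
    bool access(std::uint32_t wordAddr) {
        ++clock_;
        const std::uint32_t lineAddr = wordAddr / cfg_.lineWords;
        const std::uint32_t set = lineAddr % numSets_;
        const std::uint32_t tag = lineAddr / numSets_;
        Way* base = &ways_[static_cast<std::size_t>(set) * cfg_.associativity];
        Way* victim = base;
        for (std::uint32_t i = 0; i < cfg_.associativity; ++i) {
            Way& w = base[i];
            if (w.valid && w.tag == tag) {
                w.lastUsed = clock_;
                ++hits_;
                return true;
            }
            if (!w.valid) {
                if (victim->valid) victim = &w;  // prefer an empty way
            } else if (victim->valid && w.lastUsed < victim->lastUsed) {
                victim = &w;                     // otherwise least recently used
            }
        }
        victim->valid = true;
        victim->tag = tag;
        victim->lastUsed = clock_;
        ++misses_;
        return false;
    }

    std::uint64_t hits() const { return hits_; }
    std::uint64_t misses() const { return misses_; }

private:
    struct Way {
        bool valid = false;
        std::uint32_t tag = 0;
        std::uint64_t lastUsed = 0;
    };

    // Number of sets implied by the three parameters; rejects bad combinations.
    static std::uint32_t setsFor(const CacheConfig& cfg) {
        assert(cfg.lineWords > 0 && cfg.associativity > 0);
        const std::uint32_t wordsPerSet = cfg.lineWords * cfg.associativity;
        assert(cfg.capacityWords >= wordsPerSet);
        assert(cfg.capacityWords % wordsPerSet == 0);
        return cfg.capacityWords / wordsPerSet;
    }

    CacheConfig cfg_;
    std::uint32_t numSets_;
    std::vector<Way> ways_;
    std::uint64_t clock_ = 0;
    std::uint64_t hits_ = 0;
    std::uint64_t misses_ = 0;
};
```

With this shape, the baseline would be CacheConfig{32, 1, 1}, and each experiment changes only one of the three fields. Using division and modulo rather than bit masks keeps the sketch valid for sizes that are not powers of two, at some cost in speed.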

The remainder of this page is a suggested paper organization. The wording, when given, is an example of the kind of content that might go in the section, and is not intended to be taken verbatim.

The Paper

Executive Summary

We examined the impact of altering x, y, and z on system performance, using the benchmark suite given by the standard benchImage image. We found that the combination of x', y', and z' was best, improving processor performance by a factor of N.n relative to the baseline cache (32 words, direct-mapped, one-word lines). Overall, x'' seemed to be the most important factor in performance, followed by y'' and z''.

Baseline Performance

Execution of benchImage with the baseline cache yielded the following statistics:
Total number of cycles xxxxxxx
Instructions executed xxxxxxx
Read Stall Cycles xxxxxxx
Write Stall Cycles xxxxxxx
CPI xxxxxxx
In more detail, the read and write hit/miss rates were:
          Read                 Write
          Hits      Misses     Hits      Misses
ICache    xxxxxx%   xxxxxx%    ---       ---
DCache    xxxxxx%   xxxxxx%    xxxxxx%   xxxxx%

Additionally, xxxxx cycles were spent on flush instructions. If these flushes were not required, CPI would be reduced by yyy%.
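The CPI figure and the flush estimate follow directly from the raw counters. The sketch below shows one way to compute them. The RunStats struct and the function names are hypothetical. It also makes two assumptions: flush cycles are already included in the total cycle count, and the instruction count stays fixed when the flush cycles are removed. If your simulator also drops the flush instructions from the instruction count, the result will differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the counters reported by a simulation run.
struct RunStats {
    std::uint64_t totalCycles;   // all cycles, assumed to include flush cycles
    std::uint64_t instructions;  // instructions executed
    std::uint64_t flushCycles;   // cycles spent on flush instructions
};

// Cycles per instruction for the run as measured.
double cpi(const RunStats& s) {
    assert(s.instructions > 0);
    return static_cast<double>(s.totalCycles) /
           static_cast<double>(s.instructions);
}

// CPI if the flush cycles were not needed (instruction count held fixed,
// which is a simplifying assumption of this sketch).
double cpiWithoutFlushes(const RunStats& s) {
    assert(s.instructions > 0);
    assert(s.flushCycles <= s.totalCycles);
    return static_cast<double>(s.totalCycles - s.flushCycles) /
           static_cast<double>(s.instructions);
}

// Percentage by which CPI would drop without the flush cycles.
double percentCpiReduction(const RunStats& s) {
    const double before = cpi(s);
    assert(before > 0.0);
    return 100.0 * (before - cpiWithoutFlushes(s)) / before;
}
```

Under these assumptions the percentage reduction equals 100 * flushCycles / totalCycles, which gives a quick sanity check on the reported number.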

Effect of Cache Capacity

To examine capacity, we started with the baseline direct-mapped caches and altered the number of lines. Figure 1 shows CPI as a function of capacity for the ICache, and Figure 2 for the DCache. We find that...

Based on this, we decided that a combination of x lines for the ICache and y lines for the DCache was best, given our budget constraints.

Effect of Associativity

Having fixed the capacities of the caches, we then varied associativity between x and y. Figures z and w show...

We conclude that...

Effect of Line Size

We varied line size between x and y. Figures z and w show...

We conclude that...

Factors Not Examined

The time budget for our study did not allow us to look at factors beyond capacity, associativity, and line size. However, based on our observations of cache performance, we note that xxx (e.g., stalls due to writes) seems to be the largest factor determining performance. For that reason, if further work were to be done, we suggest looking at yyyy and zzzz, which address this issue. For the same reason, it does not appear fruitful to pursue wwww or .... because we do not believe they can significantly improve performance until xxx is addressed.