|
|
7.8 (10 points total)
|
|
7.9 (4 points total)
There are 4K entries each of size 128 (data) + 16 (tag) + 1 (valid) = 145 bits. This is 593920 bits or 74240 bytes.
7.18 (4 points total)
n is the number of index bits.
2n x (63 - n) <= 18 x 32 x 1024 x 8 bits
Take log base 2 of both sides:
log2(2n x (63 - n)) <= 22.1699 (approximately)
n + log2(63 - n) <= 22.1699
n is an integer which means that (63 - n) must be greater than 1. A logarithm
of a number greater than 1 is itself greater than 1, thus
log2(63 - n) can only add to n. Therefore an upper bound on n is
22.
Keep trying values of n from 22 down until the inequality is satisfied. The
largest such value of n is 16.
We can have 14 bits for the tag, 16 bits for the index and 2 bits for byte offset (there is only one word per block, so no block offset is required). The largest cache size is thus 216 x 4 = 256KB.
One possible answer suggested using three units of cache, with each unit having 215 indices. The problem with this is that if you try to access this cache, you need 17 bits for an "index" - two bits select which unit to use but there will be one value out of the four provided by two bits which is not used. The idea behind this question was to make one cache, with no special addressing requirements.
7.20 (13 points total)
|
|
7.21 (15 points total)
|
|
Be aware that the cache maintains a LRU ordering which is different to where the blocks are placed!
7.27 (7 points total)
Cache 1's miss penalty is 6 + 1 = 7 cycles
Average miss cycles per instruction = 4% x 7 + 8% x 7 / 2 = 0.56
Cache 2's miss penalty is 6 + 4 = 10 cycles
Average miss cycles per instruction = 2% x 10 + 5% x 10 / 2 = 0.45
Cache 3's miss penalty is 6 + 4 = 10 cycles
Average miss cycles per instruction = 2% x 10 + 4% x 10 / 2 = 0.40
The machine with cache 1 spends the most cycles on cache misses.
7.28 (8 points total)
Execution time = CPI x cycle time x instruction count
CPI not spent on cache misses = 2 - 0.56 = 1.44
Cache 1 execution time = (1.44 + 0.56) x 2 ns x IC = 4 x IC x
10-9
Cache 2 execution time = (1.44 + 0.45) x 2 ns x IC = 3.78 x IC x
10-9
Cache 3 execution time = (1.44 + 0.40) x 2.4 ns x IC = 4.416 x IC x
10-9
The machine with cache 2 is the fastest and the one with cache 3 is the slowest.
7.29 (7 points total)
Assume that int is 32 bits. In both the direct-mapped and two-way set associative caches there are 256 / 16 = 16 blocks in total. However they are obviously arranged differently.
For the direct-mapped cache, assume without loss of generality that array[0-3] maps to index 0 of the cache. That is, array[j] maps to the cache index given by the remainder when j / 4 is divided by 16. Thus array[0] and array[132] map to different cache indices. After the first two misses which cause these memory locations to be brought into the cache, there will be no further misses i.e. close to 0% miss rate. On the other hand array[0] and array[131] map to the same cache index and successive references will always miss (100% miss rate).
The two-way set associative cache maps array[j] to the cache index given by the remainder when j / 4 is divided by 8. The mappings for array[0], array[131] and array[132] are in fact unchanged but the two-way set associative cache can keep both array[0] and array[131] present at the same time. Thus the miss rate is 0% for both strides.
SMOK problem (10 points)
Note that if you add negative numbers to positive numbers (NOT required for this homework), the normalization step may require shifting left because the result of the addition is smaller. Only adding positive numbers only requires at most one shift to the right.