Last name _Points_________ First name _Max______90_/_90
Section _A[AB]__
CSE378 Autumn 2003
Homework Problem Set #7

7.7 (12 points total)

ReferenceHit or miss
1Miss
4Miss
8Miss
5Miss
20Miss
17Miss
19Miss
56Miss
9Miss
11Miss
4Miss
43Miss
5Hit
6Miss
9Hit
17Hit
Block #Address
0 
117
2 
319
44
55
66
7 
856
99
10 
1143
12 
13 
14 
15 


7.8 (10 points total)

ReferenceHit or miss
1Miss
4Miss
8Miss
5Hit
20Miss
17Miss
19Hit
56Miss
9Miss
11Hit
4Miss
43Miss
5Hit
6Hit
9Miss
17Hit
Block #Address
016
14
28
3 


7.9 (4 points total)

There are 4K entries each of size 128 (data) + 16 (tag) + 1 (valid) = 145 bits. This is 593920 bits or 74240 bytes.


7.18 (4 points total)

n is the number of index bits.
2n x (63 - n) <= 18 x 32 x 1024 x 8 bits
Take log base 2 of both sides:
log2(2n x (63 - n)) <= 22.1699 (approximately)
n + log2(63 - n) <= 22.1699
n is an integer which means that (63 - n) must be greater than 1. A logarithm of a number greater than 1 is itself greater than 1, thus log2(63 - n) can only add to n. Therefore an upper bound on n is 22.
Keep trying values of n from 22 down until the inequality is satisfied. The largest such value of n is 16.

We can have 14 bits for the tag, 16 bits for the index and 2 bits for byte offset (there is only one word per block, so no block offset is required). The largest cache size is thus 216 x 4 = 256KB.

One possible answer suggested using three units of cache, with each unit having 215 indices. The problem with this is that if you try to access this cache, you need 17 bits for an "index" - two bits select which unit to use but there will be one value out of the four provided by two bits which is not used. The idea behind this question was to make one cache, with no special addressing requirements.


7.20 (13 points total)

ReferenceHit or miss
1Miss
4Miss
8Miss
5Miss
20Miss
17Miss
19Miss
56Miss
9Miss
11Miss
4Hit
43Miss
5Hit
6Miss
9Hit
17Hit
Block #Element #1 AddressElement #2 Address
0568
1179
2  
34311
4420
55 
66 
7  


7.21 (15 points total)

ReferenceHit or miss
1Miss
4Miss
8Miss
5Miss
20Miss
17Miss
19Miss
56Miss
9Miss
11Miss
4Hit
43Miss
5Hit
6Miss
9Hit
17Hit
Block #Address
01
14
28
35
420
517
619
756
89
911
1043
116
12 
13 
14 
15 

Be aware that the cache maintains a LRU ordering which is different to where the blocks are placed!


7.27 (7 points total)

Cache 1's miss penalty is 6 + 1 = 7 cycles
Average miss cycles per instruction = 4% x 7 + 8% x 7 / 2 = 0.56

Cache 2's miss penalty is 6 + 4 = 10 cycles
Average miss cycles per instruction = 2% x 10 + 5% x 10 / 2 = 0.45

Cache 3's miss penalty is 6 + 4 = 10 cycles
Average miss cycles per instruction = 2% x 10 + 4% x 10 / 2 = 0.40

The machine with cache 1 spends the most cycles on cache misses.


7.28 (8 points total)

Execution time = CPI x cycle time x instruction count
CPI not spent on cache misses = 2 - 0.56 = 1.44

Cache 1 execution time = (1.44 + 0.56) x 2 ns x IC = 4 x IC x 10-9
Cache 2 execution time = (1.44 + 0.45) x 2 ns x IC = 3.78 x IC x 10-9
Cache 3 execution time = (1.44 + 0.40) x 2.4 ns x IC = 4.416 x IC x 10-9

The machine with cache 2 is the fastest and the one with cache 3 is the slowest.


7.29 (7 points total)

Assume that int is 32 bits. In both the direct-mapped and two-way set associative caches there are 256 / 16 = 16 blocks in total. However they are obviously arranged differently.

For the direct-mapped cache, assume without loss of generality that array[0-3] maps to index 0 of the cache. That is, array[j] maps to the cache index given by the remainder when j / 4 is divided by 16. Thus array[0] and array[132] map to different cache indices. After the first two misses which cause these memory locations to be brought into the cache, there will be no further misses i.e. close to 0% miss rate. On the other hand array[0] and array[131] map to the same cache index and successive references will always miss (100% miss rate).

The two-way set associative cache maps array[j] to the cache index given by the remainder when j / 4 is divided by 8. The mappings for array[0], array[131] and array[132] are in fact unchanged but the two-way set associative cache can keep both array[0] and array[131] present at the same time. Thus the miss rate is 0% for both strides.


SMOK problem (10 points)

Note that if you add negative numbers to positive numbers (NOT required for this homework), the normalization step may require shifting left because the result of the addition is smaller. Only adding positive numbers only requires at most one shift to the right.