Figure: cache hierarchy with a split L1 I-cache and L1 D-cache backed by a unified L2 cache.
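$$\mathrm{CPI}_{\text{real}} = \mathrm{CPI}_{\text{base}} + \mathrm{CPI}_{\text{I-cache miss}} + \mathrm{CPI}_{\text{D-cache miss}}$$
$$\mathrm{CPI}_{\text{I-cache miss}} = \text{miss rate} \times \text{penalty} = 0.01 \times 10 = 0.1$$
$$\mathrm{CPI}_{\text{D-cache miss}} = \text{miss rate} \times \text{penalty} \times \text{load/store frequency} = 0.05 \times 10 \times 0.20 = 0.1$$
$$\mathrm{CPI}_{\text{real}} = 1 + 0.1 + 0.1 = 1.2$$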
Notice that although each miss costs 10 cycles, the caches hide most of that latency: the CPI rises only from 1.0 to 1.2.
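The CPI calculation above can be sketched as a small function. The name `cpi_real` and its parameter names are hypothetical, chosen for illustration; like the example, it assumes the same miss penalty for both L1 caches and charges D-cache misses only to the fraction of instructions that are loads or stores.

```python
def cpi_real(cpi_base, icache_miss_rate, dcache_miss_rate, miss_penalty, load_store_freq):
    """Effective CPI with split L1 caches (hypothetical helper).

    Every instruction is fetched, so every instruction can miss in the I-cache;
    only loads/stores access the D-cache, so D-cache stalls are scaled by
    their frequency. Assumes one shared miss penalty, as in the example.
    """
    icache_stall = icache_miss_rate * miss_penalty
    dcache_stall = dcache_miss_rate * miss_penalty * load_store_freq
    return cpi_base + icache_stall + dcache_stall


cpi = cpi_real(cpi_base=1.0, icache_miss_rate=0.01, dcache_miss_rate=0.05,
               miss_penalty=10, load_store_freq=0.20)
print(f"{cpi:.2f}")  # 1.20
```

Re-running the sketch with `miss_penalty=100` gives a CPI of 3.0, which shows how strongly the conclusion depends on the miss penalty being small.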
Suppose we have 64-byte blocks. We then need six bits (2^6 = 64) to select a byte within a block. This field is the displacement. We can further divide it into a block offset (four bits, because each block holds 16 words) and a byte offset (two bits, because each word is four bytes).
If our cache holds 64 KB of data, it contains 64 KB / 64 B = 1024 blocks in total, so 10 bits are needed to select a block in the cache.
With 32-bit addresses, the remaining 32 − 10 − 6 = 16 bits of the address form the tag.
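The field breakdown above can be sketched with shifts and masks. This is a minimal illustration assuming 32-bit byte addresses and the direct-mapped 64 KB cache with 64-byte blocks described above; the constants and the helper names `log2_exact` and `split_address` are hypothetical.

```python
ADDRESS_BITS = 32        # assumption: 32-bit byte addresses
WORD_BYTES = 4           # four bytes per word
BLOCK_BYTES = 64         # 16 words per block
CACHE_BYTES = 64 * 1024  # 64 KB of data, direct-mapped


def log2_exact(n):
    """Return log2(n) for a power of two, otherwise raise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


BYTE_OFFSET_BITS = log2_exact(WORD_BYTES)                  # 2
BLOCK_OFFSET_BITS = log2_exact(BLOCK_BYTES // WORD_BYTES)  # 4
INDEX_BITS = log2_exact(CACHE_BYTES // BLOCK_BYTES)        # 10 (1024 blocks)
TAG_BITS = ADDRESS_BITS - INDEX_BITS - BLOCK_OFFSET_BITS - BYTE_OFFSET_BITS  # 16


def split_address(addr):
    """Split an address into (tag, index, block offset, byte offset), low bits first."""
    byte_offset = addr & ((1 << BYTE_OFFSET_BITS) - 1)
    addr >>= BYTE_OFFSET_BITS
    block_offset = addr & ((1 << BLOCK_OFFSET_BITS) - 1)
    addr >>= BLOCK_OFFSET_BITS
    index = addr & ((1 << INDEX_BITS) - 1)
    tag = addr >> INDEX_BITS
    return tag, index, block_offset, byte_offset


tag, index, block_offset, byte_offset = split_address(0x12345678)
print(hex(tag), index, block_offset, byte_offset)  # 0x1234 345 14 0
```

Because the offset and index fields together take the low 16 bits, the tag of a 32-bit address is simply its upper 16 bits.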
The cache capacity counts only the actual data, not the control information (valid bits, tag bits and dirty bits) that is stored with each cache block. You can think of this as asking what the maximum occupancy of a hotel is: you would not count the elevators or hallways as places for guests to stay.
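To see how much extra storage the control information adds, here is a rough sketch for the direct-mapped 64 KB cache above. It assumes one valid bit and one dirty bit per block (a write-back cache) plus the 16-bit tag, and ignores any replacement-policy bits; `cache_storage_bits` is a hypothetical helper.

```python
def cache_storage_bits(num_blocks, block_bytes, tag_bits, valid_bits=1, dirty_bits=1):
    """Return (data bits, control bits) for a cache (hypothetical helper).

    Assumes each block stores its data plus one tag, valid_bits and dirty_bits;
    replacement-policy state is not counted.
    """
    data_bits = num_blocks * block_bytes * 8
    overhead_bits = num_blocks * (tag_bits + valid_bits + dirty_bits)
    return data_bits, overhead_bits


data_bits, overhead_bits = cache_storage_bits(num_blocks=1024, block_bytes=64, tag_bits=16)
print(data_bits // 8, overhead_bits // 8)    # 65536 2304
print(f"{overhead_bits / data_bits:.1%}")    # 3.5%
```

Under these assumptions, the 64 KB capacity describes only the 65,536 data bytes, while roughly 2.3 KB more is spent on tags and status bits.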
Keeping the rest of the cache parameters the same, we now make the cache 4-way set associative. There are still 1024 blocks but only 256 sets since each set has four blocks. Thus we only need eight bits to select a set.
Note that the tag is now longer: 32 − 8 − 6 = 18 bits, two more than in the direct-mapped case.
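Taken to the extreme, a fully associative cache has a single set holding all 1024 blocks. There are no index bits, just a wide tag of 32 − 6 = 26 bits.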
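The trade-off between index and tag bits can be tabulated with a short sketch. It assumes 32-bit byte addresses and the same 64 KB cache with 64-byte blocks; the helper names `log2_exact` and `field_widths` are hypothetical. Treating the fully associative case as 1024-way (one set) gives tag widths of 16, 18 and 26 bits for the three organizations discussed above.

```python
def log2_exact(n):
    """Return log2(n) for a power of two, otherwise raise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def field_widths(cache_bytes, block_bytes, ways, address_bits=32):
    """Return (sets, index bits, tag bits, offset bits) for a set-associative cache.

    Assumes byte addressing; ways == number of blocks means fully associative.
    """
    num_sets = cache_bytes // block_bytes // ways
    offset_bits = log2_exact(block_bytes)   # displacement within a block
    index_bits = log2_exact(num_sets)       # 0 when there is only one set
    tag_bits = address_bits - index_bits - offset_bits
    return num_sets, index_bits, tag_bits, offset_bits


for label, ways in [("direct-mapped", 1), ("4-way", 4), ("fully associative", 1024)]:
    sets, index_bits, tag_bits, offset_bits = field_widths(64 * 1024, 64, ways)
    print(f"{label:18} sets={sets:5} index={index_bits:2} tag={tag_bits:2} offset={offset_bits}")
```

Each doubling of associativity halves the number of sets, moving one bit from the index into the tag; the offset never changes because the block size is fixed.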