Concerning Caches

Caches in the MIPS Pipeline

Memory is accessed through the caches in two places in the MIPS pipeline, at the instruction fetch and the memory stages.

Instruction Fetch (IF) ___ Instruction Decode (ID) ___ Execute (EX) ___ Memory (MEM) ___ Write Back (WB)
          |                                                                  |
     L1 I-cache                                                         L1 D-cache
          |                                                                  |
          +--------------------------------+---------------------------------+
                                           |
                                   L2 Unified Cache

Cache CPI Contributions

Ignore the L2 cache for this problem. Suppose our D-cache miss rate is 0.05 and our I-cache miss rate is 0.01. The cache miss penalty is 10 cycles. 20% of our instructions are loads or stores. Assume the CPI_base of the pipelined machine is 1 (this keeps the math easy, though it is not realistic).

CPI_real = CPI_base + CPI_{I-cache miss} + CPI_{D-cache miss}

CPI_{I-cache miss} = miss rate x penalty = 0.01 x 10 = 0.1

CPI_{D-cache miss} = miss rate x penalty x load/store frequency = 0.05 x 10 x 0.20 = 0.1

CPI_real = 1 + 0.1 + 0.1 = 1.2
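The same arithmetic can be written as a short Python sketch. The function name and parameter names are made up for illustration; the numbers are the ones given in the problem.

```python
def cpi_with_caches(cpi_base, i_miss_rate, d_miss_rate, miss_penalty, mem_op_fraction):
    """Return the CPI once I-cache and D-cache miss stalls are added in."""
    # Every instruction is fetched, so the I-cache term is not scaled.
    cpi_icache = i_miss_rate * miss_penalty
    # Only loads and stores touch the D-cache, so scale by their frequency.
    cpi_dcache = d_miss_rate * miss_penalty * mem_op_fraction
    return cpi_base + cpi_icache + cpi_dcache


# Values from the problem statement above.
cpi_real = cpi_with_caches(
    cpi_base=1.0,
    i_miss_rate=0.01,
    d_miss_rate=0.05,
    miss_penalty=10,
    mem_op_fraction=0.20,
)
print(round(cpi_real, 2))  # 1.2
```

Changing one argument at a time (for example, raising miss_penalty) shows how sensitive the CPI is to each parameter.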

Notice that although the miss penalty is high (10 cycles), the caches do a good job of hiding latency: the CPI rises only from 1 to 1.2.

Addressing Caches

Direct-Mapped

31		16	15		6	5		0
tag			index			displacement

Suppose we have 64-byte blocks. We need six bits to address a byte within a cache block; this is the displacement. We can further divide it into a block offset (four bits, because our blocks are 16 words wide) and a byte offset (two bits, because our words are four bytes wide).

If our cache size is 64 kbytes, then there are 1024 blocks in total (64 kbytes / 64 bytes per block). That means 10 bits are needed to select a block in the cache.

The remaining 16 bits of the address form the tag.
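As a rough illustration of the field layout above, the following Python sketch splits a 32-bit address into tag, index, block offset and byte offset for the direct-mapped cache. The function name, the constants and the sample address are assumptions chosen for the example.

```python
OFFSET_BITS = 6    # 64-byte blocks -> 6 displacement bits
INDEX_BITS = 10    # 1024 blocks -> 10 index bits


def split_address(addr):
    """Split a 32-bit address into (tag, index, block_offset, byte_offset)."""
    displacement = addr & ((1 << OFFSET_BITS) - 1)          # bits 5..0
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)  # bits 15..6
    tag = addr >> (OFFSET_BITS + INDEX_BITS)                 # bits 31..16
    block_offset = displacement >> 2   # which of the 16 words in the block
    byte_offset = displacement & 0x3   # which of the 4 bytes in the word
    return tag, index, block_offset, byte_offset


tag, index, block_offset, byte_offset = split_address(0x12345678)
print(hex(tag), hex(index), block_offset, byte_offset)  # 0x1234 0x159 14 0
```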

The cache capacity counts only the actual data, not the control information, such as valid bits, tag bits and dirty bits, that is stored with each cache block. You can think of this as asking what the maximum occupancy of a hotel is: you would not count the elevators or hallways as places for guests to stay.
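To get a feel for how much control information sits alongside the data, here is a small Python sketch for the direct-mapped cache above. It assumes one valid bit and one dirty bit per block, which the text does not specify; the variable names are likewise made up for illustration.

```python
CACHE_BYTES = 64 * 1024   # advertised capacity: data only
BLOCK_BYTES = 64
TAG_BITS = 16             # direct-mapped case from the text
VALID_BITS = 1            # assumption: one valid bit per block
DIRTY_BITS = 1            # assumption: one dirty bit per block

num_blocks = CACHE_BYTES // BLOCK_BYTES
data_bits = num_blocks * BLOCK_BYTES * 8
control_bits = num_blocks * (TAG_BITS + VALID_BITS + DIRTY_BITS)

print(num_blocks)          # 1024
print(data_bits // 8)      # 65536 bytes of data (the "capacity")
print(control_bits // 8)   # 2304 bytes of tags, valid bits and dirty bits
```

Under these assumptions the control information adds about 3.5% on top of the 64 kbytes of data.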

Set-Associative

31		14	13		6	5		0
tag			set index			displacement

Keeping the rest of the cache parameters the same, we now make the cache 4-way set associative. There are still 1024 blocks but only 256 sets since each set has four blocks. Thus we only need eight bits to select a set.

Note that the tag is now longer: 18 bits instead of 16.
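The following Python sketch shows how a lookup in the 4-way cache might proceed: the set index picks one set, and the tag is then compared against every block in that set. It tracks tags only (no data), and the FIFO replacement policy and all names are assumptions made for the example, not something the text prescribes.

```python
OFFSET_BITS = 6   # 64-byte blocks
SET_BITS = 8      # 256 sets
WAYS = 4          # 4 blocks per set

# One list of resident tags per set; an empty list means all ways are invalid.
sets = [[] for _ in range(1 << SET_BITS)]


def access(addr):
    """Return True on a hit, False on a miss (and bring the block in)."""
    set_index = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)  # bits 13..6
    tag = addr >> (OFFSET_BITS + SET_BITS)                      # bits 31..14
    ways = sets[set_index]
    if tag in ways:          # compare against every block in the set
        return True
    if len(ways) == WAYS:    # set is full: evict the oldest block
        ways.pop(0)          # FIFO replacement, assumed for simplicity
    ways.append(tag)
    return False


print(access(0x12345678))  # False: first touch of this block is a miss
print(access(0x1234567C))  # True: same block, different word
```

Up to four blocks that share a set index can now be resident at once, whereas in the direct-mapped cache they would evict one another.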

Fully Associative

31		6	5		0
tag			displacement

There are no index bits, just a wide 26-bit tag.
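The three layouts above follow from the same calculation with a different associativity. A minimal Python sketch, assuming 32-bit addresses and power-of-two sizes (the function name and parameters are hypothetical):

```python
def field_widths(cache_bytes, block_bytes, ways, addr_bits=32):
    """Return (tag_bits, index_bits, displacement_bits).

    Assumes cache_bytes, block_bytes and ways are all powers of two.
    """
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways
    displacement_bits = block_bytes.bit_length() - 1  # log2(block size)
    index_bits = num_sets.bit_length() - 1            # log2(number of sets)
    tag_bits = addr_bits - index_bits - displacement_bits
    return tag_bits, index_bits, displacement_bits


CACHE_BYTES = 64 * 1024
BLOCK_BYTES = 64

print(field_widths(CACHE_BYTES, BLOCK_BYTES, ways=1))     # (16, 10, 6) direct-mapped
print(field_widths(CACHE_BYTES, BLOCK_BYTES, ways=4))     # (18, 8, 6)  4-way
print(field_widths(CACHE_BYTES, BLOCK_BYTES, ways=1024))  # (26, 0, 6)  fully associative
```

Each doubling of the associativity halves the number of sets, so one bit moves from the index to the tag.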

CSE 378 Spring 2002 - Section 9
