

## Goal of Memory Hierarchy

- Keep close to the CPU only information that will be needed now and in the near future. Can't keep everything close because of cost.
- Technology:

| Typical size  | Access time       | Access time relative to<br>register access | Cost       |
|---------------|-------------------|--------------------------------------------|------------|
| 16KB on-chip  | nanoseconds       | 1-2                                        | ??         |
| 1MB off-chip  | 10s of ns         | 5-10                                       | \$100/MB   |
| 256+MB        | 10s to 100s of ns | 10-100                                     | \$0.50/MB  |
| 5-50GB        | 10s of ms         | 1,000,000                                  | \$0.01/MB  |

WINTER, 2001



























## Types of Cache Misses

- Compulsory (or cold) misses: there will be a miss the first time you touch a block of main memory.
- Capacity misses: the cache is not big enough to hold all the blocks you've referenced.
- Conflict misses: two or more blocks map to the same location (or set) and there is not enough room to hold them in that set at the same time.
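The three categories can be told apart mechanically. The sketch below is one common way to do it, not necessarily the method used in this course: it runs a direct-mapped cache next to a fully associative LRU cache of the same size. A miss on a block never seen before is counted as compulsory, a miss that the fully associative cache would also suffer is counted as capacity, and the rest are counted as conflict. The function name `classify_misses` and the use of block numbers rather than byte addresses are assumptions made for illustration.

```python
from collections import OrderedDict


def classify_misses(block_addresses, num_blocks):
    """Classify accesses to a direct-mapped cache with `num_blocks` lines.

    Assumption: addresses are already block numbers, and a fully associative
    LRU cache of the same size is the reference for "capacity" misses.
    """
    seen = set()                   # blocks touched at least once
    direct = [None] * num_blocks   # direct-mapped cache: one block per line
    full = OrderedDict()           # fully associative LRU cache, oldest first
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for block in block_addresses:
        # Update the fully associative reference cache.
        full_hit = block in full
        if full_hit:
            full.move_to_end(block)        # now the most recently used
        else:
            if len(full) == num_blocks:
                full.popitem(last=False)   # evict the least recently used
            full[block] = True

        # Look up the direct-mapped cache and classify the outcome.
        index = block % num_blocks
        if direct[index] == block:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1      # first touch of this block
        elif full_hit:
            counts["conflict"] += 1        # only the mapping caused the miss
        else:
            counts["capacity"] += 1        # too many blocks for this size
        direct[index] = block
        seen.add(block)
    return counts
```

For example, with 4 lines the trace `[0, 4, 0, 4]` ping-pongs between two blocks that map to line 0, so after the two cold misses every access is a conflict miss.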




## Replacement Policy

- On a read miss, we bring data into the cache.
- What if there is no room?
- We need to replace an item in the cache.
- For a direct-mapped cache, we have no choice: replace the item that maps to the same place as the one being brought in.
- For set-associative caches, we have several possible policies. However, the block replaced must belong to the same set as the block being brought in:
  - Random
  - FIFO (replace the oldest one)
  - LRU (replace the least recently used)
- For caches, replacement policy has little influence on performance.
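The policies listed above can be sketched for a single set as follows. This is a minimal model under stated assumptions, not a description of any real cache: the class name `CacheSet`, the `access` method, and the use of an `OrderedDict` to track ordering are all choices made for this example. Only block tags are tracked, not data.

```python
import random
from collections import OrderedDict


class CacheSet:
    """One set of a set-associative cache holding up to `ways` block tags."""

    def __init__(self, ways, policy="lru", rng=None):
        if policy not in ("random", "fifo", "lru"):
            raise ValueError(f"unknown policy: {policy}")
        self.ways = ways
        self.policy = policy
        self.rng = rng or random.Random(0)   # fixed seed for repeatable runs
        self.blocks = OrderedDict()          # tag -> None, oldest first

    def access(self, tag):
        """Return (hit, evicted_tag); evicted_tag is None if nothing was replaced."""
        if tag in self.blocks:
            if self.policy == "lru":
                self.blocks.move_to_end(tag)  # mark as most recently used
            return True, None

        evicted = None
        if len(self.blocks) == self.ways:     # no room: pick a victim in this set
            if self.policy == "random":
                evicted = self.rng.choice(list(self.blocks))
                del self.blocks[evicted]
            else:
                # FIFO: front is the oldest insertion.
                # LRU: front is the least recently used, because hits reorder.
                evicted, _ = self.blocks.popitem(last=False)
        self.blocks[tag] = None
        return False, evicted
```

With `ways=2` and the tag sequence A, B, A, C, LRU evicts B (A was just reused) while FIFO evicts A (it was brought in first). Setting `ways=1` models the direct-mapped case, where the single resident block is always the one replaced.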





**CSE378** 




## What to do on a Write Miss?

- Again, we have choices:
  - Write-around: write only in memory (aka no-fetch)
  - Write-allocate: bring data into the cache and then write it
- Note that these policies are independent of write-through or write-back.
- On write-allocate write-back, we need to write back the replaced block if it is dirty.
- On write-around write-back, we still need to write back dirty blocks on read misses.
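The interaction between the write-miss policy and write-back can be sketched with a small model. This is an illustration under simplifying assumptions: the cache is direct-mapped, only block numbers are tracked, and the names `WriteBackCache`, `log`, and the event labels are hypothetical. The `log` list records memory traffic so that the two policies can be compared.

```python
class WriteBackCache:
    """Direct-mapped, write-back cache model that tracks block numbers only."""

    def __init__(self, num_lines, write_allocate=True):
        self.num_lines = num_lines
        self.write_allocate = write_allocate
        self.lines = [None] * num_lines    # block number held by each line
        self.dirty = [False] * num_lines
        self.log = []                      # memory traffic, in order

    def _fill(self, block):
        """Bring `block` into the cache, writing back a dirty victim first."""
        i = block % self.num_lines
        if self.lines[i] is not None and self.dirty[i]:
            self.log.append(("write_back", self.lines[i]))
        self.log.append(("fetch", block))
        self.lines[i] = block
        self.dirty[i] = False

    def read(self, block):
        i = block % self.num_lines
        if self.lines[i] != block:
            self._fill(block)              # read miss: always bring the block in

    def write(self, block):
        i = block % self.num_lines
        if self.lines[i] == block:
            self.dirty[i] = True           # write hit: update the cache only
        elif self.write_allocate:
            self._fill(block)              # write-allocate: fetch, then write
            self.dirty[i] = True
        else:
            self.log.append(("write_memory", block))  # write-around: bypass cache
```

With 4 lines and write-allocate, writing block 0 and then block 4 logs a fetch of 0, a write-back of 0, and a fetch of 4. With write-around, a write miss goes straight to memory, but a dirty block created by an earlier write hit is still written back when a later read miss replaces it.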







## Current Caches

| Micro       | On-chip<br>size in KB<br>(I/D) | Line<br>size<br>(bytes) | Assoc.<br>(I/D) | Write<br>policy | Clock<br>(MHz) |
|-------------|--------------------------------|-------------------------|-----------------|-----------------|----------------|
| Alpha 21164 | 8/8                            | 32                      | 1/1             | WT              | 300            |
| Alpha 21264 | 64/64                          | 32                      | 2/2             | ?               | 700            |
| PowerPC G4  | 32/32                          | 64                      | 8/8             | WB              | 500            |
| MIPS R4400  | 16/16                          | 32                      | 1/1             | WB              | 150            |
| MIPS R10000 | 32/32                          | 64                      | 2/2             | WB?             | 200            |
| AMD Athlon  | 64/64                          | 32                      | 2/2             | WB              | 1000+          |

Don't quote me on these numbers...

- The Alpha 21164 also has a 96KB L2 cache on chip.
