

#### Administrative

- Midterm regrade requests due today
- Lab 3 due today!
- HW 4 out, due Friday, February 23
- No lecture on Monday (Presidents' Day)!
  - OH also cancelled

#### Making memory accesses fast!

- Cache basics
- Principle of locality
- Memory hierarchies
- Cache organization
  - Direct-mapped (sets; index + tag)
  - Associativity (ways)
  - Replacement policy
  - Handling writes
- Program optimizations that consider caches

- No-write-allocate ("write around"): write directly to memory without loading the block into the cache
- Typical caches:
  - Write-back + Write-allocate, usually
  - Write-through + No-write-allocate, occasionally
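To see why the write-back pairing is the usual choice, here is a minimal sketch (not from the lecture) that counts how many writes reach memory for a sequence of stores under each typical pairing. The function name `count_memory_writes`, the policy labels, and the one-block cache model are all assumptions made for illustration; the write-back count includes a final flush of the last dirty block.

```python
def count_memory_writes(store_addrs, block_size, policy):
    """Count writes that reach memory for a toy cache holding ONE block.

    policy is "wb+wa" (write-back + write-allocate) or
    "wt+nwa" (write-through + no-write-allocate).
    Hypothetical model for illustration only.
    """
    if policy == "wt+nwa":
        # Every store goes straight to memory; store misses do not fill the cache.
        return len(store_addrs)
    if policy != "wb+wa":
        raise ValueError(f"unknown policy: {policy}")

    cached_block = None   # block number currently held, if any
    dirty = False
    writes = 0
    for addr in store_addrs:
        block = addr // block_size
        if block != cached_block:
            if dirty:
                writes += 1       # evicting a dirty block writes it back
            cached_block = block  # write-allocate: bring the block in
        dirty = True              # the store itself only updates the cache
    if dirty:
        writes += 1               # final flush of the last dirty block
    return writes


# Four stores to one 16 B block, then one store to a different block.
stores = [0x100, 0x104, 0x108, 0x10C, 0x200]
print(count_memory_writes(stores, 16, "wt+nwa"))  # 5
print(count_memory_writes(stores, 16, "wb+wa"))   # 2
```

Repeated stores to the same block are absorbed by the write-back cache, which is where the savings come from.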

# **Optimizations for the Memory Hierarchy**

- Write code that has locality!
  - Spatial: access data contiguously
  - Temporal: make sure accesses to the same data are not too far apart in time
- How can you achieve locality?
  - Adjust memory accesses in *code* (software) to improve miss rate (MR)
    - Requires knowledge of *both* how caches work and your system's parameters
  - Proper choice of algorithm
  - Loop transformations
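As a rough illustration of a loop transformation (loop interchange), the sketch below simulates the addresses touched when summing a row-major 2D array in row order versus column order, and feeds both traces to a cold direct-mapped cache. The cache size, block size, array dimensions, and the helper names `miss_rate` and `traversal` are assumptions chosen for the example, not values from the lecture.

```python
def miss_rate(addresses, cache_size, block_size):
    """Miss rate of a cold direct-mapped cache over an address trace."""
    num_sets = cache_size // block_size
    tags = {}                      # set index -> tag currently stored there
    misses = 0
    total = 0
    for addr in addresses:
        block = addr // block_size
        index, tag = block % num_sets, block // num_sets
        if tags.get(index) != tag:
            misses += 1            # cold or conflict miss: replace the line
            tags[index] = tag
        total += 1
    return misses / total


def traversal(n, elem_size, row_major):
    """Byte addresses touched when visiting every element of an n x n
    array stored in row-major order (as in C), starting at address 0."""
    for outer in range(n):
        for inner in range(n):
            i, j = (outer, inner) if row_major else (inner, outer)
            yield (i * n + j) * elem_size


# Assumed parameters: 64 x 64 array of 4-byte ints, 1 KB cache, 16 B blocks.
N, ELEM, CACHE, BLOCK = 64, 4, 1024, 16
print(miss_rate(traversal(N, ELEM, True), CACHE, BLOCK))   # 0.25
print(miss_rate(traversal(N, ELEM, False), CACHE, BLOCK))  # 1.0
```

With these parameters the row-order loop misses once per block (one in four accesses), while the column-order loop strides a full row between accesses and misses every time.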

#### Anatomy of a Cache Question

- Cache questions come in a few flavors:
  - 1) TIO (Tag/Index/Offset) Address Breakdown
  - 2) For fixed cache parameters, analyze the performance of the given code/sequence
  - 3) For given code/sequence, how does changing your cache parameters affect performance?
  - 4) Average Memory Access Time (AMAT)

#### **Example Cache Parameters Problem**

1 MB address space, 125 cycles to go to memory. Fill in the following table:

| Parameter          | Value                            |
|--------------------|----------------------------------|
| Cache Size         | 4 KB                             |
| Block Size         | 16 B                             |
| Associativity      | 4-way                            |
| Hit Time           | 3 cycles                         |
| Miss Rate          | 20%                              |
| Write Policy       | Write-through                    |
| Replacement Policy | LRU                              |
| Tag Bits           | 10                               |
| Index Bits         | 6                                |
| Offset Bits        | 4                                |
| AMAT               | 3 + 0.2 × 125 = 28 cycles        |
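The derived rows of the table can be checked with a short sketch. The function names `tio_bits` and `amat` are hypothetical, and the code assumes the cache size, block size, and associativity are all powers of two. A 1 MB address space corresponds to 20 address bits.

```python
def tio_bits(addr_bits, cache_size, block_size, ways):
    """Return (tag, index, offset) field widths in bits."""
    num_sets = cache_size // (block_size * ways)
    offset = block_size.bit_length() - 1   # log2(block size)
    index = num_sets.bit_length() - 1      # log2(number of sets)
    tag = addr_bits - index - offset       # whatever bits remain
    return tag, index, offset


def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same units as hit_time."""
    return hit_time + miss_rate * miss_penalty


# 1 MB address space -> 20 address bits; 4 KB cache, 16 B blocks, 4-way.
print(tio_bits(20, 4 * 1024, 16, 4))  # (10, 6, 4)
print(amat(3, 0.2, 125))              # 28.0
```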

#### **Peer Instruction Question**

- We have a cache of size 2 KB with block size of 128 B.
   If our cache has 2 sets, what is its associativity?
  - A. 2
  - B. 4
  - C. 8
  - D. 16
  - E. We're lost...
- If addresses are 16 bits wide, how wide is the Tag field?
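One way to check your answers to both parts: associativity follows from E = C / (B × S), and the tag is whatever remains of the address after the offset and index bits. The sketch below computes the associativity and splits a sample address into its fields. The helper names and the sample address `0xBEEF` are assumptions for illustration only.

```python
def associativity(cache_size, block_size, num_sets):
    """Lines per set: E = C / (B * S)."""
    return cache_size // (block_size * num_sets)


def split_address(addr, addr_bits, block_size, num_sets):
    """Return (tag_bits, tag, index, offset) for one address.

    Assumes block_size and num_sets are powers of two.
    """
    offset_bits = block_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    offset = addr & (block_size - 1)                 # low-order bits
    index = (addr >> offset_bits) & (num_sets - 1)   # middle bits
    tag = addr >> (offset_bits + index_bits)         # high-order bits
    return tag_bits, tag, index, offset


print(associativity(2 * 1024, 128, 2))      # 8
print(split_address(0xBEEF, 16, 128, 2))    # (8, 190, 1, 111)
```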

#### **Peer Instruction Question**

- Which of the following cache statements is FALSE?
  - A. We can reduce compulsory misses by decreasing our block size
  - B. We can reduce conflict misses by increasing associativity
  - C. A write-back cache will save time for code with good temporal locality on writes
  - D. A write-through cache will always match data with the memory hierarchy level below it
  - E. We're lost...

#### **Example Code Analysis Problem**

Assuming the cache starts *cold* (all blocks invalid), calculate the miss rate for the following loop:

```
m = 20 bits, C = 4 KB, B = 16 B, E = 4
```
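The loop from the slide is not reproduced in these notes, so the sketch below uses a hypothetical stand-in: a sequential sum over an array of 4-byte ints, placed at an assumed base address `0x80000`. It simulates a cold set-associative LRU cache with the parameters above. The function name `simulate` and the array sizes are assumptions for illustration.

```python
from collections import OrderedDict


def simulate(addresses, addr_bits=20, cache_size=4096, block_size=16, ways=4):
    """Miss rate of a cold set-associative LRU cache over an address trace."""
    num_sets = cache_size // (block_size * ways)
    sets = [OrderedDict() for _ in range(num_sets)]  # per set: tags in LRU order
    misses = total = 0
    for addr in addresses:
        assert 0 <= addr < 2 ** addr_bits, "address outside the address space"
        block = addr // block_size
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)          # hit: mark most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the least recently used
            lines[tag] = True
        total += 1
    return misses / total


BASE = 0x80000  # assumed array start address (hypothetical)

# Stand-in loop: for (i = 0; i < n; i++) sum += ar[i];  with 4-byte ints
one_pass_8kb = [BASE + 4 * i for i in range(2048)]
two_pass_4kb = [BASE + 4 * i for i in range(1024)] * 2

print(simulate(one_pass_8kb))       # 0.25
print(simulate(one_pass_8kb * 2))   # 0.25
print(simulate(two_pass_4kb))       # 0.125
```

A single sequential pass misses once per 16 B block (one in four int accesses). Repeating the pass only helps when the array fits in the 4 KB cache; the 8 KB array is evicted by LRU before it is reused.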

# Suggested Problems

- CS:APP, 3rd edition: Practice Problems 6.12-6.15
- AU16 Final Question F5

# Learning About Your Machine

- Linux:
  - lscpu
  - ls /sys/devices/system/cpu/cpu0/cache/index0/
    - Ex: cat /sys/devices/system/cpu/cpu0/cache/index\*/size
- Windows:
  - wmic memcache get \<query\> (all values in KB)
  - Ex: wmic memcache get MaxCacheSize
- Modern processor specs: http://www.7-cpu.com/
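On Linux, the same sysfs files can be read from a script. The sketch below is a minimal example that assumes the standard sysfs cache attribute files (`level`, `type`, `size`, `coherency_line_size`, `ways_of_associativity`, `number_of_sets`) are present. The function name `read_cache_info` is hypothetical. It returns an empty list on systems without that directory, and `None` for any attribute file that is missing.

```python
from pathlib import Path


def read_cache_info(cpu=0):
    """Collect cache parameters for one CPU from Linux sysfs."""
    base = Path(f"/sys/devices/system/cpu/cpu{cpu}/cache")
    if not base.is_dir():
        return []   # not Linux, or sysfs cache info is unavailable
    fields = ("level", "type", "size", "coherency_line_size",
              "ways_of_associativity", "number_of_sets")
    caches = []
    for index_dir in sorted(base.glob("index*")):
        info = {}
        for name in fields:
            try:
                info[name] = (index_dir / name).read_text().strip()
            except OSError:
                info[name] = None   # attribute missing or unreadable
        caches.append(info)
    return caches


for cache in read_cache_info():
    print(cache)
```

The `coherency_line_size` value is the block size in bytes, so together with `size` and `ways_of_associativity` it gives the cache parameters used throughout these notes.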