### **Caches III**

CSE 351 Spring 2019

#### **Instructor:**

**Ruth Anderson** 

#### **Teaching Assistants:**

Gavin Cai
Jack Eggleston
John Feltrup
Britt Henderson
Richard Jiang
Jack Skalitzky
Sophie Tian
Connie Wang
Sam Wolfson
Casey Xing
Chin Yeoh



https://what-if.xkcd.com/111/

### **Administrivia**

- Lab 3, due Wednesday (5/15)
- Homework 4, due Wed (5/22) (Structs, Caches)

# Making memory accesses fast!

- Cache basics
- Principle of locality
- Memory hierarchies
- Cache organization
  - Direct-mapped (sets; index + tag)
  - Associativity (ways)
    - Replacement policy
    - Handling writes
- Program optimizations that consider caches

## **Direct-Mapped Cache**



## **Direct-Mapped Cache Problem**



# **Associativity**

- What if we could store data in any place in the cache?
  - More complicated hardware = more power consumed, slower
- So we combine the two ideas:
  - Each address maps to exactly one set
  - Each set can store a block in more than one way



# **Cache Organization (3)**

**Note:** The textbook uses "b" for offset bits

- Associativity (E): # of ways for each set
  - Such a cache is called an "E-way set associative cache"
  - We now index into cache sets, of which there are S = C/K/E
  - Use lowest  $\log_2(C/K/E) = s$  bits of block address
    - <u>Direct-mapped</u>: E = 1, so  $s = \log_2(C/K)$  as we saw previously
    - Fully associative: E = C/K, so s = 0 bits
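The relationships above can be checked with a short calculation. This is a minimal sketch, assuming all sizes are powers of two; the function name `cache_geometry` and the sample sizes (128 B cache, 16 B blocks, 16-bit addresses) are illustrative choices, not part of the slides.

```python
def cache_geometry(C, K, E, m):
    """Return (S, s, k, t) for a cache of C bytes, K-byte blocks,
    E ways per set, and m-bit addresses (all sizes powers of two)."""
    S = C // (K * E)          # number of sets: S = C/K/E
    s = S.bit_length() - 1    # index bits: log2(S) for a power of two
    k = K.bit_length() - 1    # offset bits: log2(K)
    t = m - s - k             # tag bits: whatever is left of the address
    return S, s, k, t

# Assumed example sizes: C = 128 B, K = 16 B, m = 16 bits
print(cache_geometry(128, 16, 1, 16))  # direct-mapped:     (8, 3, 4, 9)
print(cache_geometry(128, 16, 2, 16))  # 2-way:             (4, 2, 4, 10)
print(cache_geometry(128, 16, 8, 16))  # fully associative: (1, 0, 4, 12)
```

Note that raising E shrinks s and grows t by the same amount, since the offset width depends only on K.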



# **Example Placement**

Block size: 16 B. Capacity: 8 blocks. Address: 16 bits.

- Where would data from address 0x1833 be placed?
  - Binary: 0b 0001 1000 0011 0011

$$t = m - s - k \qquad s = \log_2(C/K/E) \qquad k = \log_2(K) = 4$$

*m*-bit address:

| Tag (*t*) | Index (*s*) | Offset (*k*) |
|-----------|-------------|--------------|





| Set     | Tag | Data |
|---------|-----|------|
| 0 (000) |     |      |
| 1 (001) |     |      |
| 2 (010) |     |      |
| 3 (011) |     |      |
| 4 (100) |     |      |
| 5 (101) |     |      |
| 6 (110) |     |      |
| 7 (111) |     |      |
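The placement of 0x1833 can be worked out by slicing the address into its three fields. The sketch below assumes the parameters of this example (16 B blocks, so k = 4, and 8 blocks total); the helper name `split_address` is hypothetical.

```python
def split_address(addr, s, k):
    """Split an address into (tag, index, offset) given s index bits
    and k offset bits."""
    offset = addr & ((1 << k) - 1)         # lowest k bits
    index = (addr >> k) & ((1 << s) - 1)   # next s bits select the set
    tag = addr >> (s + k)                  # remaining upper bits
    return tag, index, offset

# 8 blocks of 16 B: E = 1 gives s = 3, E = 2 gives s = 2, E = 4 gives s = 1
for ways, s in [(1, 3), (2, 2), (4, 1)]:
    tag, index, offset = split_address(0x1833, s, k=4)
    print(f"E={ways}: tag={tag:#x} index={index} offset={offset:#x}")
# E=1: tag=0x30 index=3 offset=0x3
# E=2: tag=0x60 index=3 offset=0x3
# E=4: tag=0xc1 index=1 offset=0x3
```

In the direct-mapped case the block lands in set 3 (0b011); as associativity grows, index bits move into the tag.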





# **Block Replacement**

- Any empty block in the correct set may be used to store a block
- If there are no empty blocks, which one should we replace?
  - No choice for direct-mapped caches
  - Caches typically use something close to least recently used (LRU)
     (hardware usually implements "not most recently used")

Direct-mapped

| Set                        | Tag | Data |
|----------------------------|-----|------|
| 0                          |     |      |
| 1                          |     |      |
| 2                          |     |      |
| 3                          |     |      |
| 4                          |     |      |
| 5                          |     |      |
| 6                          |     |      |
| 7                          |     |      |

2-way set associative

| Set | Tag | Data |
|-----|-----|------|
| 0   |     |      |
| 1   |     |      |
| 2   |     |      |
| 3   |     |      |

4-way set associative

| Set | Tag | Data |
|-----|-----|------|
|     |     |      |
| 0   |     |      |
|     |     |      |
|     |     |      |
|     |     |      |
| 1   |     |      |
|     |     |      |
|     |     |      |

## **Peer Instruction Question**

- We have a cache of size 2 KiB with block size of 128 B. If our cache has 2 sets, what is its associativity?
  - Vote at http://pollev.com/rea

B. 4

C. 8

D. 16

E. We're lost...

Solution: $K = 2^7$ B, so the cache holds $C/K = 2^{11-7} = 2^4 = 16$ blocks. With 2 sets, each set has 8 blocks, so E = 8 (answer C).

- If addresses are 16 bits wide, how wide is the Tag field?
  - $k = \log_2(K) = 7$ bits, $s = \log_2(S) = 1$ bit, $t = m - s - k = 8$ bits

# General Cache Organization (S, E, K)



#### **Notation Review**

- We just introduced a lot of new variable names!
  - Please be mindful of block size notation when you look at past exam questions or are watching videos

| Variable           | This Quarter          | Formulas                                   |
|--------------------|-----------------------|--------------------------------------------|
| Block size         | K (B in book)         | $M = 2^m \leftrightarrow m = \log_2 M$     |
| Cache size         | C                     | $S = 2^s \leftrightarrow s = \log_2 S$     |
| Associativity      | E                     | $K = 2^k \leftrightarrow k = \log_2 K$     |
| Number of Sets     | S                     | $C = K \times E \times S$                  |
| Address space      | M                     | $s = \log_2(C/K/E)$                        |
| Address width      | m                     | $m = t + s + k$                            |
| Tag field width    | t                     |                                            |
| Index field width  | s                     |                                            |
| Offset field width | k (b in book)         |                                            |

## **Example Cache Parameters Problem**

- 4 KiB address space, 125 cycles to go to memory. Fill in the following table:
  - 4 KiB = $2^{12}$ B $\iff$ m = 12 bits
  - Miss penalty (MP) = 125 cycles

| Symbol                   | Parameter     | Value                           | Work            |
|--------------------------|---------------|---------------------------------|-----------------|
| C                        | Cache Size    | 256 B                           | $2^8$ B         |
| K                        | Block Size    | 32 B                            | $2^5$ B         |
| E                        | Associativity | 2-way                           | $2^1$           |
| HT                       | Hit Time      | 3 cycles                        |                 |
| MR                       | Miss Rate     | 20%                             |                 |
| $t = m - s - k$          | Tag Bits      | 5                               |                 |
| $s = \log_2(C/K/E)$      | Index Bits    | 2                               | $2^8/2^5/2^1$   |
| $k = \log_2(K)$          | Offset Bits   | 5                               |                 |
| AMAT = HT + MR × MP      | AMAT          | 3 + 0.2(125) = 28 clock cycles  |                 |
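The AMAT row of the table follows directly from the formula. A minimal sketch using the numbers given in this problem; the function name `amat` is an illustrative choice.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: AMAT = HT + MR * MP (in cycles)."""
    return hit_time + miss_rate * miss_penalty

# Values from the problem: HT = 3 cycles, MR = 20%, MP = 125 cycles
print(amat(3, 0.20, 125))   # 28.0

# Halving the miss rate has a large effect because MP dominates HT
print(amat(3, 0.10, 125))   # 15.5
```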

### **Cache Read**

1) Locate set

2) Check if any line in the set has a matching tag



# Example: Direct-Mapped Cache (E = 1)

Direct-mapped: One line per set

Block Size K = 8 B




No match? Then old line gets evicted and replaced

We want alignment! Aligned data does not straddle a block boundary, so there are no unnecessary extra cache accesses across block boundaries.
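The direct-mapped read steps (locate the set, compare the tag, evict on a mismatch) can be sketched as a small simulation. The block size K = 8 B comes from the example; the number of sets (S = 4) and the function names are assumptions made for illustration, and only tags are tracked, not data.

```python
K = 8   # block size in bytes (from the example)
S = 4   # number of sets: an assumed value for illustration

def make_cache():
    """One line per set; each entry is the stored tag, or None if invalid."""
    return [None] * S

def read(cache, addr):
    """Simulate one read; return 'hit' or 'miss' and update the cache."""
    block = addr // K     # drop the offset bits
    index = block % S     # low bits of the block address pick the set
    tag = block // S      # remaining bits are the tag
    if cache[index] == tag:
        return "hit"
    cache[index] = tag    # no match: old line is evicted and replaced
    return "miss"

cache = make_cache()
print([read(cache, a) for a in [0x00, 0x04, 0x20, 0x00]])
# ['miss', 'hit', 'miss', 'miss']
```

Addresses 0x00 and 0x04 share a block, so the second read hits. Address 0x20 maps to the same set with a different tag, so it evicts that block and the final read of 0x00 misses again.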

# Example: Set-Associative Cache (E = 2)





#### No match?

- One line in set is selected for eviction and replacement
- Replacement policies: random, least recently used (LRU), ...

## Types of Cache Misses: 3 C's!

- Compulsory (cold) miss
  - Occurs on first access to a block
- Conflict miss
  - Conflict misses occur when the cache is large enough, but multiple data objects all map to the same slot
    - e.g. referencing blocks 0, 8, 0, 8, ... could miss every time
  - Direct-mapped caches have more conflict misses than E-way set-associative (where E > 1)
- Capacity miss
  - Occurs when the set of active cache blocks (the working set) is larger than the cache (just won't fit, even if cache was fully-associative)
  - Note: Fully-associative only has Compulsory and Capacity misses
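The 0, 8, 0, 8, ... example can be checked with a small miss counter. This is a sketch under stated assumptions: an 8-block cache, true LRU replacement, and a hypothetical helper `count_misses` that tracks whole block numbers rather than separate tag bits.

```python
def count_misses(blocks, num_blocks, E):
    """Count misses for a sequence of block numbers in a cache holding
    num_blocks blocks, E ways per set, with LRU replacement."""
    S = num_blocks // E                  # number of sets
    sets = [[] for _ in range(S)]        # each list is ordered LRU first
    misses = 0
    for b in blocks:
        lines = sets[b % S]              # the set this block maps to
        if b in lines:
            lines.remove(b)              # hit: re-appended below as MRU
        else:
            misses += 1
            if len(lines) == E:
                lines.pop(0)             # set is full: evict the LRU block
        lines.append(b)
    return misses

trace = [0, 8] * 4                       # blocks 0, 8, 0, 8, ...
print(count_misses(trace, 8, E=1))       # 8: every access is a miss
print(count_misses(trace, 8, E=2))       # 2: only the compulsory misses
print(count_misses(trace, 8, E=8))       # 2: fully associative
```

In the direct-mapped case the cache has room for both blocks, yet they keep evicting each other, which is what makes these conflict misses rather than capacity misses.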

# **Example Code Analysis Problem**

- Assuming the cache starts <u>cold</u> (all blocks invalid) and sum is stored in a register, calculate the **miss rate**: 1/4

Challenge: what is the miss rate if we switch the ordering of the for-loops?

#### What about writes?

- Multiple copies of data exist:
  - L1, L2, possibly L3, main memory
- What to do on a write-hit? (block/data already in \$)
  - Write-through: write immediately to next level
  - Write-back: defer write to next level until line is evicted (replaced)
- What to do on a write-miss? (block/data not currently in \$)
  - Write-allocate: ("fetch on write") load into cache, update line in cache
    - Good if more writes or reads to the location follow
  - No-write-allocate: ("write around") just write immediately to next level (memory)
- Typical caches:
  - Write-back + Write-allocate, usually
  - Write-through + No-write-allocate, occasionally
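The difference between the two typical policy pairs shows up in how many writes reach the next level. The sketch below assumes a direct-mapped cache that tracks only block numbers and a per-line dirty flag for the write-back case; the function name and trace format are hypothetical, and dirty lines still in the cache at the end of the trace are not counted.

```python
def count_next_level_writes(trace, num_sets, write_back):
    """trace is a list of (op, block) pairs with op 'R' or 'W'.
    write_back=True models write-back + write-allocate;
    write_back=False models write-through + no-write-allocate."""
    blocks = [None] * num_sets    # block held by each line (None = invalid)
    dirty = [False] * num_sets    # only ever set in write-back mode
    writes = 0
    for op, block in trace:
        i = block % num_sets
        if op == "W" and not write_back:
            writes += 1           # write-through: every write goes down
            continue              # no-write-allocate: a miss loads nothing
        if blocks[i] != block:    # miss: bring the block into the cache
            if dirty[i]:
                writes += 1       # evicted line was modified: write it back
            blocks[i] = block
            dirty[i] = False
        if op == "W":
            dirty[i] = True       # defer the write until eviction
    return writes

trace = [("W", 0)] * 4 + [("R", 4)]     # four writes, then a conflicting read
print(count_next_level_writes(trace, 4, write_back=True))    # 1
print(count_next_level_writes(trace, 4, write_back=False))   # 4
```

Repeated writes to the same block are where write-back pays off: the four writes collapse into a single write when the line is evicted.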