### **Caches III**

**CSE 351 Spring 2020** 

#### **Instructor:**

**Ruth Anderson** 

### **Teaching Assistants:**

Alex Olshanskyy

Rehaan Bhimani

Callum Walker

Chin Yeoh

Diya Joy

Eric Fan

Edan Sneh

Jonathan Chen

Jeffery Tian

Millicent Li

Melissa Birchfield

**Porter Jones** 

Joseph Schafer

Connie Wang

Eddy (Tianyi) Zhou



https://what-if.xkcd.com/111/

### **Administrivia**

- Unit Summary #2 due TONIGHT, Friday (5/08)
  - Submit to Canvas TWO SEPARATE SUBMISIONS
- Lab 3 due Wednesday (5/13)
- You must log on with your @uw google account to access!!
  - Google doc for 11:30 Lecture: <a href="https://tinyurl.com/351-05-08A">https://tinyurl.com/351-05-08A</a>
  - Google doc for 2:30 Lecture: <a href="https://tinyurl.com/351-05-08B">https://tinyurl.com/351-05-08B</a>

### Making memory accesses fast!

- Cache basics
- Principle of locality
- Memory hierarchies
- Cache organization
  - Direct-mapped (sets; index + tag)
  - Associativity (ways)
  - Replacement policy
  - Handling writes
- Program optimizations that consider caches

### **Review: Direct-Mapped Cache**



### **Direct-Mapped Cache Problem**



## **Associativity**

- What if we could store data in any place in the cache?
  - More complicated hardware = more power consumed, slower
- So we combine the two ideas:
  - Each address maps to exactly one set
  - Each set can store block in more than one way







# **Cache Organization (3)**

**Note:** The textbook uses "b" for offset bits

- $\star$  Associativity (E): # of ways for each set
  - Such a cache is called an "E-way set associative cache"
  - We now index into cache *sets*, of which there are S = C/K/E
  - Use lowest  $\log_2(C/K/E) = s$  bits of block address
    - <u>Direct-mapped</u>: E = 1, so  $s = \log_2(C/K)$  as we saw previously
    - Fully associative: E = C/K, so s = 0 bits



## **Example Placement**

block size: 16 B
capacity: 8 blocks
address: 16 bits

- \* Where would data from address  $0 \times 1833$  be placed?
  - Binary: 0b 0001 1000 0011 0011

t = m - s - k  $s = \log_2(C/K/E)$   $k = \log_2(K)$ m-bit address: Tag (t) Index (s) Offset (k)

s = ?
Direct-mapped

| Set                        | Tag | Data |
|----------------------------|-----|------|
| 0                          |     |      |
| 1                          |     |      |
| 2                          |     |      |
| 1<br>2<br>3<br>4<br>5<br>6 |     |      |
| 4                          |     |      |
| 5                          |     |      |
|                            |     |      |
| 7                          |     |      |

s = ?2-way set associative

| Set | Tag | Data |
|-----|-----|------|
| 0   |     |      |
| 1   |     |      |
| 2   |     |      |
| 3   |     |      |

s = ?4-way set associative

| Set | Tag | Data |
|-----|-----|------|
| 0   |     |      |
|     |     |      |
| 1   |     |      |
|     |     |      |

# **Block Replacement**

- Any empty block in the correct set may be used to store block
- If there are no empty blocks, which one should we replace?
  - No choice for direct-mapped caches
  - Caches typically use something close to least recently used (LRU)
     (hardware usually implements "not most recently used")

|                       | Direct-mapped |      |  |
|-----------------------|---------------|------|--|
| Set                   | Tag           | Data |  |
| 0                     |               |      |  |
| 1                     |               |      |  |
| 2                     |               |      |  |
| 3                     |               |      |  |
| 4                     |               |      |  |
| 2<br>3<br>4<br>5<br>6 |               |      |  |
| 6                     |               |      |  |
| 7                     |               |      |  |

| 2-way set associative |     |      |
|-----------------------|-----|------|
| Set                   | Tag | Data |
| 0                     |     |      |
| 1                     |     |      |
| 2                     |     |      |
| 3                     |     |      |

2 way cot accociative

| . way set descendence |         |      |
|-----------------------|---------|------|
| Set                   | Tag     | Data |
|                       |         |      |
| 0                     |         |      |
| J                     | ******* |      |
|                       |         |      |
|                       |         |      |
| 1                     |         |      |
|                       |         |      |
|                       |         |      |

4-way set associative

# **Polling Question [Cache III]**

- We have a cache of size 2 KiB with block size of 128 B.
  If our cache has 2 sets, what is its associativity?
  - Vote at <a href="http://pollev.com/rea">http://pollev.com/rea</a>
  - A. 2
  - B. 4
  - **C.** 8
  - D. 16
  - E. We're lost...
- If addresses are 16 bits wide, how wide is the Tag field?

CSE351, Spring 2020

## General Cache Organization (S, E, K)



### **Notation Review**

- We just introduced a lot of new variable names!
  - Please be mindful of block size notation when you look at past exam questions or are watching videos

| Parameter          | Variable                     | Formulas                                                                              |
|--------------------|------------------------------|---------------------------------------------------------------------------------------|
| Block size         | K (B in book)                |                                                                                       |
| Cache size         | С                            | $M = 2m \wedge m = \log M$                                                            |
| Associativity      | E                            | $M = 2^{m} \leftrightarrow m = \log_{2} M$ $S = 2^{s} \leftrightarrow s = \log_{2} S$ |
| Number of Sets     | S                            | $K = 2^{\mathbf{k}} \leftrightarrow \mathbf{k} = \log_2 K$                            |
| Address space      | M                            | $C = K \times E \times S$                                                             |
| Address width      | m                            | $\mathbf{s} = \log_2(C/K/E)$                                                          |
| Tag field width    |                              | m = t + s + k                                                                         |
| Index field width  | S                            |                                                                                       |
| Offset field width | <b>k</b> ( <b>b</b> in book) |                                                                                       |

### **Example Cache Parameters Problem**

4 KiB address space, 125 cycles to go to memory. Fill in the following table:

| Cache Size        | 256 B    |
|-------------------|----------|
| <b>Block Size</b> | 32 B     |
| Associativity     | 2-way    |
| Hit Time          | 3 cycles |
| Miss Rate         | 20%      |
| Tag Bits          |          |
| <b>Index Bits</b> |          |
| Offset Bits       |          |
| AMAT              |          |

Locate set

Check if any line in set

### **Cache Read**

valid bit



K =bytes per block

CSE351, Spring 2020

## Example: Direct-Mapped Cache (E = 1)

Direct-mapped: One line per set

Block Size K = 8 B



### Example: Direct-Mapped Cache (E = 1)

Direct-mapped: One line per set

Block Size K = 8 B



### Example: Direct-Mapped Cache (E = 1)

L18: Caches III

Direct-mapped: One line per set

Block Size K = 8 B



No match? Then old line gets evicted and replaced

# Example: Set-Associative Cache (E = 2)



# Example: Set-Associative Cache (E = 2)



CSE351, Spring 2020

## Example: Set-Associative Cache (E = 2)



#### No match?

- One line in set is selected for eviction and replacement
- Replacement policies: random, least recently used (LRU), ...

### Types of Cache Misses: 3 C's!

- Compulsory (cold) miss
  - Occurs on first access to a block
- Conflict miss
  - Conflict misses occur when the cache is large enough, but multiple data objects all map to the same slot

L18: Caches III

- e.g. referencing blocks 0, 8, 0, 8, ... could miss every time
- Direct-mapped caches have more conflict misses than E-way set-associative (where E > 1)
- Capacity miss
  - Occurs when the set of active cache blocks (the working set)
    is larger than the cache (just won't fit, even if cache was fullyassociative)
  - Note: Fully-associative only has Compulsory and Capacity misses

### **Example Code Analysis Problem**

- Assuming the cache starts <u>cold</u> (all blocks invalid) and sum, i, and j are stored in registers, calculate the **miss rate**:
  - m = 12 bits, C = 256 B, K = 32 B, E = 2

```
#define SIZE 8
long ar[SIZE][SIZE], sum = 0; // &ar=0x800
for (int i = 0; i < SIZE; i++)
    for (int j = 0; j < SIZE; j++)
        sum += ar[i][j];</pre>
```