#### **Caches III**

CSE 351 Summer 2020

#### **Instructor:**

**Porter Jones** 

#### **Teaching Assistants:**

Amy Xu

Callum Walker

Sam Wolfson

Tim Mandzyuk



https://what-if.xkcd.com/111/

#### **Administrivia**

Questions doc: <a href="https://tinyurl.com/CSE351-7-31">https://tinyurl.com/CSE351-7-31</a>

- hw16 due Wednesday (8/5), 10:30am
- Lab 3 due tonight (7/31), 11:59pm
  - You get to write some buffer overflow exploits!
- Lab 4 released later today
  - All about caches!
- Unit Summary 2 due next Wednesday (8/5), 11:59pm

#### Making memory accesses fast!

- Cache basics
- Principle of locality
- Memory hierarchies
- Cache organization
  - Direct-mapped (sets; index + tag)
  - Associativity (ways)
  - Replacement policy
  - Handling writes
- Program optimizations that consider caches

#### **Review: Cache Parameters**

- Block size (K): Basic unit of transfer between memory and the cache, given in bytes (e.g. 64 B).
- Cache size (C): Total amount of data that can be stored in the cache, given in bytes (e.g. 32 KiB).
  - Must be a multiple of the block size
  - Number of blocks in the cache is calculated by C/K
- Associativity (E): Number of ways blocks can be stored in a cache set, i.e. how many blocks are in each set
- Number of sets (S): Number of unique sets that blocks can be placed into in a cache (calculated as C/K/E).
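
The relationships between these parameters can be checked with a few lines of Python. This is only a sketch: the function name `cache_geometry` is made up for illustration, and the 8-way associativity in the usage line is an assumed value (the slide only gives 32 KiB and 64 B as examples).

```python
def cache_geometry(C, K, E):
    """Return (number of blocks, number of sets) for a cache.

    C: cache size in bytes, K: block size in bytes, E: associativity (ways).
    """
    if C % K != 0:
        raise ValueError("cache size must be a multiple of block size")
    num_blocks = C // K              # C/K
    if num_blocks % E != 0:
        raise ValueError("number of blocks must be a multiple of associativity")
    num_sets = num_blocks // E       # S = C/K/E
    return num_blocks, num_sets


# 32 KiB cache with 64 B blocks; E = 8 is an assumption for this example
print(cache_geometry(32 * 1024, 64, 8))  # (512, 64)
```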

#### Review: TIO address breakdown

TIO address breakdown:



- Index (s) field tells you where to look in cache
  - Number of bits is determined by number of sets: $s = \log_2(S) = \log_2(C/K/E)$
  - Need enough bits to reference every set in the cache
- Tag (t) field lets you check that data is the block you want
  - Rest of the bits not used for index and offset: $t = m - s - k$
- Offset (k) field selects specified start byte within block
  - Number of bits is determined by block size  $(\log_2(K))$
  - Need enough bits to reference every byte in a block
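
As a rough illustration of the field-width formulas, the sketch below computes (t, s, k) from the cache parameters. The helper names `log2_exact` and `field_widths` are hypothetical, and the configuration in the usage line (16-bit addresses, 1 KiB cache, 64 B blocks, 4-way) is an assumed example, not one from the slides.

```python
def log2_exact(n):
    """Return log2(n) for a positive power of two, else raise ValueError."""
    if n <= 0 or n & (n - 1) != 0:
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def field_widths(m, C, K, E):
    """Return (t, s, k) bit widths for an m-bit address."""
    k = log2_exact(K)              # offset bits: log2(K)
    s = log2_exact(C // K // E)    # index bits: log2(C/K/E)
    t = m - s - k                  # tag bits: whatever is left
    return t, s, k


# Assumed configuration: m = 16, C = 1 KiB, K = 64 B, E = 4
print(field_widths(16, 1024, 64, 4))  # (8, 2, 6)
```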

#### **Review: Cache Lookup Process**

- CPU requests data at a given address
- Cache breaks down address into different bit fields
  - Determines offset, index, and tag bits
- Cache checks to see if block containing address is already in the cache
  - Uses index bits to find which set to look in
  - Uses tag bits to make sure the block in the set matches
- If block is in the cache, it's a <u>cache hit</u>
  - Data is returned to CPU starting at byte offset
- If block is not in the cache, it's a <u>cache miss</u>
  - Block is loaded from memory into the cache, evicting other blocks from the cache if necessary
  - Data is returned to CPU starting at byte offset
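
The lookup steps above can be sketched for the simplest case, a direct-mapped cache. This is a toy model under stated assumptions: it tracks only tags (no data), the class name `DirectMappedCache` is invented for illustration, and the 4-set, 16 B-block configuration is arbitrary.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that stores only the tag of each line."""

    def __init__(self, num_sets, block_size):
        self.S = num_sets
        self.K = block_size
        self.tags = [None] * num_sets   # None marks an invalid (empty) line

    def access(self, addr):
        """Return ("hit" or "miss", byte offset within the block)."""
        offset = addr % self.K          # offset field
        block_addr = addr // self.K
        index = block_addr % self.S     # index field selects the set
        tag = block_addr // self.S      # tag field identifies the block
        if self.tags[index] == tag:
            return "hit", offset
        self.tags[index] = tag          # load block, evicting the old one
        return "miss", offset


cache = DirectMappedCache(num_sets=4, block_size=16)
results = [cache.access(a)[0] for a in (0x00, 0x04, 0x40, 0x00)]
print(results)  # ['miss', 'hit', 'miss', 'miss']
```

Addresses 0x00 and 0x40 map to the same set with different tags, so the second access to 0x00 misses again after 0x40 evicted its block.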

#### **Review: Direct-Mapped Cache**



#### **Direct-Mapped Cache Problem**



### **Associativity**

- What if we could store data in any place in the cache?
  - More complicated hardware = more power consumed, slower
- So we combine the two ideas:
  - Each address maps to exactly one set
  - Each set can store a block in more than one way







## **Cache Organization (3)**

**Note:** The textbook uses "b" for offset bits

- Associativity (E): # of ways for each set
  - Such a cache is called an "E-way set associative cache"
  - We now index into cache sets, of which there are S = C/K/E
  - Use lowest  $\log_2(C/K/E) = s$  bits of block address
    - <u>Direct-mapped</u>: E = 1, so  $s = \log_2(C/K)$  as we saw previously
    - Fully associative: E = C/K, so s = 0 bits



### **Example Placement**



- Block size: 16 B
- Capacity: 8 blocks
- Address: 16 bits

Where would data from address 0x1833 be placed?

*m*-bit address: Tag (*t*) | Index (*s*) | Offset (*k*)

$t = m - s - k \quad s = \log_2(C/K/E) \quad k = \log_2(K) = 4$

| Organization          | E | S = C/K/E | $s = \log_2(S)$ |
|-----------------------|---|-----------|-----------------|
| Direct-mapped         | 1 | 8         | 3               |
| 2-way set associative | 2 | 4         | 2               |
| 4-way set associative | 4 | 2         | 1               |

Direct-mapped

| Set     | Tag | Data |
|---------|-----|------|
| 0 (000) |     |      |
| 1 (001) |     |      |
| 2 (010) |     |      |
| 3 (011) |     |      |
| 4 (100) |     |      |
| 5 (101) |     |      |
| 6 (110) |     |      |
| 7 (111) |     |      |

2-way set associative

| Set    | Tag | Data |
|--------|-----|------|
| 0 (00) |     |      |
| 1 (01) |     |      |
| 2 (10) |     |      |
| 3 (11) |     |      |

4-way set associative

| Set   | Tag | Data |
|-------|-----|------|
| 0 (0) |     |      |
| 1 (1) |     |      |
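
One way to check the placement question is to split 0x1833 into its fields for each organization. The sketch below uses only the parameters given in the example (16 B blocks, 8 blocks of capacity); the function name `place` is hypothetical.

```python
ADDR = 0x1833
K = 16          # block size in bytes
NUM_BLOCKS = 8  # capacity in blocks


def place(addr, E):
    """Return (tag, set index, offset) for an E-way cache of NUM_BLOCKS blocks."""
    k = K.bit_length() - 1           # offset bits: log2(16) = 4
    S = NUM_BLOCKS // E              # number of sets
    s = S.bit_length() - 1           # index bits: log2(S)
    offset = addr & (K - 1)          # lowest k bits
    index = (addr >> k) & (S - 1)    # next s bits
    tag = addr >> (k + s)            # remaining upper bits
    return tag, index, offset


for E in (1, 2, 4):
    tag, index, offset = place(ADDR, E)
    print(f"E={E}: set {index}, tag {tag:#x}, offset {offset}")
# E=1: set 3, tag 0x30, offset 3
# E=2: set 3, tag 0x60, offset 3
# E=4: set 1, tag 0xc1, offset 3
```

As associativity grows, index bits move into the tag: the set number is taken from fewer bits while the tag gets wider.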

# Block Replacement



- If there are no empty blocks, which one should we replace?
  - No choice for direct-mapped caches

- Caches typically use something close to <u>least recently used (LRU)</u> (hardware usually implements "not most recently used")

Direct-mapped

| Set | Tag | Data |
|-----|-----|------|
| 0   |     |      |
| 1   |     |      |
| 2   |     |      |
| 3   |     |      |
| 4   |     |      |
| 5   |     |      |
| 6   |     |      |
| 7   |     |      |

2-way set associative (2 blocks per set)

| Set | Tag | Data |
|-----|-----|------|
| 0   |     |      |
| 1   |     |      |
| 2   |     |      |
| 3   |     |      |

4-way set associative (4 blocks per set)

| Set | Tag | Data |
|-----|-----|------|
| 0   |     |      |
| 1   |     |      |
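
A minimal software sketch of exact LRU replacement within a single set is shown below. It is an illustration only: real hardware typically approximates LRU, as noted above, and the class name `LRUSet` and its interface are invented for this example.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with `ways` lines and exact LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()   # tags ordered from least to most recently used

    def access(self, tag):
        """Return (hit, evicted_tag); evicted_tag is None if nothing was evicted."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True, None
        evicted = None
        if len(self.lines) == self.ways:     # set is full: evict the LRU line
            evicted, _ = self.lines.popitem(last=False)
        self.lines[tag] = None
        return False, evicted


s = LRUSet(ways=2)
for tag in ("A", "B", "A", "C"):
    print(tag, s.access(tag))
# The final access to "C" evicts "B", since "A" was used more recently.
```

With `ways=1` the set behaves like a direct-mapped line: every miss evicts the only resident block, so there is no choice to make.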

## **Polling Question [Cache III]**



- We have a cache of size 2 KiB with block size of 128 B. If our cache has 2 sets, what is its associativity?
  - Vote at <a href="http://pollev.com/pbjones">http://pollev.com/pbjones</a>
  - A. 2
  - B. 4
  - C. 8
  - D. 16
  - E. We're lost...
  - Number of blocks $= C/K = 2^{11}/2^{7} = 2^{4} = 16$, so $E = 16/2 = 8$
- If addresses are 16 bits wide, how wide is the Tag field?
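
For the tag-width question, the field-width formulas from the review slides can be applied directly. A short worked derivation, using only the values given in the question ($K = 128$ B, $S = 2$, $m = 16$):

$$
\begin{aligned}
k &= \log_2(K) = \log_2(128) = 7 \\
s &= \log_2(S) = \log_2(2) = 1 \\
t &= m - s - k = 16 - 1 - 7 = 8 \text{ bits}
\end{aligned}
$$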

### General Cache Organization (S, E, K)



#### **Notation Review**

- We just introduced a lot of new variable names!
  - Please be mindful of block size notation when you look at past exam questions or are watching videos

| Parameter          | Variable      | Formulas                                 |
|--------------------|---------------|------------------------------------------|
| Block size         | K (B in book) |                                          |
| Cache size         | C             | $M = 2^m \leftrightarrow m = \log_2 M$   |
| Associativity      | E             | $S = 2^{s} \leftrightarrow s = \log_2 S$ |
| Number of Sets     | S             | $K = 2^{k} \leftrightarrow k = \log_2 K$ |
| Address space      | M             | $C = K \times E \times S$                |
| Address width      | m             | $s = \log_2(C/K/E)$                      |
| Tag field width    | t             | $m = t + s + k$                          |
| Index field width  | s             |                                          |
| Offset field width | k (b in book) |                                          |

#### Example Cache Parameters Problem

4 KiB address space, 125 cycles to go to memory.

Fill in the following table:

| Parameter         | Value     | Work                                                                        |
|-------------------|-----------|-----------------------------------------------------------------------------|
| Cache Size (C)    | 256 B     | $2^8$ B                                                                     |
| Block Size (K)    | 32 B      | $2^5$ B                                                                     |
| Associativity (E) | 2-way     | $C/K = 2^8/2^5 = 2^3 = 8$ blocks, so $S = C/K/E = 8/2 = 4$ sets             |
| Hit Time (HT)     | 3 cycles  |                                                                             |
| Miss Rate (MR)    | 20%       |                                                                             |
| Tag Bits          | 5 bits    | $t = m - s - k = 12 - 2 - 5$, where $m = \log_2(M) = \log_2(2^{12}) = 12$   |
| Index Bits        | 2 bits    | $s = \log_2(C/K/E) = \log_2(4)$                                             |
| Offset Bits       | 5 bits    | $k = \log_2(K) = \log_2(32)$                                                |
| AMAT              | 28 cycles | $\text{HT} + \text{MR} \times \text{MP} = 3 + 0.2(125)$                     |
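
The AMAT row can be reproduced with a one-line function. A small sketch, assuming "cycles to go to memory" is used as the miss penalty; the function name `amat` is chosen here for illustration.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: HT + MR * MP (all times in cycles)."""
    return hit_time + miss_rate * miss_penalty


# Values from the problem: HT = 3 cycles, MR = 20%, MP = 125 cycles
print(amat(3, 0.20, 125))  # 28.0
```

Plugging in other miss rates shows how sensitive AMAT is to misses when the miss penalty is large: at a 10% miss rate the same cache averages 15.5 cycles.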



# Example: Direct-Mapped Cache (E = 1)



Direct-mapped: One line per set

Block Size K = 8 B

Start reading at the block offset and read sizeof(int) bytes



No match? Then the old line gets evicted and replaced with the block from memory.





### Example: Set-Associative Cache (E = 2)



#### No match?

- One line in the set is selected for eviction and replacement
- Replacement policies: random, least recently used (LRU), not most recently used, ...

#### Types of Cache Misses: 3 C's!

- Compulsory (cold) miss
  - Occurs on first access to a block
- Conflict miss
  - Occurs when the cache is large enough, but multiple data objects all map to the same slot
    - e.g. referencing blocks 0, 8, 0, 8, ... could miss every time
  - Direct-mapped caches have more conflict misses than E-way set-associative caches (where E > 1)
- Capacity miss
  - Occurs when the set of active cache blocks (the working set) is larger than the cache (just won't fit, even if the cache were fully associative)
  - Note: A fully associative cache only has compulsory and capacity misses
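
The 0, 8, 0, 8, ... pattern can be simulated to see conflict misses appear and disappear with associativity. This is a sketch under assumptions: the cache holds 8 blocks (matching the earlier placement example), replacement is exact LRU, and the function name `count_misses` is made up for illustration.

```python
from collections import OrderedDict


def count_misses(block_addrs, num_blocks, E):
    """Count misses for a trace of block addresses in an E-way LRU cache."""
    S = num_blocks // E                        # number of sets
    sets = [OrderedDict() for _ in range(S)]   # per-set tags, LRU first
    misses = 0
    for b in block_addrs:
        lines = sets[b % S]                    # index = block address mod S
        tag = b // S
        if tag in lines:
            lines.move_to_end(tag)             # hit: mark most recently used
            continue
        misses += 1
        if len(lines) == E:                    # set full: evict LRU line
            lines.popitem(last=False)
        lines[tag] = None
    return misses


trace = [0, 8] * 4                             # blocks 0, 8, 0, 8, ...
print(count_misses(trace, num_blocks=8, E=1))  # 8: every access misses
print(count_misses(trace, num_blocks=8, E=2))  # 2: only the compulsory misses
```

In the direct-mapped case blocks 0 and 8 both map to set 0 and keep evicting each other, even though the other seven lines are empty. With two ways both blocks fit in the same set, leaving only the two compulsory misses.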

# **Example Code Analysis Problem**

$$k = \log_2(K) = 5 \text{ bits}$$

$$s = \log_2(S) = 2 \text{ bits}$$

$$t = m - k - s = 5 \text{ bits}$$

- Assuming the cache starts <u>cold</u> (all blocks invalid) and sum, i, and j are stored in registers, calculate the **miss rate**:



#### **Notes Diagrams**



