#### Caches III

CSE 351 Autumn 2024

#### Instructor:

**Ruth Anderson** 

#### **Teaching Assistants:**

Alexandra Michael, Connie Chen, Chloe Fong, Chendur Jayavelu, Joshua Tan, Nikolas McNamee, Nahush Shrivatsa, Naama Amiel, Neela Kausik, Renee Ruan, Rubee Zhao, Samantha Dreussi, Sean Siddens, Waleed Yagoub



https://what-if.xkcd.com/111/

## **Relevant Course Information**

- HW16 due TONIGHT, Wednesday (11/06) @ 11:59 pm
- Lab 3 due Mon 11/11
  - Encouraged to aim for Fri 11/08, actual deadline Mon 11/11
  - You have everything you need to do the lab as of 10/28
  - Last part of HW15 is useful for Lab 3
- HW17 due Friday (11/08) @ 11:59 pm
- Mid-quarter Survey due Saturday (11/09)
- HW18 due Wednesday (11/13) @ 11:59 pm

# Making memory accesses fast!

- Cache basics
- Principle of locality
- Memory hierarchies
- Cache organization
  - Direct-mapped (sets; index + tag)
  - Associativity (ways)
  - Replacement policy
  - Handling writes
- Program optimizations that consider caches

#### **Reading Review**

- Terminology:
  - Associativity: sets, fully-associative cache
  - Replacement policies: least recently used (LRU)
  - Cache line: cache block + management bits (valid, tag)
  - Cache misses: compulsory, conflict, capacity

#### **Review: Direct-Mapped Cache**




# Direct-Mapped Cache Problem



# **Associativity: A Solution!**

- What if we could store any data in any place in the cache?
  - More complicated hardware = more power consumed, slower
- So we combine the two ideas:
  - Each address maps to exactly one set
  - Each set can store a block in more than one way within the set



# **Cache Associativity (*E*)**

**Note:** The textbook uses "b" for offset bits

- Associativity (*E*): number of ways to store a block in each set
  - Such a cache is called an "E-way set associative cache"
  - We now index into cache *sets*, of which there are S = C/K/E
  - Use lowest  $\log_2(C/K/E) = s$  bits of block address
    - <u>Direct-mapped</u>: E = 1, so  $s = \log_2(C/K)$  as we saw previously





# **Example Placement**

| block size: | 16 B     |
|-------------|----------|
| capacity:   | 8 blocks |
| address:    | 16 bits  |

- Where would data from address 0x1833 be placed?
  - Binary: 0b 0001 1000 0011 0011

*m*-bit address: Tag (*t*) | Index (*s*) | Offset (*k*)

$t = m - s - k$, $s = \log_2(C/K/E)$, $k = \log_2(K)$

| Direct-mapped      | 2-way set associative | 4-way set associative |
|--------------------|-----------------------|-----------------------|
| *s* = ?            | *s* = ?               | *s* = ?               |
| Sets 0-7 (000-111) | Sets 0-3 (00-11)      | Sets 0-1 (0-1)        |
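The placement question can be checked mechanically. Below is a minimal sketch, assuming every cache parameter is a power of two; the helper names (`log2_pow2`, `set_index`, `tag_of`, `block_offset`) are made up for illustration and are not from the course materials.

```c
#include <stdint.h>

/* log2 of an exact power of two (all cache parameters are assumed to be one). */
unsigned log2_pow2(uint32_t x) {
    unsigned bits = 0;
    while (x > 1) {
        x >>= 1;
        bits++;
    }
    return bits;
}

/* Block offset: the lowest k = log2(K) bits of the address. */
uint32_t block_offset(uint32_t addr, uint32_t K) {
    return addr % K;
}

/* Set index: the lowest s bits of the block address (addr / K),
 * where the number of sets is S = num_blocks / E. */
uint32_t set_index(uint32_t addr, uint32_t K, uint32_t num_blocks, uint32_t E) {
    uint32_t num_sets = num_blocks / E;
    return (addr / K) % num_sets;
}

/* Tag: whatever remains after dropping the k offset bits and s index bits. */
uint32_t tag_of(uint32_t addr, uint32_t K, uint32_t num_blocks, uint32_t E) {
    unsigned k = log2_pow2(K);
    unsigned s = log2_pow2(num_blocks / E);
    return addr >> (k + s);
}
```

With the slide's parameters (K = 16 B, 8 blocks), `set_index(0x1833, 16, 8, E)` evaluates to 3, 3, and 1 for E = 1, 2, and 4, and the block offset is 3 in all three cases.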

# **Block Placement and Replacement**

- Any empty block in the correct set may be used to store the incoming block
  - Valid bit for each cache block indicates if data is valid (1) or mystery (0) data
- If there are no empty blocks, which one should we <u>replace</u>?
  - No choice for direct-mapped caches
  - Caches typically use something close to <u>least recently used (LRU)</u> (hardware usually implements "not most recently used")
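A rough sketch of this placement rule for a single set: use an empty line if there is one, otherwise evict the least recently used. The `struct line` layout, the `last_used` timestamp, and `WAYS = 4` are assumptions for illustration; real hardware usually keeps a cheaper approximation of LRU rather than full timestamps.

```c
#include <stdbool.h>
#include <stdint.h>

#define WAYS 4   /* E: lines per set (hypothetical value for this sketch) */

/* Management state of one cache line; the data block itself is omitted. */
struct line {
    bool valid;          /* 1 = holds real data, 0 = "mystery" data */
    uint32_t tag;
    uint64_t last_used;  /* time of the most recent access (assumed bookkeeping) */
};

/* Choose which way of a set receives an incoming block:
 * the first invalid (empty) line if any, otherwise the LRU line. */
int choose_victim(const struct line set[WAYS]) {
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid)
            return w;                                  /* empty: nothing is evicted */
        if (set[w].last_used < set[victim].last_used)
            victim = w;                                /* older than current candidate */
    }
    return victim;
}
```

For a direct-mapped cache (`WAYS` = 1) the function always returns way 0, which matches the "no choice" case above.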



## **Polling Questions**

- We have a cache of size 2 KiB with block size of 128 B. If our cache has 2 sets, what is its associativity?
  - Vote in Ed Lessons
  - A. 2
  - B. 4
  - C. 8
  - D. 16
  - E. We're lost...
  - $C = 2^{11}$ B and $K = 2^7$ B, so the cache holds $C/K = 2^{11-7} = 2^4 = 16$ blocks
  - $S = C/K/E$, so $E = (C/K)/S = 16/2 = 8$: each of the 2 sets has 8 blocks (answer C)
- If addresses are 16 bits wide ($m = 16$), how wide is the Tag field?
  - $k = \log_2(K) = 7$ bits, $s = \log_2(S) = 1$ bit, $t = m - s - k = 8$ bits



#### **Notation Review**

- We just introduced a lot of new variable names!
  - Please be mindful of block size notation when you look at past exam questions or are watching videos

| Parameter          | Variable      | Formulas                                 |
|--------------------|---------------|------------------------------------------|
| Block size         | K (B in book) | $K = 2^k \leftrightarrow k = \log_2 K$   |
| Cache size         | C             | $C = K \times E \times S$                |
| Associativity      | E             |                                          |
| Number of Sets     | S             | $S = 2^s \leftrightarrow s = \log_2 S$   |
| Address space      | M             | $M = 2^m \leftrightarrow m = \log_2 M$   |
| Address width      | m             | $m = t + s + k$                          |
| Tag field width    | t             |                                          |
| Index field width  | s             | $s = \log_2(C/K/E)$                      |
| Offset field width | k (b in book) |                                          |

#### Example Cache Parameters Problem

- 1 KiB address space, 125 cycles to go to memory. Fill in the following table:
  - 1 KiB = $2^{10}$ bytes, so an address is 10 bits wide



#### L18: Caches III



#### **Example: Direct-Mapped Cache (*E* = 1)** (step 1)



#### Example: Direct-Mapped Cache (E = 1) (step 2)

Direct-mapped: one line per set. Block size K = 8 B.



- 1) Locate set
- 2) Check if any line in set is <u>valid</u> and has <u>matching tag: hit</u>
- 3) Locate data starting at offset

## Example: Direct-Mapped Cache (E = 1) (step 3)



No match? Then old line gets evicted and replaced

3) Locate data starting at <u>offset</u>

#### **Example: Set-Associative Cache (*E* = 2)** (step 1)



1) Locate set

## **Example: Set-Associative Cache (*E* = 2)** (step 2)




- 1) Locate set
- 2) Check if any line in set is <u>valid</u> and has <u>matching tag: hit</u>
- 3) Locate data starting at offset

## **Example: Set-Associative Cache (*E* = 2)** (step 3)



No match?

- One line in the set is selected for eviction and replacement
- Replacement policies: random, least recently used (LRU), ...

2) Check if any line in set is <u>valid</u> and has <u>matching tag</u>: hit

3) Locate data starting at <u>offset</u>
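The three lookup steps can be sketched in C as follows. This is only an illustration: K = 8 B and E = 2 come from the slides, while the number of sets (`S = 4`), the `struct line` layout, and the function name `cache_read_byte` are assumptions. Filling a line on a miss is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define K 8   /* block size in bytes (from the slides) */
#define E 2   /* lines per set (from the slides) */
#define S 4   /* number of sets: an assumed value for this sketch */

struct line {
    bool valid;
    uint32_t tag;
    uint8_t data[K];
};

struct line cache[S][E];   /* zero-initialized, so every line starts invalid */

/* Returns true on a hit and stores the requested byte in *out.
 * Returns false on a miss and leaves *out untouched. */
bool cache_read_byte(uint32_t addr, uint8_t *out) {
    uint32_t offset = addr % K;         /* lowest k bits */
    uint32_t set    = (addr / K) % S;   /* next s bits: 1) locate set */
    uint32_t tag    = addr / K / S;     /* remaining upper bits */

    /* 2) A hit needs a line that is valid AND has a matching tag. */
    for (int way = 0; way < E; way++) {
        struct line *ln = &cache[set][way];
        if (ln->valid && ln->tag == tag) {
            *out = ln->data[offset];    /* 3) locate data starting at offset */
            return true;
        }
    }
    return false;   /* miss: some line in this set must be filled or replaced */
}
```

Setting `E` to 1 gives the direct-mapped case, where the loop checks exactly one line.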

## **Types of Cache Misses: 3 C's!**

- Compulsory (cold) miss
  - Occurs on first access to a block
- Conflict miss
  - Conflict misses occur when the cache is large enough, but multiple data objects all map to the same slot
    - e.g. referencing blocks 0, 8, 0, 8, ... could miss every time
  - Direct-mapped caches have more conflict misses than *E*-way set-associative (where *E* > 1)
- Capacity miss
  - Occurs when the set of active cache blocks (the *working set*) is larger than the cache (it just won't fit, even if the cache were *fully-associative*)
  - **Note:** *Fully-associative* only has Compulsory and Capacity misses
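To see the conflict-miss example in action, here is a small simulation sketch that counts misses for a sequence of block numbers, starting from a cold cache. It assumes true LRU replacement, and the function name `count_misses` and the `MAX_SETS`/`MAX_WAYS` limits are hypothetical choices for this sketch.

```c
#include <stdbool.h>

#define MAX_SETS 16
#define MAX_WAYS 16

/* Count misses for n block accesses in a cold cache organized as
 * num_sets sets of `ways` lines each, with LRU replacement.
 * Assumes num_sets <= MAX_SETS and ways <= MAX_WAYS. */
int count_misses(const int *blocks, int n, int num_sets, int ways) {
    bool valid[MAX_SETS][MAX_WAYS] = {{false}};
    int tag[MAX_SETS][MAX_WAYS] = {{0}};
    int last_used[MAX_SETS][MAX_WAYS] = {{0}};
    int misses = 0;

    for (int t = 0; t < n; t++) {
        int set = blocks[t] % num_sets;       /* index bits */
        int block_tag = blocks[t] / num_sets; /* tag bits */
        int hit_way = -1;

        for (int w = 0; w < ways; w++) {
            if (valid[set][w] && tag[set][w] == block_tag) {
                hit_way = w;
                break;
            }
        }
        if (hit_way < 0) {
            int victim = 0;
            misses++;
            /* Prefer an empty line; otherwise evict the least recently used. */
            for (int w = 0; w < ways; w++) {
                if (!valid[set][w]) { victim = w; break; }
                if (last_used[set][w] < last_used[set][victim]) victim = w;
            }
            valid[set][victim] = true;
            tag[set][victim] = block_tag;
            hit_way = victim;
        }
        last_used[set][hit_way] = t;          /* record recency for LRU */
    }
    return misses;
}
```

Assuming an 8-block cache, the sequence 0, 8, 0, 8, 0, 8 misses all 6 times when direct-mapped (`num_sets = 8, ways = 1`), but only twice (the two compulsory misses) when 2-way set associative (`num_sets = 4, ways = 2`) or fully-associative (`num_sets = 1, ways = 8`).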

#### Example Code Analysis Problem

Assuming the cache starts <u>cold</u> (all blocks invalid) and sum, i, and j are stored in registers, calculate the miss rate:

- m = 12 bits, C = 256 B, K = 32 B, E = 2

```c
#define SIZE 8
long ar[SIZE][SIZE], sum = 0;  // &ar = 0x800
for (int i = 0; i < SIZE; i++)
  for (int j = 0; j < SIZE; j++)
    sum += ar[i][j];
```

- A long is 8 bytes, so each 32 B block holds 4 longs
- k = 5 bits, s = 2 bits
- ar[0][0] is at 0b 1000 0000 0000 (tag 10000, index 00, offset 00000)
- Miss rate: 25% (the first access to each block misses, the next 3 hit)

Slightly more complex example posted in the video link
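The miss rate can be double-checked by simulating the loop's memory reads. The sketch below assumes LRU replacement and an 8-byte `long` (as on x86-64). The function name `count_loop_misses` is made up for illustration, and the cache parameters are hard-coded from the problem statement.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIZE 8
#define ELEM_BYTES 8   /* sizeof(long), assumed to be 8 bytes */
#define BASE 0x800u    /* address of ar from the problem */
#define K 32u          /* block size in bytes */
#define E 2            /* associativity */
#define S 4u           /* number of sets = C/K/E = 256/32/2 */

/* Simulate the reads of ar[i][j] in row-major order; return the miss count. */
int count_loop_misses(void) {
    bool valid[S][E] = {{false}};
    uint32_t tag[S][E] = {{0}};
    int last_used[S][E] = {{0}};
    int misses = 0, time = 0;

    for (int i = 0; i < SIZE; i++) {
        for (int j = 0; j < SIZE; j++) {
            uint32_t addr = BASE + (uint32_t)(ELEM_BYTES * (i * SIZE + j));
            uint32_t set = (addr / K) % S;
            uint32_t t = addr / K / S;
            int hit = -1;

            for (int w = 0; w < E; w++)
                if (valid[set][w] && tag[set][w] == t) hit = w;

            if (hit < 0) {
                int victim = 0;
                misses++;
                /* Prefer an empty line; otherwise evict the LRU line. */
                for (int w = 0; w < E; w++) {
                    if (!valid[set][w]) { victim = w; break; }
                    if (last_used[set][w] < last_used[set][victim]) victim = w;
                }
                valid[set][victim] = true;
                tag[set][victim] = t;
                hit = victim;
            }
            last_used[set][hit] = time++;
        }
    }
    return misses;
}
```

The function returns 16 misses out of SIZE × SIZE = 64 accesses, which is the 25% miss rate: every miss is compulsory, one per 4-long block.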