#### **Caches II**

CSE 351 Summer 2020

#### **Instructor:**

Porter Jones

#### **Teaching Assistants:**

Amy Xu, Callum Walker, Sam Wolfson, Tim Mandzyuk



#### **Administrivia**

Questions doc: <a href="https://tinyurl.com/CSE351-7-29">https://tinyurl.com/CSE351-7-29</a>

- hw15 due Friday (7/31) 10:30am
- No homework due Monday!
- Lab 3 due Friday (7/31) 11:59pm
  - You get to write some buffer overflow exploits!
- Unit Summary 2 due next Wednesday (8/5) 11:59pm

### **Memory Hierarchies**

- Some fundamental and enduring properties of hardware and software systems:
  - Faster storage technologies almost always cost more per byte and have lower capacity
  - The gaps between memory technology speeds are widening
  - Well-written programs tend to exhibit good locality
- These properties complement each other beautifully
  - They suggest an approach for organizing memory and storage systems known as a <u>memory hierarchy</u>
    - For each level k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1

### **An Example Memory Hierarchy**



#### W UNIVERSITY of WASHINGTON

# Intel Core i7 Cache Hierarchy

*Figure: processor package and main memory.*

#### **Block size:**

64 bytes for all caches

#### L1 i-cache and d-cache:

32 KiB, 8-way, Access: 4 cycles

#### L2 unified cache:

256 KiB, 8-way, Access: 11 cycles

#### L3 unified cache:

8 MiB, 16-way, Access: 30-40 cycles

### Making memory accesses fast!

- Cache basics
- Principle of locality
- Memory hierarchies
- Cache organization
  - Direct-mapped (sets; index + tag)
  - Associativity (ways)
  - Replacement policy
  - Handling writes
- Program optimizations that consider caches

# **Cache Organization (1)**

**Note:** The textbook uses "B" for block size

- Block Size (K): unit of transfer between \$ and Mem
  - Given in bytes and always a power of 2 (e.g. 64 B)
  - Blocks consist of adjacent bytes (differ in address by 1)
    - Spatial locality!



# **Cache Organization (1)**

**Note:** The textbook uses "b" for offset bits

- Offset field
  - Low-order $\log_2(K) = k$ bits of address tell you which byte within a block
  - (address) mod $2^n$ = $n$ lowest bits of address
  - (address) modulo (# of bytes in a block)

How many bits do I need to specify every byte in a block? For K = 64 B, $k = \log_2(64) = 6$ bits.

*m*-bit address (refers to byte in memory):

| Block Number | Block Offset |
|--------------|--------------|
| m-k bits     | k bits       |
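The offset rule can be checked with a few lines of Python. This is only a sketch: the helper name `block_offset` is made up for illustration, and it assumes the block size is a power of 2, as the slides state.

```python
def block_offset(address: int, block_size: int) -> int:
    """Return the byte offset of `address` within its block.

    Assumes block_size (K) is a power of 2.
    """
    k = block_size.bit_length() - 1   # k = log2(K) offset bits
    mask = (1 << k) - 1               # k low-order ones, e.g. 0b111111 for K = 64
    return address & mask             # same value as address % block_size


# For K = 64 B, masking the low 6 bits agrees with the modulo form.
for addr in (0x0, 0x3F, 0x40, 0x1234):
    assert block_offset(addr, 64) == addr % 64
```

Masking is how hardware would do it (just wires), while `%` is the arithmetic view of the same thing.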

### Polling Question [Cache II-a]

- If we have 6-bit addresses and block size K = 4 B, which block and byte does 0x15 refer to?
  - Vote at: <a href="http://pollev.com/pbjones">http://pollev.com/pbjones</a>
  - Offset width: $k = \log_2(K) = \log_2(4) = 2$ bits

|    | Block Num   | Block Offset |
|----|-------------|--------------|
| A. | 1           | 1            |
| B. | 1           | 5            |
| C. | 5           | 1            |
| D. | 5           | 5            |
| E. | We're lost… |              |
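One way to work the polling question, sketched in Python: dividing the address by the block size gives the block number as the quotient and the block offset as the remainder. The variable names here are illustrative only.

```python
K = 4            # block size in bytes (from the question)
address = 0x15   # 0b010101 as a 6-bit address

# Quotient = block number (upper m-k bits), remainder = block offset (low k bits).
block_num, block_offset = divmod(address, K)
print(block_num, block_offset)
```

Splitting the binary form by hand gives the same result: `0b0101` is the block number and `0b01` is the offset.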

# **Cache Organization (2)**

- Cache Size (C): amount of data the \$ can store
  - Cache can only hold so much data (subset of next level)
  - Given in bytes (C) or number of blocks (C/K)
  - Example: C = 32 KiB = 512 blocks if using 64-B blocks
- Where should data go in the cache?
  - We need a mapping from memory addresses to specific locations in the cache to make checking the cache for an address fast
- What is a data structure that provides fast lookup?
  - Hash table!

### **Review: Hash Tables for Fast Lookup**





#### Place Data in Cache by Hashing Address




#### **Practice Question**

- 6-bit addresses, block size K = 4 B, and our cache holds S = 4 blocks (S = C/K, so C = 16 B).
- A request for address 0x2A results in a cache miss. Which set index does this block get loaded into and which 3 other addresses are loaded along with it?
  - No voting for this question
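A sketch of one way to work this question in Python. It assumes the hash function is "block number mod number of sets", which matches taking the lowest index bits of the block number; the variable names are illustrative.

```python
K = 4            # block size in bytes
S = 4            # number of blocks (sets) the cache holds
address = 0x2A   # 0b101010

block_num = address // K     # drop the 2 offset bits -> 0b1010
set_index = block_num % S    # low log2(S) = 2 bits of the block number
block_start = block_num * K  # address of the first byte in the block

# The whole block is loaded on a miss, so its other bytes come along too.
block_addrs = [block_start + i for i in range(K)]
others = [hex(a) for a in block_addrs if a != address]
print(set_index, others)
```

The three neighbors differ from 0x2A only in the offset bits, which is exactly what "same block" means.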




### **Tags Differentiate Blocks in Same Index**



### **Checking for a Requested Address**

- CPU sends address request for chunk of data
  - Address and requested data are not the same thing!
    - Analogy: your friend ≠ their phone number
- TIO address breakdown:



- Index field tells you where to look in cache
- Tag field lets you check that data is the block you want
- Offset field selects specified start byte within block
- Note: t and s sizes will change based on hash function
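The lookup steps above can be sketched as a toy direct-mapped cache that stores only tags (no data). The class `DirectMappedCache` and its methods are hypothetical names for illustration, not part of any course code, and the parameters reuse the earlier 6-bit practice setup.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks only tags, for hit/miss checks."""

    def __init__(self, block_size: int, num_sets: int):
        self.k = block_size.bit_length() - 1  # offset bits (K is a power of 2)
        self.s = num_sets.bit_length() - 1    # index bits (S is a power of 2)
        self.tags = [None] * num_sets         # None marks an empty set

    def split(self, address: int):
        """Return (tag, index, offset) for an address."""
        offset = address & ((1 << self.k) - 1)
        index = (address >> self.k) & ((1 << self.s) - 1)
        tag = address >> (self.k + self.s)
        return tag, index, offset

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, load the block by storing its tag."""
        tag, index, _ = self.split(address)
        if self.tags[index] == tag:   # index says where to look, tag confirms
            return True
        self.tags[index] = tag
        return False


cache = DirectMappedCache(block_size=4, num_sets=4)
results = [cache.access(a) for a in (0x2A, 0x2B, 0x1A, 0x2A)]
print(results)
```

In this run 0x2B hits because it shares a block with 0x2A, while 0x1A maps to the same set with a different tag and evicts that block, so the final access to 0x2A misses again.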

# **Checking for a Requested Address Example**

- Using 8-bit addresses.
- Cache Params: block size (K) = 4 B, cache size (C) = 32 B (which means number of sets is C/K = 8 sets).
  - Offset bits (k) = $\log_2(K) = \log_2(4) = 2$ bits
  - Index bits (s) = $\log_2(\text{num sets}) = \log_2(8) = 3$ bits
  - Tag bits (t) = Rest of the bits in the address = $8 - 2 - 3 = 3$ bits

*m*-bit address (Tag and Index together form the Block Number):

| Tag (t) | Index (s) | Offset (k) |
|---------|-----------|------------|
| 3 bits  | 3 bits    | 2 bits     |

- What are the fields for address 0xBA = 0b1011 1010?
  - Tag bits (unique id for block): 0b101 = 5
  - Index bits (cache set block maps to): 0b110 = 6
  - Offset bits (byte offset within block): 0b10 = 2
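The field breakdown for 0xBA can be double-checked with a short Python sketch. The function name `tio_fields` and its parameter names are assumptions made for this example; it handles only the direct-mapped case used on this slide.

```python
def tio_fields(address: int, addr_bits: int, block_size: int, cache_size: int):
    """Split an address into field widths and values for a direct-mapped cache."""
    k = block_size.bit_length() - 1                  # offset bits = log2(K)
    s = (cache_size // block_size).bit_length() - 1  # index bits = log2(C/K)
    t = addr_bits - s - k                            # tag bits = the rest
    offset = address & ((1 << k) - 1)
    index = (address >> k) & ((1 << s) - 1)
    tag = address >> (k + s)
    return (t, s, k), (tag, index, offset)


widths, fields = tio_fields(0xBA, addr_bits=8, block_size=4, cache_size=32)
print(widths, fields)
```

Here `widths` is `(t, s, k)` and `fields` is `(tag, index, offset)`, both as plain integers.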


#### Cache Puzzle [Cache II-b] Vote at <a href="http://pollev.com/pbjones">http://pollev.com/pbjones</a>

- Based on the following behavior, which of the following block sizes is NOT possible for our cache?
  - Cache starts empty, also known as a cold cache



### **Direct-Mapped Cache Problem**



### **Associativity**

- What if we could store data in any place in the cache?
  - More complicated hardware = more power consumed, slower
- So we combine the two ideas:
  - Each address maps to exactly one set
  - Each set can store block in more than one way









# **Cache Organization (3)**

Note: The textbook uses "b" for offset bits

- Associativity (E): # of ways for each set
  - Such a cache is called an "E-way set associative cache"
  - We now index into cache sets, of which there are $S = (C/K)/E$
  - Use lowest $\log_2(C/K/E) = s$ bits of block address
    - <u>Direct-mapped</u>: E = 1, so $s = \log_2(C/K)$ as we saw previously
    - <u>Fully associative</u>: E = C/K, so s = 0 bits
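To see how associativity changes the number of index bits, here is a minimal Python sketch. The helper names `num_sets` and `index_bits` are hypothetical, and the sizes (C = 32 KiB, K = 64 B) are borrowed from the earlier cache-size example.

```python
def num_sets(cache_size: int, block_size: int, ways: int) -> int:
    """S = (C / K) / E."""
    return cache_size // block_size // ways


def index_bits(cache_size: int, block_size: int, ways: int) -> int:
    """s = log2(S); assumes S is a power of 2."""
    return num_sets(cache_size, block_size, ways).bit_length() - 1


C, K = 32 * 1024, 64        # 32 KiB cache with 64-B blocks -> 512 blocks
for E in (1, 8, C // K):    # direct-mapped, 8-way, fully associative
    print(E, num_sets(C, K, E), index_bits(C, K, E))
```

Each doubling of E halves the number of sets and moves one bit from the index field to the tag field.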



## **Example Placement**

Block size: 16 B, capacity: 8 blocks, address: 16 bits

- Where would data from address 0x1833 be placed?
  - Binary: 0b 0001 1000 0011 0011

*m*-bit address:

| Tag (*t*)       | Index (*s*)         | Offset (*k*)    |
|-----------------|---------------------|-----------------|
| $t = m - s - k$ | $s = \log_2(C/K/E)$ | $k = \log_2(K)$ |

**Direct-mapped**: $s = \log_2(8) = 3$ bits

| Index | Set | Tag | Data |
|-------|-----|-----|------|
| 000   | 0   |     |      |
| 001   | 1   |     |      |
| 010   | 2   |     |      |
| 011   | 3   |     |      |
| 100   | 4   |     |      |
| 101   | 5   |     |      |
| 110   | 6   |     |      |
| 111   | 7   |     |      |

**2-way set associative**: $s = \log_2(4) = 2$ bits

| Index | Set | Tag | Data |
|-------|-----|-----|------|
| 00    | 0   |     |      |
| 01    | 1   |     |      |
| 10    | 2   |     |      |
| 11    | 3   |     |      |

**4-way set associative**: $s = \log_2(2) = 1$ bit

| Set | Tag | Data |
|-----|-----|------|
| 0   |     |      |
| 1   |     |      |
### **Direct-Mapped Cache**



#### **Direct-Mapped Cache Problem**



#### **Notes Diagrams**




