#### Caches II CSE 351 Summer 2021

Instructor: Mara Kirdani-Ryan

#### **Teaching Assistants:**

Kashish Aggarwal Nick Durand Colton Jobs Tim Mandzyuk





#### Gentle, Loving Reminders

- hw14 due tonight! hw15 due Monday!
  - No homework due Friday!
- Lab 3 due Friday (7/30)
- Unit Summary 2 Due next Monday (8/2)
  - Critique today!
  - Task #3 is out -- you'll look at some assembly

## Feedback on Unit Summaries? Come talk!

### **Learning Objectives**

Understanding this lecture means that you can:

- Explain the benefits of a memory hierarchy to someone that hasn't taken this course
- Define cache terminology:
  - Block size vs Cache size; block number vs offset
  - Sets, associativity, tags
- Differentiate between direct-mapped, associative, and fully-associative caches
- Give & receive unit summary feedback!

#### **Memory Hierarchies**

- Some fundamental and enduring properties of hardware and software systems:
  - Faster storage technologies almost always cost more per byte and have lower capacity
  - The gaps between memory technology speeds are widening
    - True for: registers ↔ cache, cache ↔ DRAM, DRAM ↔ disk, etc.
  - "Average" programs tend to exhibit good locality

#### **Memory Hierarchies**

- If you're trying to make things faster, you might end up with a <u>memory hierarchy</u>
  - For each level k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1

#### **An Example Memory Hierarchy**



#### Intel Core i7 Cache Hierarchy

Processor package (figure):

- Block size: 64 bytes for all caches
- L1 i-cache and d-cache: 32 KiB, 8-way, access: 4 cycles
- L2 unified cache: 256 KiB, 8-way, access: 11 cycles
- L3 unified cache: 8 MiB, 16-way, access: 30-40 cycles

# That's the memory hierarchy! Feeling ok?

### Making memory accesses fast!

- Cache basics
- Principle of locality
- Memory hierarchies
- Cache organization
  - Direct-mapped (sets; index + tag)
  - Associativity (ways)
  - Replacement policy
  - Handling writes
- Program optimizations that consider caches

## **Cache Organization (1)**

**Note:** The textbook uses "B" for block size

- Block Size (K): unit of transfer between \$ and Mem
  - Given in bytes and always a power of 2 (e.g. 64 B)
  - Blocks consist of adjacent bytes (differ in address by 1)
    - Spatial locality!

## **Cache Organization (1)**

**Note:** The textbook uses "b" for offset bits

- Offset field
  - Low-order $\log_2(K) = k$ bits of the address tell you which byte within a block
    - (address) mod $2^n$ = the $n$ lowest bits of the address
  - Offset = (address) mod (# of bytes in a block)
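The offset rule above can be sketched in a few lines of Python. This is only an illustration: the function name `block_number_and_offset` and the sample address are assumptions, not something from the slides.

```python
def block_number_and_offset(address, block_size):
    """Split an address into (block number, byte offset within the block)."""
    # Block sizes are always a power of 2; reject anything else.
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a power of 2")
    k = block_size.bit_length() - 1       # k = log2(K) offset bits
    offset = address & (block_size - 1)   # low-order k bits == address mod K
    block = address >> k                  # the remaining high-order bits
    return block, offset


# Hypothetical example: address 200 with 64-byte blocks (200 = 3 * 64 + 8).
print(block_number_and_offset(200, 64))   # (3, 8)
```

Masking with `block_size - 1` and taking `address % block_size` give the same result only because K is a power of 2.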



#### **Checking in!**

If we have 6-bit addresses and block size K=4B, which block & byte does 0x15 refer to?



#### **Checking in!**

If we have 6-bit addresses and block size K=4B, which block & byte does 0x15 refer to?

- 0x15 = 0b 01 0101 as a 6-bit address
- Block number: upper 4 bits, 0b0101 = 5
- Block offset: lower 2 bits, 0b01 = 1



### **Cache Organization (2)**

- Cache size (C): the cache can only hold so much data (a subset of the next level)
  - Given in bytes (C) or number of blocks (C/K)
  - Example: C = 32 KiB = 512 blocks if using 64-B blocks
- Where should data go in the cache?
  - We need a mapping from memory addresses to specific locations in the cache to make checking the cache for an address fast
- What is a data structure that provides fast lookup?
  - Hash table!

#### **Review: Hash Tables for Fast Lookup**

#### **Insert:** 5 27 34 102 119

Apply hash function to map data to "buckets"
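As a rough sketch of the bucket idea, here is the insert sequence above run through a simple modulo hash. The hash function and the bucket count of 10 are assumptions for illustration; the slide's figure is not reproduced here and may use a different function.

```python
NUM_BUCKETS = 10  # assumed bucket count, not taken from the slide


def bucket_for(key, num_buckets=NUM_BUCKETS):
    """Hypothetical hash function: the key modulo the number of buckets."""
    return key % num_buckets


buckets = {}
for key in [5, 27, 34, 102, 119]:
    # Each key lands in exactly one bucket; colliding keys would share a list.
    buckets.setdefault(bucket_for(key), []).append(key)

print(buckets)
```

A cache does the same thing with addresses: part of the address picks the "bucket", so a lookup only has to check one place.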



#### Place Data in Cache by Hashing Address




### Tags Differentiate Blocks in Same Index



#### **Checking for a Requested Address**

- CPU sends address request for chunk of data
  - Address and requested data are not the same thing!
    - Analogy: your friend ≠ their phone number
- TIO address breakdown:



- Index field tells you where to look in cache
- Tag field lets you check that data is the block you want
- Offset field selects specified start byte within block

#### Checking for a Requested Address Example

- Using 8-bit addresses.
- Cache params: block size (K) = 4 B, cache size (C) = 32 B (which means the number of sets is C/K = 8)
  - Offset bits (k) = $\log_2(K)$ = 2
  - Index bits (s) = $\log_2(\text{num sets})$ = 3
  - Tag bits (t) = rest of the bits in the address = 3

*m*-bit address: Tag (*t*) | Index (*s*) | Offset (*k*), where the tag and index bits together form the block number.

- What are the fields for address 0xBA = 0b 1011 1010?
  - Tag bits (unique id for block): Tag = 101
  - Index bits (cache set block maps to): Index = 110
  - Offset bits (byte within block): Offset = 10
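The same breakdown can be checked with shifts and masks. This is a minimal sketch for this example's parameters only (k = 2, s = 3, 8-bit addresses); the name `split_address` is made up for illustration.

```python
OFFSET_BITS = 2  # k = log2(4-byte block)
INDEX_BITS = 3   # s = log2(8 sets)


def split_address(address):
    """Return (tag, index, offset) for the direct-mapped example cache."""
    offset = address & ((1 << OFFSET_BITS) - 1)                 # lowest k bits
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)  # next s bits
    tag = address >> (OFFSET_BITS + INDEX_BITS)                 # whatever is left
    return tag, index, offset


tag, index, offset = split_address(0xBA)
print(f"tag={tag:03b} index={index:03b} offset={offset:02b}")
# tag=101 index=110 offset=10
```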

#### Checking in, caches!

- Based on the following behavior, which of the following block sizes is NOT possible for our cache?
  - Cache starts *empty*, also known as a *cold cache*
  - Access (addr: hit/miss) stream:
    - (14<sub>10</sub>: miss), (15<sub>10</sub>: hit), (16<sub>10</sub>: miss)
- 4 bytes
- 🐈 8 bytes
- 🐑 16 bytes
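One way to reason about a question like this is to replay the access stream against a candidate block size. The sketch below assumes a cold cache and that nothing is evicted during these three accesses; `replay_cold` is a hypothetical helper, not part of the course materials.

```python
def replay_cold(accesses, block_size):
    """Return 'hit' or 'miss' per address, starting from an empty cache.

    Assumes no block is evicted while the stream is replayed.
    """
    cached_blocks = set()
    results = []
    for address in accesses:
        block = address // block_size  # which block this address falls in
        results.append("hit" if block in cached_blocks else "miss")
        cached_blocks.add(block)       # a miss brings in the whole block
    return results


observed = ["miss", "hit", "miss"]
for block_size in (4, 8, 16):
    outcome = replay_cold([14, 15, 16], block_size)
    print(block_size, outcome, "matches" if outcome == observed else "differs")
```

A block size is ruled out when its replayed outcome differs from the observed stream.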



#### **Direct-Mapped Cache Problem**



#### Associativity

- What if we could store data in any place in the cache?
  - More complicated hardware = more power consumed, slower
- So we *combine* the two ideas:
  - Each address maps to exactly one set
  - Each set can store a block in more than one place (way)



## **Cache Organization (3)**

**Note:** The textbook uses "b" for offset bits

- Associativity (*E*): # of ways for each set
  - Such a cache is called an "E-way set associative cache"
  - We now index into cache *sets*, of which there are S = C/K/E
  - Use lowest  $\log_2(C/K/E) = s$  bits of block address
    - <u>Direct-mapped</u>: E = 1, so  $s = \log_2(C/K)$  as we saw previously
    - Fully associative: E = C/K, so s = 0 bits
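To see how associativity changes the index width, here is a small sketch of S = C/K/E and s = log2(S). The function name is an assumption; the parameters come from the earlier 32 B example and the Core i7 L1 figures quoted above.

```python
def num_sets_and_index_bits(cache_size, block_size, ways):
    """Return (S, s) for a cache of C bytes, K-byte blocks, and E ways."""
    num_sets = cache_size // (block_size * ways)  # S = C/K/E
    index_bits = num_sets.bit_length() - 1        # s = log2(S), S a power of 2
    return num_sets, index_bits


# 32 B cache with 4 B blocks, as in the earlier direct-mapped example.
print(num_sets_and_index_bits(32, 4, 1))          # direct-mapped: (8, 3)
print(num_sets_and_index_bits(32, 4, 8))          # fully associative: (1, 0)
# Core i7 L1: 32 KiB, 64 B blocks, 8-way.
print(num_sets_and_index_bits(32 * 1024, 64, 8))  # (64, 6)
```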

| Used for tag comparison | Selects the set | Selects the byte from block |
|-------------------------|-----------------|-----------------------------|
| Tag (*t*)               | Index (*s*)     | Offset (*k*)                |

| ← Decreasing associativity   | Increasing associativity →       |
|------------------------------|----------------------------------|
| Direct mapped (only one way) | Fully associative (only one set) |

#### **Example Placement**

| block<br>size: | 16 B     |
|----------------|----------|
| capacity:      | 8 blocks |
| address:       | 16 bits  |

- Where would data from address 0x1833 be placed?
  - Binary: 0b 0001 1000 0011 0011

$t = m - s - k$, $s = \log_2(C/K/E)$, $k = \log_2(K)$

*m*-bit address layout: Tag (*t*), Index (*s*), Offset (*k*), from high-order to low-order bits.

|            | Direct-mapped | 2-way set associative | 4-way set associative |
|------------|---------------|-----------------------|-----------------------|
| Index bits | $s$ = ?       | $s$ = ?               | $s$ = ?               |
| Sets       | 0-7           | 0-3                   | 0-1                   |

#### **Example Placement**

| block<br>size: | 16 B     |
|----------------|----------|
| capacity:      | 8 blocks |
| address:       | 16 bits  |

- Where would data from address 0x1833 be placed?
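  - Binary: 0b 0001 1000 0011 0011

$t = m - s - k$, $s = \log_2(C/K/E)$, $k = \log_2(K)$

*m*-bit address layout: Tag (*t*), Index (*s*), Offset (*k*), from high-order to low-order bits.

|                | Direct-mapped | 2-way set associative | 4-way set associative |
|----------------|---------------|-----------------------|-----------------------|
| Index bits     | $s = 3$       | $s = 2$               | $s = 1$               |
| Sets           | 0-7           | 0-3                   | 0-1                   |
| Set for 0x1833 | 3             | 3 (either way)        | 1 (any of the 4 ways) |
| Tag            | 0x30          | 0x60                  | 0xC1                  |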
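The placement for 0x1833 can be worked out mechanically from the three formulas. The sketch below assumes the parameters on this slide (16 B blocks, 8 blocks of capacity); `place` is a hypothetical helper name.

```python
def place(address, block_size, num_blocks, ways):
    """Return (set index, tag) for an address in an E-way set associative cache."""
    k = block_size.bit_length() - 1            # k = log2(K) offset bits
    num_sets = num_blocks // ways              # S = (C/K)/E
    s = num_sets.bit_length() - 1              # s = log2(S) index bits
    block_number = address >> k                # drop the offset bits
    set_index = block_number & (num_sets - 1)  # lowest s bits of the block number
    tag = block_number >> s                    # remaining high-order bits
    return set_index, tag


for ways in (1, 2, 4):
    set_index, tag = place(0x1833, 16, 8, ways)
    print(f"{ways}-way: set {set_index}, tag {tag:#x}")
# 1-way: set 3, tag 0x30
# 2-way: set 3, tag 0x60
# 4-way: set 1, tag 0xc1
```

As associativity goes up, index bits move into the tag: the block can sit in any way of its set, so a longer tag is needed to tell blocks apart.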

#### Summary

- Memory hierarchy between caches (multi-level), memory, disk
  - Like any other storage, short-term/long-term
- Caches are organized into blocks by hashing addresses
  - Store the tag to tell apart blocks that map to the same index

## **Unit Summary Critique**

#### Critique

- 3 people/breakout!
  - Try to make sure everyone has time to share
  - Helpful feedback includes things that you like and don't like!
  - No need to be *critical*
- These are personal representations of knowledge!
  - You're allowed to ignore the feedback that you get
  - But, do make sure to listen!

#### **Direct-Mapped Cache**



#### **Notes Diagrams**



