## Memory & Caches II

CSE 351 Autumn 2022

#### **Instructor:**

Justin Hsia

#### **Teaching Assistants:**

Angela Xu

Arjun Narendra

Armin Magness

Assaf Vayner

Carrie Hu

Clare Edmonds

David Dai

Dominick Ta

Effie Zheng

James Froelich

Jenny Peng

Kristina Lansang

Paul Stevans

Renee Ruan

Vincent Xiao



https://what-if.xkcd.com/111/

#### **Relevant Course Information**

- Mid-quarter Survey due Wednesday (11/9)
- hw16 due Wednesday (11/9)
- hw17 due next Wednesday (11/16)
  - Don't wait too long, this is a BIG hw (includes this lecture)
- Lab 3 due Friday (11/11)
  - Veterans Day: no lecture, but some office hours (see Ed)
- Midterm grades will be released when we can
  - Regrade requests will be available afterward

## **Memory Hierarchies (Review)**

- Some fundamental and enduring properties of hardware and software systems:
  - Faster storage technologies almost always cost more per byte and have lower capacity
  - The gaps between memory technology speeds are widening
  - Well-written programs tend to exhibit good locality
- These properties complement each other beautifully
  - They suggest an approach for organizing memory and storage systems known as a <u>memory hierarchy</u>
    - For each level k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1

## **An Example Memory Hierarchy**




### **Intel Core i7 Cache Hierarchy**

#### **Processor package**



#### **Block size:**

64 bytes for all caches

#### L1 i-cache and d-cache:

32 KiB, 8-way, Access: 4 cycles

#### L2 unified cache:

256 KiB, 8-way, Access: 11 cycles

#### L3 unified cache:

8 MiB, 16-way, Access: 30-40 cycles

### Making memory accesses fast!

- Cache basics
- Principle of locality
- Memory hierarchies
- Cache organization
  - Direct-mapped (sets; index + tag)
  - Associativity (ways)
  - Replacement policy
  - Handling writes
- Program optimizations that consider caches

### **Reading Review**

- Terminology:
  - Memory hierarchy
  - Cache parameters: block size (K), cache size (C)
  - Addresses: block offset field (k bits wide)
  - Cache organization: direct-mapped cache, index field
- Questions from the Reading?

### **Review Questions**

- We have a direct-mapped cache with the following parameters:
  - Block size of 8 bytes  $K = 2^3 B$
  - Cache size of 4 KiB  $C = 2^{12} B$
- How many blocks can the cache hold?  $C/K = 2^{12-3} = 2^{9} = 512$ blocks
- How many bits wide is the block offset field?  $k = \log_2(K) = 3$ bits
- Which of the following addresses would fall under block number 3?
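The review answers can be checked with a little arithmetic. This is a minimal sketch, not part of the original slides: `K` and `C` follow the slide notation, while `block_number` is a hypothetical helper that assumes blocks are numbered from 0 starting at address 0.

```python
K = 8          # block size in bytes (2^3 B), from the question
C = 4 * 1024   # cache size in bytes (4 KiB = 2^12 B)

num_blocks = C // K                # how many blocks the cache can hold
offset_bits = K.bit_length() - 1   # log2(K) for a power-of-2 block size


def block_number(addr: int) -> int:
    """Hypothetical helper: which K-byte block of memory holds this address."""
    return addr // K


# Every address whose block number is 3 forms one contiguous run of K bytes.
block_3 = [hex(a) for a in range(64) if block_number(a) == 3]

print(num_blocks, offset_bits, block_3)
```

Any address from 0x18 through 0x1f lands in block number 3 under these assumptions.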

**Note:** The textbook uses "B" for block size

- Block Size (K): unit of transfer between \$ and Mem
  - Given in bytes and always a power of 2 (e.g., 64 B)
  - Blocks consist of adjacent bytes (differ in address by 1)
    - Spatial locality!
  - Small example (K = 4 B):






**Note:** The textbook uses "b" for offset bits


- Offset field
  - Low-order $\log_2(K) = k$ bits of the address tell you which byte within a block
    - (address) mod $2^n$ = value of the $n$ lowest bits of the address
    - (address) modulo (# of bytes in a block)
  - The offset needs to be able to specify every byte in a block (e.g., K = 64 B needs k = 6 bits)
  - An $m$-bit address (refers to a byte in memory) splits into a block number (upper $m-k$ bits) and a block offset (lower $k$ bits)


#### Example:

- If we have 6-bit addresses and block size K = 4 B, which block and byte does 0x15 refer to?
  - Offset width = $\log_2(K) = \log_2(4) = 2$ bits
  - Address: 0x15 = 0b 0101 01
  - Block number: 0b0101 = 5
  - Block offset: 0b01 = 1
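The same split can be done with integer operations. The sketch below is illustrative only (the variable names are made up for this example) and shows that taking the address modulo the block size gives the same result as masking off the low offset bits.

```python
ADDR_BITS = 6             # m: address width from the example
K = 4                     # block size in bytes
k = K.bit_length() - 1    # offset width: log2(4) = 2 bits

addr = 0x15               # 0b010101

offset_by_mod = addr % K          # (address) mod (# of bytes in a block)
offset_by_mask = addr & (K - 1)   # keep only the k lowest bits
block_num = addr >> k             # the remaining m-k high bits

print(f"{addr:0{ADDR_BITS}b} -> block {block_num}, offset {offset_by_mask}")
```

The mask and shift forms only work because K is a power of 2, which is why block sizes are always given that way.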

- Cache Size (C): amount of data the \$ can store
  - Cache can only hold so much data (subset of next level)
  - Given in bytes (C) or number of blocks (C/K)
  - Example: C = 32 KiB = 512 blocks if using 64-B blocks:  $2^{5} \times 2^{10} \text{ B} = 2^{15} \text{ B}$, and  $2^{15} \text{ B} \times \frac{1 \text{ block}}{2^{6} \text{ B}} = 2^{9} \text{ blocks}$
- Where should data go in the cache?
  - We need a mapping from memory addresses to specific locations in the cache to make checking the cache for an address fast
- What is a data structure that provides fast lookup?
  - Hash table!
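To make the hash-table analogy concrete, here is a small sketch using the 32 KiB / 64-B example above. The hash function (block number modulo the number of block slots) is an assumption for illustration, and `cache_index` is a hypothetical name.

```python
K = 64          # block size in bytes
C = 32 * 1024   # cache size in bytes (32 KiB)
S = C // K      # number of block slots in the cache


def cache_index(addr: int) -> int:
    """Hypothetical hash: block number modulo the number of slots."""
    return (addr // K) % S


# Bytes in the same block share a slot, the next block uses the next slot,
# and addresses exactly C bytes apart collide on the same slot.
for addr in (0x0000, 0x003F, 0x0040, 0x8000):
    print(hex(addr), "->", cache_index(addr))
```

Because different blocks can land in the same slot, the slot alone does not tell you which block is stored there.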

## **Hash Tables for Fast Lookup**






## Place Data in Cache by Hashing Address



### **Polling Question**

- 6-bit addresses, block size K = 4 B, and our cache holds S = 4 blocks (S = C/K, so the index is  $s = \log_2(4) = 2$  bits wide).
- A request for address 0x2A results in a cache miss. Which index does this block get loaded into and which 3 other addresses are loaded along with it?
  - Vote on Ed Lessons
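One way to work through the question, sketched under the assumption that the index is the block number modulo S (the variable names are illustrative, not from the slides):

```python
K = 4         # block size in bytes
S = 4         # number of blocks the cache holds
addr = 0x2A   # 0b101010

block_num = addr // K    # which block of memory holds this address
index = block_num % S    # assumed direct-mapped placement
block_start = block_num * K

# A miss loads the whole block, so every byte of the block comes along.
same_block = [hex(a) for a in range(block_start, block_start + K)]
others = [a for a in same_block if a != hex(addr)]

print("index:", index, "also loaded:", others)
```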







## **Tags Differentiate Blocks in Same Index**



### **Checking for a Requested Address**

- CPU sends address request for chunk of data
  - Address and requested data are not the same thing!
    - Analogy: your friend ≠ their phone number
- TIO address breakdown:



- Index field tells you where to look in cache
- Tag field lets you check that data is the block you want
- Offset field selects specified start byte within block
  - Note: t and s sizes will change based on hash function
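The tag / index / offset check can be sketched as follows. This is an illustration under assumptions, not the course's reference code: `split_address` and `is_hit` are hypothetical helpers, the cache is modeled as a plain list of stored tags (`None` for an empty slot), and the index is taken from the bits just above the offset.

```python
from typing import NamedTuple


class TIO(NamedTuple):
    tag: int
    index: int
    offset: int


def split_address(addr: int, s: int, k: int) -> TIO:
    """Split an address into fields: s index bits, k offset bits, rest is tag."""
    offset = addr & ((1 << k) - 1)
    index = (addr >> k) & ((1 << s) - 1)
    tag = addr >> (k + s)
    return TIO(tag, index, offset)


def is_hit(stored_tags: list, addr: int, s: int, k: int) -> bool:
    """Look in the slot chosen by the index, then compare tags."""
    fields = split_address(addr, s, k)
    return stored_tags[fields.index] == fields.tag


# 6-bit addresses, K = 4 B (k = 2), S = 4 blocks (s = 2), so t = 2 tag bits.
stored_tags = [None, None, 0b10, None]   # slot 2 holds the block with tag 0b10

print(split_address(0x2A, s=2, k=2))      # fields of 0b10_10_10
print(is_hit(stored_tags, 0x2A, 2, 2))    # same index, matching tag
print(is_hit(stored_tags, 0x1A, 2, 2))    # same index, different tag
```

Addresses 0x2A and 0x1A select the same slot, so only the tag comparison tells them apart.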

### **Cache Puzzle Example (No Voting)**

- Based on the following behavior, which of the following block sizes is NOT possible for our cache?
  - Cache starts empty, also known as a cold cache

  - Access (addr: hit/miss) stream: (14: miss), (15: hit), (16: miss)
    - hit: block with data already in \$
    - miss: data not in \$, pulls block containing data
  - 14 and 15 are in the same block
  - 16 is in a different block

A. 4 bytes

B. 8 bytes

C. 16 bytes

D. 32 bytes

E. We're lost...
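The puzzle can be checked by replaying the access stream against a cold cache for each candidate block size. This is a rough sketch with a hypothetical `simulate` helper; it tracks only which block numbers are resident and assumes nothing is evicted during these three accesses.

```python
accesses = [(14, "miss"), (15, "hit"), (16, "miss")]


def simulate(block_size: int, addrs):
    """Replay addresses against a cold cache; assumes no evictions occur."""
    resident = set()   # block numbers currently in the cache
    results = []
    for addr in addrs:
        block = addr // block_size
        results.append("hit" if block in resident else "miss")
        resident.add(block)   # a miss pulls in the whole block
    return results


addrs = [a for a, _ in accesses]
observed = [outcome for _, outcome in accesses]

for size in (4, 8, 16, 32):
    ok = simulate(size, addrs) == observed
    print(size, "bytes:", "possible" if ok else "NOT possible")
```

With 32-byte blocks, addresses 14, 15, and 16 all fall in block 0, so the access to 16 would hit, which contradicts the observed miss.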



### **Summary: Direct-Mapped Cache**



## **Direct-Mapped Cache Problem**

