# **Caches II**

**CSE 351 Spring 2020**, University of Washington

#### **Instructor:**
Ruth Anderson

#### **Teaching Assistants:**
Alex Olshanskyy, Rehaan Bhimani, Callum Walker, Chin Yeoh, Diya Joy, Eric Fan, Edan Sneh, Jonathan Chen, Jeffery Tian, Millicent Li, Melissa Birchfield, Porter Jones, Joseph Schafer, Connie Wang, Eddy (Tianyi) Zhou

# **Administrivia**

- Unit Summary #2 due Friday (5/08)
- Lab 3 due Wednesday (5/13)
- Lecture Google docs (you must log on with your @uw Google account to access them!):
  - Google doc for 11:30 Lecture: https://tinyurl.com/351-05-06A
  - Google doc for 2:30 Lecture: https://tinyurl.com/351-05-06B

# **An Example Memory Hierarchy**

# **Memory Hierarchies**

- Some fundamental and enduring properties of hardware and software systems:
  - Faster storage technologies almost always cost more per byte and have lower capacity
  - The gaps between memory technology speeds are widening
    - True for: registers ↔ cache, cache ↔ DRAM, DRAM ↔ disk, etc.
  - Well-written programs tend to exhibit good locality
- These properties complement each other beautifully
- They suggest an approach for organizing memory and storage systems known as a *memory hierarchy*
  - For each level k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1

# **Intel Core i7 Cache Hierarchy**

*(Figure: processor package diagram, not reproduced.)*

- **Block size:** 64 bytes for all caches
- **L1 i-cache and d-cache:** 32 KiB, 8-way, access: 4 cycles
- **L2 unified cache:** 256 KiB, 8-way, access: 11 cycles
- **L3 unified cache:** 8 MiB, 16-way, access: 30-40 cycles

# **Making memory accesses fast!**
- Cache basics
- Principle of locality
- Memory hierarchies
- Cache organization
  - Direct-mapped (sets; index + tag)
  - Associativity (ways)
  - Replacement policy
  - Handling writes
- Program optimizations that consider caches

# **Cache Organization (1)**

**Note:** The textbook uses "B" for block size in bytes and "b" for the number of offset bits.

- Block Size (K): unit of transfer between \$ and Mem
  - Given in bytes and always a power of 2 (e.g., 64 B)
  - Blocks consist of adjacent bytes (whose addresses differ by 1)
    - Spatial locality!
- Offset field
  - The low-order $\log_2(K) = k$ bits of the address tell you which byte within a block
    - Offset = $(\text{address}) \bmod K$, i.e., (address) modulo (# of bytes in a block)
    - In general, $(\text{address}) \bmod 2^n$ = the $n$ lowest bits of the address
  - How many bits do I need to specify every byte in a block? $\log_2(K)$; e.g., a 64-byte block needs $\log_2(64) = 6$ offset bits

An $m$-bit address (refers to a byte in memory) splits into:

| Block Number | Block Offset |
|--------------|--------------|
| $m-k$ bits   | $k$ bits     |

# Polling Question [Cache II-a]

- If we have 6-bit addresses and block size K = 4 B, which block and byte does 0x15 refer to?
- Vote at: http://pollev.com/rea

|    | Block Num     | Block Offset |
|----|---------------|--------------|
| A. | 1             | 1            |
| B. | 1             | 5            |
| C. | 5             | 1            |
| D. | 5             | 5            |
| E. | We're lost... |              |

Slide annotation: 0x15 = 0b010101. The offset width is $\log_2(K) = \log_2(4) = 2$ bits, so the low 2 bits `01` give offset 1 and the remaining 4 bits `0101` give block number 5.

# **Cache Organization (2)**

- Cache Size (C): amount of data the \$ can store
  - Cache can only hold so much data (a subset of the next level)
  - Given in bytes (C) or number of blocks (C/K)
  - Example: C = 32 KiB = 512 blocks if using 64-B blocks
- Where should data go in the cache?
  - We need a mapping from memory addresses to specific locations in the cache to make checking the cache for an address fast
- What is a data structure that provides fast lookup?
  - Hash table!

# **Review: Hash Tables for Fast Lookup**

# **Place Data in Cache by Hashing Address**

# **Practice Question**

- 6-bit addresses (m = 6), block size K = 4 B, and our cache holds S = 4 blocks.
  - $S = C/K$, and the index needs $s = \log_2(S) = \log_2(4) = 2$ bits
- A request for address 0x2A results in a cache miss. Which index does this block get loaded into, and which 3 other addresses are loaded along with it?
- No voting for this question

# **Tags Differentiate Blocks in Same Index**

# **Checking for a Requested Address**

- CPU sends an address request for a chunk of data
  - The address and the requested data are not the same thing!
    - Analogy: your friend ≠ their phone number
- TIO address breakdown:
  - Index field tells you where to look in the cache
  - Tag field lets you check that the data is the block you want
  - Offset field selects the specified start byte within the block
  - Note: t and s sizes will change based on the hash function

| Tag      | Index    | Offset   |
|----------|----------|----------|
| $t$ bits | $s$ bits | $k$ bits |

where $t + s + k = m$ for an $m$-bit address.

# **Cache Puzzle [Cache II-b]**

Vote at http://pollev.com/rea

- Based on the following behavior, which of the following block sizes is NOT possible for our cache?
  - Cache starts empty, also known as a cold cache
  - Access (addr: hit/miss) stream: (14: miss), (15: hit), (16: miss)
    - hit: block with data already in \$
    - miss: data not in \$; pulls in the block containing the data
  - Slide annotation: 14 and 15 are in the same block
- A. 4 bytes
- B. 8 bytes
- C. 16 bytes
- D. 32 bytes
- E. We're lost...