





























```
/* Locality Example #1 */
#define M 3
#define N 4

int sum_array_rows(int a[M][N])
{
    int i, j, sum = 0;

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];

    return sum;
}

/*
 * M = 3, N = 4
 *
 *   a[0][0] a[0][1] a[0][2] a[0][3]
 *   a[1][0] a[1][1] a[1][2] a[1][3]
 *   a[2][0] a[2][1] a[2][2] a[2][3]
 *
 * Access Pattern:
 *    1) a[0][0]   2) a[0][1]   3) a[0][2]   4) a[0][3]
 *    5) a[1][0]   6) a[1][1]   7) a[1][2]   8) a[1][3]
 *    9) a[2][0]  10) a[2][1]  11) a[2][2]  12) a[2][3]
 *
 * stride = ?
 *
 * Layout in Memory
 * Note: 76 is just one possible starting address of array a
 */
```
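To make the access pattern concrete, here is a minimal sketch that computes the byte offset (from the start of `a`) of the k-th element touched by `sum_array_rows`. The helper names `elem_offset` and `kth_access_offset` are hypothetical and not part of the original example; M = 3 and N = 4 are taken from the slide, and C's row-major array layout is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define M 3
#define N 4

/* Byte offset of a[i][j] from the start of a row-major int a[M][N]. */
size_t elem_offset(int i, int j)
{
    assert(i >= 0 && i < M && j >= 0 && j < N);
    return ((size_t)i * N + (size_t)j) * sizeof(int);
}

/* Byte offset of the k-th access (k = 1..M*N) made by sum_array_rows. */
size_t kth_access_offset(int k)
{
    assert(k >= 1 && k <= M * N);
    int i = (k - 1) / N;   /* outer loop index at step k */
    int j = (k - 1) % N;   /* inner loop index at step k */
    return elem_offset(i, j);
}
```

Under these assumptions, consecutive accesses differ by exactly `sizeof(int)` bytes, i.e. a stride of one element, including across row boundaries (a[0][3] to a[1][0]). If `int` is 4 bytes and the array happened to start at address 76, the accesses would touch 76, 80, 84, and so on.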







### Cache Performance Metrics

- Huge difference between a cache hit and a cache miss
  - Could be 100x speed difference between accessing cache and main memory (measured in clock cycles)
- Miss Rate (MR)
  - Fraction of memory references not found in cache (misses / accesses) = 1 - Hit Rate
- Hit Time (HT)
  - Time to deliver a block in the cache to the processor
  - Includes time to determine whether the block is in the cache
- Miss Penalty (MP)
  - Additional time required because of a miss

# Cache Performance

- Two things hurt the performance of a cache:
  - Miss rate and miss penalty
- Average Memory Access Time (AMAT): average time to access memory considering both hits and misses
  - AMAT = Hit time + Miss rate × Miss penalty (abbreviated AMAT = HT + MR × MP)
- Miss penalty example: assume HT of 1 clock cycle and MP of 100 clock cycles
  - 77%: AMAT =
  - 97%: AMAT =
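The AMAT formula can be sketched as a small C helper. The function names `amat` and `amat_from_hit_rate` are hypothetical, and reading the percentages in the example as hit rates is an assumption rather than something the slide states.

```c
#include <assert.h>

/* Average memory access time in clock cycles: AMAT = HT + MR * MP. */
double amat(double hit_time, double miss_rate, double miss_penalty)
{
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return hit_time + miss_rate * miss_penalty;
}

/* Convenience wrapper taking a hit rate (assumption: MR = 1 - hit rate). */
double amat_from_hit_rate(double hit_time, double hit_rate, double miss_penalty)
{
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    return amat(hit_time, 1.0 - hit_rate, miss_penalty);
}
```

With HT = 1 and MP = 100 clock cycles, a 97% hit rate gives 1 + 0.03 × 100 = 4 cycles, while a 77% hit rate gives 1 + 0.23 × 100 = 24 cycles. Small changes in miss rate move AMAT a lot when the miss penalty is large.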

Peer Instruction Question

Processor specs: 200 ps clock, MP of 50 clock cycles, MR of 0.02 misses/instruction, and HT of 1 clock cycle

AMAT = HT + MR\*MP =

Which improvement would be best?

A. 190 ps clock

B. Miss penalty of 40 clock cycles

C. MR of 0.015 misses/instruction
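One way to compare the three options is to convert AMAT from clock cycles to picoseconds. The sketch below is only an illustration: `amat_ps` is a hypothetical helper, and it treats the stated MR as a per-access miss rate, which is an assumption.

```c
#include <assert.h>

/* AMAT in picoseconds: (HT + MR * MP) cycles times the clock period. */
double amat_ps(double clock_ps, double ht_cycles, double mr, double mp_cycles)
{
    assert(clock_ps > 0.0 && mr >= 0.0 && mr <= 1.0);
    return clock_ps * (ht_cycles + mr * mp_cycles);
}
```

Under this model the baseline is 200 × (1 + 0.02 × 50) = 400 ps. Option A gives 190 × 2 = 380 ps, option B gives 200 × (1 + 0.02 × 40) = 360 ps, and option C gives 200 × (1 + 0.015 × 50) = 350 ps, so reducing the miss rate yields the lowest AMAT of the three.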

UNIVERSITY of WASHINGTON L16: Caches I CSE351, Winter 201

### Can we have more than one cache?

- Why would we want to do that?
  - Avoid going to memory!
- Typical performance numbers:
  - Miss Rate
    - L1 MR = 3-10%
    - L2 MR = Quite small (e.g. < 1%), depending on parameters, etc.
  - Hit Time
    - L1 HT = 4 clock cycles
    - L2 HT = 10 clock cycles
  - Miss Penalty
    - MP = 50-200 cycles for missing in L2 and going to main memory
    - Trend: increasing!
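To see why a second cache level helps, the single-level AMAT formula can be applied recursively: the miss penalty of L1 becomes the access time of L2. The sketch below assumes this common two-level model and treats the L2 miss rate as a local rate (misses per L2 access); `amat_two_level` and its parameter names are hypothetical.

```c
#include <assert.h>

/*
 * Two-level AMAT in clock cycles:
 *   AMAT = L1 HT + L1 MR * (L2 HT + L2 MR * memory penalty)
 * An L1 miss pays the L2 hit time; an L2 miss also pays the trip to memory.
 */
double amat_two_level(double l1_ht, double l1_mr,
                      double l2_ht, double l2_mr, double mem_penalty)
{
    assert(l1_mr >= 0.0 && l1_mr <= 1.0);
    assert(l2_mr >= 0.0 && l2_mr <= 1.0);
    double l1_miss_penalty = l2_ht + l2_mr * mem_penalty;
    return l1_ht + l1_mr * l1_miss_penalty;
}
```

Picking values from the ranges above (L1 HT = 4, L1 MR = 5%, L2 HT = 10, L2 MR = 1%, memory penalty = 100 cycles) gives 4 + 0.05 × (10 + 0.01 × 100) = 4.55 cycles. With no L2 and the same 100-cycle penalty, the single-level formula gives 4 + 0.05 × 100 = 9 cycles.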



### **Memory Hierarchies**

- Some fundamental and enduring properties of hardware and software systems:
  - Faster storage technologies almost always cost more per byte and have lower capacity
  - The gaps between memory technology speeds are widening
    - True for: registers $\leftrightarrow$ cache, cache $\leftrightarrow$ DRAM, DRAM $\leftrightarrow$ disk, etc.
  - Well-written programs tend to exhibit good locality
- These properties complement each other beautifully
  - They suggest an approach for organizing memory and storage systems known as a memory hierarchy




## Summary

- Memory Hierarchy
  - Successively higher levels contain "most used" data from lower levels
  - Exploits temporal and spatial locality
  - Caches are intermediate storage levels used to optimize data transfers between any system elements with different characteristics
- Cache Performance
  - Ideal case: found in cache (hit)
  - Bad case: not found in cache (miss), search in next level
  - Average Memory Access Time (AMAT) = HT + MR × MP
    - Hurt by Miss Rate and Miss Penalty


# Aside: Units and Prefixes

- Here focusing on large numbers (exponents > 0)
- Note that $10^3 \approx 2^{10}$
- SI prefixes are ambiguous: they may mean base 10 or base 2
- IEC prefixes are unambiguously base 2

SIZE PREFIXES (10<sup>x</sup> for Disk, Communication; 2<sup>x</sup> for Memory)

| SI Size   | Prefix | Symbol | IEC Size | Prefix | Symbol |
|-----------|--------|--------|----------|--------|--------|
| $10^{3}$  | Kilo-  | K      | $2^{10}$ | Kibi-  | Ki     |
| $10^{6}$  | Mega-  | M      | $2^{20}$ | Mebi-  | Mi     |
| $10^{9}$  | Giga-  | G      | $2^{30}$ | Gibi-  | Gi     |
| $10^{12}$ | Tera-  | T      | $2^{40}$ | Tebi-  | Ti     |
| $10^{15}$ | Peta-  | P      | $2^{50}$ | Pebi-  | Pi     |
| $10^{18}$ | Exa-   | E      | $2^{60}$ | Exbi-  | Ei     |
| $10^{21}$ | Zetta- | Z      | $2^{70}$ | Zebi-  | Zi     |
| $10^{24}$ | Yotta- | Y      | $2^{80}$ | Yobi-  | Yi     |
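The gap between the two prefix families can be checked numerically. The following is a minimal sketch with hypothetical helpers `si_size` and `iec_size`; it stops at exa/exbi because larger values do not fit in a 64-bit unsigned integer.

```c
#include <assert.h>
#include <stdint.h>

/* 10^(3n): size of the n-th SI prefix (n = 1 kilo, 2 mega, ..., 6 exa). */
uint64_t si_size(unsigned n)
{
    assert(n >= 1 && n <= 6);   /* 10^21 and above overflow uint64_t */
    uint64_t v = 1;
    for (unsigned k = 0; k < n; k++)
        v *= 1000;
    return v;
}

/* 2^(10n): size of the n-th IEC prefix (n = 1 kibi, 2 mebi, ..., 6 exbi). */
uint64_t iec_size(unsigned n)
{
    assert(n >= 1 && n <= 6);   /* 2^70 and above overflow uint64_t */
    return (uint64_t)1 << (10 * n);
}
```

For example, `iec_size(1)` is 1024 against `si_size(1)` of 1000 (about 2.4% larger), while `iec_size(3)` is 1073741824 against 1000000000 (about 7.4% larger), so the approximation $10^3 \approx 2^{10}$ gets looser as the prefixes grow.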
