CSE 326 Data Structures
The Memory Hierarchy Investigates Sorting
Zasha Weinberg, in lieu of Steve Wolfman*
Winter 2000

A victim of the Memory Hierarchy, headed by the mysterious entity known only as “CPU”

Memory Hierarchy Stats (made up)

The Memory Hierarchy Exploits Locality of Reference
Idea: small amount of fast memory
Keep frequently used data in the fast memory
LRU replacement policy
Keep recently used data in cache
To free space, remove Least Recently Used data (see the sketch below)
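A toy version of that policy, to make the bullet concrete. This is only a sketch: the 4-line fully associative cache, the block numbers, and the timestamp bookkeeping are made-up illustration; real caches are split into sets and ways and only approximate LRU in hardware.

    /* Toy LRU cache simulation (illustrative only). */
    #include <stdio.h>

    #define NUM_LINES 4              /* tiny "cache": 4 lines */

    static long tag[NUM_LINES];      /* which block each line holds (-1 = empty) */
    static long last_used[NUM_LINES];/* logical time of last access */
    static long now = 0;

    void cache_access(long block)
    {
        int i, victim = 0;
        now++;
        for (i = 0; i < NUM_LINES; i++)
            if (tag[i] == block) {            /* hit: just refresh its timestamp */
                last_used[i] = now;
                printf("hit  %ld\n", block);
                return;
            }
        for (i = 1; i < NUM_LINES; i++)       /* miss: evict the Least Recently Used line */
            if (last_used[i] < last_used[victim]) victim = i;
        tag[victim] = block;
        last_used[victim] = now;
        printf("miss %ld\n", block);
    }

    int main(void)
    {
        int i;
        for (i = 0; i < NUM_LINES; i++) tag[i] = -1;
        cache_access(1); cache_access(2); cache_access(1); cache_access(3);
        cache_access(4); cache_access(5);     /* 5 evicts block 2, the LRU one */
        return 0;
    }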

So what?
Optimizing use of cache can make programs way faster
One TA made RadixSort 2x faster by rewriting it to use the cache better
Not just for sorting

Cache Details (simplified)

Selection Sort – Sucky Sort

Selection Sort Cache Misses
Cache miss → read line → get hits for rest of cache line
Then another miss
# misses = (N^2/2) / (cache line size)
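For reference, here is plain Selection Sort as the miss count above assumes it (generic code, not the course's): the inner loop re-scans the unsorted suffix, touching roughly N^2/2 elements in total, and because each sweep is sequential it misses only about once per cache line of that traffic.

    /* Plain selection sort: sequential inner sweeps, repeated ~N times. */
    void selection_sort(int a[], int n)
    {
        int i, j, min, tmp;
        for (i = 0; i < n - 1; i++) {
            min = i;
            for (j = i + 1; j < n; j++)   /* sequential sweep: good spatial locality, */
                if (a[j] < a[min])        /* but repeated over the whole suffix N times */
                    min = j;
            tmp = a[i]; a[i] = a[min]; a[min] = tmp;
        }
    }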

QuickSort Bows to Memory Hierarchy
Partition sweeps the array sequentially, kind of like Selection Sort's inner loop
BUT the subproblems shrink quickly, so they soon fit entirely in cache (sketched below)
Selection Sort only fits in cache right at the end
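A sketch of why, using a standard Hoare-style partition (an assumption; the lecture's exact variant may differ): partition is just two sequential sweeps, so like Selection Sort it misses only about once per cache line, and once a recursive subarray is smaller than the cache all of its remaining work is hits.

    /* QuickSort sketch with Hoare-style partition. */
    void quicksort(int a[], int lo, int hi)
    {
        int i, j, pivot, tmp;
        if (lo >= hi) return;
        pivot = a[(lo + hi) / 2];
        i = lo; j = hi;
        while (i <= j) {
            while (a[i] < pivot) i++;   /* left-to-right sequential sweep */
            while (a[j] > pivot) j--;   /* right-to-left sequential sweep */
            if (i <= j) {
                tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++; j--;
            }
        }
        quicksort(a, lo, j);            /* once a subarray fits in cache, */
        quicksort(a, i, hi);            /* the rest of its work is all hits */
    }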

Iterative MergeSort – not so good

Iterative MergeSort – cont’d

Tiled MergeSort – better

Tiled MergeSort – cont’d
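The diagrams for these two slides aren't reproduced here, so here is the idea in code form, under my reading of it: iterative (bottom-up) MergeSort streams the entire array through memory on every one of its log N passes, so once the array outgrows the cache every pass pays full misses; the tiled variant first sorts cache-sized tiles while they are resident, then starts the merge passes at the tile width. TILE, the helper names, and the use of qsort for the in-cache phase are illustrative choices, not the lecture's code.

    /* Tiled MergeSort sketch. */
    #include <stdlib.h>
    #include <string.h>

    #define TILE 1024   /* pretend this many ints fit comfortably in cache */

    static void merge(int a[], int tmp[], int lo, int mid, int hi)
    {
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi)  tmp[k++] = a[j++];
        memcpy(a + lo, tmp + lo, (hi - lo) * sizeof(int));
    }

    static int cmp_int(const void *x, const void *y)
    {
        return (*(const int *)x > *(const int *)y) - (*(const int *)x < *(const int *)y);
    }

    void tiled_mergesort(int a[], int n)
    {
        int *tmp = malloc(n * sizeof(int));
        int lo, width;
        if (tmp == NULL) return;
        for (lo = 0; lo < n; lo += TILE)               /* phase 1: in-cache tile sorts */
            qsort(a + lo, (n - lo < TILE) ? n - lo : TILE, sizeof(int), cmp_int);
        for (width = TILE; width < n; width *= 2)      /* phase 2: ordinary merge passes */
            for (lo = 0; lo + width < n; lo += 2 * width) {
                int hi = (lo + 2 * width < n) ? lo + 2 * width : n;
                merge(a, tmp, lo, lo + width, hi);
            }
        free(tmp);
    }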

Radix Sort – Very Naughty
On each BinSort
Sweep through input list – cache misses along the way (sucky!)
Append to output list – indexed by pseudorandom digit (ouch!)
Truly evil for large Radix (e.g. 2^16), which reduces # of passes
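A sketch of one such pass, written counting-sort style into an output array rather than with linked bucket lists (an assumption on my part; either way the scatter pattern is the same). The read of src[] streams nicely through cache, but each write lands in whichever of the 2^RADIX_BITS regions the digit selects, so with a large radix like 2^16 almost every write is a miss.

    /* One BinSort pass of LSD radix sort (illustrative). */
    #define RADIX_BITS 16
    #define NBUCKETS   (1 << RADIX_BITS)

    void binsort_pass(const unsigned src[], unsigned dst[], int n, int shift)
    {
        static int count[NBUCKETS];
        int i, b, sum = 0;
        for (b = 0; b < NBUCKETS; b++) count[b] = 0;
        for (i = 0; i < n; i++)                         /* sequential read: fine */
            count[(src[i] >> shift) & (NBUCKETS - 1)]++;
        for (b = 0; b < NBUCKETS; b++) {                /* prefix sums -> bucket offsets */
            int c = count[b]; count[b] = sum; sum += c;
        }
        for (i = 0; i < n; i++) {                       /* scattered writes: ouch! */
            b = (src[i] >> shift) & (NBUCKETS - 1);
            dst[count[b]++] = src[i];
        }
    }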

Not enough RAM – External Sorting
e.g. Sort 10 billion numbers with 1 MB of RAM.
Databases need to be very good at this
Winter 2000 326’ers won’t need to be
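For the curious, here is a sketch of the first phase of a two-phase external merge sort, assuming the numbers sit as ints in a binary file; RUN_INTS, the temp-file naming, and make_runs are all made up for illustration.

    /* Phase 1: read RAM-sized chunks, sort each in memory, write sorted runs to disk. */
    #include <stdio.h>
    #include <stdlib.h>

    #define RUN_INTS (1 << 18)   /* pretend this is all that fits in RAM */

    static int cmp_int(const void *a, const void *b)
    {
        return (*(const int *)a > *(const int *)b) - (*(const int *)a < *(const int *)b);
    }

    int make_runs(FILE *in, int *buf)     /* buf must hold RUN_INTS ints */
    {
        int nruns = 0;
        size_t got;
        while ((got = fread(buf, sizeof(int), RUN_INTS, in)) > 0) {
            char name[32];
            FILE *run;
            qsort(buf, got, sizeof(int), cmp_int);
            sprintf(name, "run%d.tmp", nruns++);
            run = fopen(name, "wb");
            if (run == NULL) break;
            fwrite(buf, sizeof(int), got, run);
            fclose(run);
        }
        return nruns;                     /* number of sorted runs written */
    }

Phase 2 (not shown) repeatedly merges the sorted runs, streaming them sequentially from disk: the same merge loop as MergeSort, just reading from files instead of arrays.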