Radix Sorts

Erika Wolfe, with thanks to many others.
Counting sort fundamentals, and counting sort as a subroutine of LSD and MSD radix sorts.
Ask questions anonymously on Piazza. Look for the pinned Lecture Questions thread.

How to handle non-numeric keys like {♣, ♠, ♥, ♦}?
Convert the keys to numeric values; the exact implementation can vary.
Imagine ♣ → 0, ♠ → 1, ♥ → 2, ♦→ 3
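One possible way to get this mapping in Java is an enum whose declaration order matches the key order. This is only a sketch: the names `SuitKeys`, `Suit`, and `keyOf` are made up for illustration, and any other consistent mapping would work equally well.

```java
final class SuitKeys {
    // Declaration order fixes the numeric key: CLUBS = 0, SPADES = 1, HEARTS = 2, DIAMONDS = 3.
    enum Suit { CLUBS, SPADES, HEARTS, DIAMONDS }

    // Returns the numeric key that counting sort would use for this suit.
    static int keyOf(Suit suit) {
        return suit.ordinal();
    }

    public static void main(String[] args) {
        for (Suit suit : Suit.values()) {
            System.out.println(suit + " -> " + keyOf(suit));
        }
    }
}
```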

We’ll answer these in lecture today: 
Runtime of counting sort?
What is “radix”?
Can we see another counting sort demo?
The second example sorting by card suit was difficult!
Feedback from the Reading Quiz

Defined: A type of sorting algorithm that only reads elements through a comparison operation that determines which of two elements should occur first when sorted.
	Or, more simply, sorting using only compareTo() type operations

We determined the best we can do with comparison-based sorting is Θ(N log N) worst-case time complexity.

Can we do better?  What if we don’t compare at all? 
Comparison Based Sorting

Radix: number of different digits or characters in a given alphabet.

Radix Defined
Name                      Radix   Characters
Binary                    2       01
Decimal                   10      0123456789
Lowercase Latin Alphabet  26      abcdefghijklmnopqrstuvwxyz
ASCII                     128     http://www.asciitable.com/
UNICODE16                 65536   https://en.wikipedia.org/wiki/List_of_Unicode_characters

We want counting sort to work for non-unique and/or non-consecutive keys!
Count the number of occurrences for each key option.
Compute the starting indices for each key option from the counts array.
Move through the original items in order.  For each [item, key] do:
Get the correct index for result by checking the index array for key
Copy item into the result using this index
Increment the index array for key
Copy items back to initial array (if needed)

Generalizing Counting Sort
Demo
This is in contrast to the initial naive counting sort introduced in the reading, which assumed the keys were unique and included all values 0 - N-1.
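The generalized steps can be sketched in Java roughly as follows. This is a minimal sketch, not the demo's actual code: the class and method names are invented, keys are assumed to be already converted to integers in the range 0 to radix - 1, and the sketch returns a new array instead of copying back. The sample data is the TA-by-suit example from this lecture with ♣ → 0, ♠ → 1, ♥ → 2, ♦ → 3.

```java
import java.util.Arrays;

final class CountingSortSketch {
    // Sorts items by keys[i], where every key is assumed to lie in [0, radix).
    // Returns a new sorted array and leaves the inputs untouched.
    static String[] countingSort(String[] items, int[] keys, int radix) {
        // Step 1: count the number of occurrences of each key.
        int[] counts = new int[radix];
        for (int key : keys) {
            counts[key]++;
        }
        // Step 2: compute the starting index of each key from the counts.
        int[] starts = new int[radix];
        for (int k = 1; k < radix; k++) {
            starts[k] = starts[k - 1] + counts[k - 1];
        }
        // Step 3: walk the items in their original order, placing each one at
        // the next free index for its key (this is what keeps the sort stable).
        String[] result = new String[items.length];
        for (int i = 0; i < items.length; i++) {
            result[starts[keys[i]]++] = items[i];
        }
        return result;
    }

    public static void main(String[] args) {
        String[] names = {"Lucy", "Thanika", "Louis", "Velocity", "Brian", "Elena", "Howard"};
        int[] suits = {1, 0, 3, 0, 2, 2, 1};
        System.out.println(Arrays.toString(countingSort(names, suits, 4)));
    }
}
```

Running `main` prints `[Thanika, Velocity, Lucy, Howard, Brian, Elena, Louis]`, matching the sorted table in the memory use example.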

Visualization of Counting Sort
From Algorithms, 4th edition by Sedgewick and Wayne


Counting Sort Runtime Analysis
N = number of items, R = radix (size of alphabet)
Θ(N) Count the number of occurrences for each key option.
Θ(R) Compute the starting indices for each key option from the counts array.
Θ(N) Move through the original items in order.  For each [item, key] do:
Θ(1) Get the correct index for result by checking the index array for key
Θ(1) Copy item into the result using this index
Θ(1) Increment the index array for key
Θ(N) Copy items back to initial array (if needed)
Overall: Θ(N + R)
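Adding up the per-step costs listed above gives the overall bound. The underbrace labels are just shorthand for the four steps.

```latex
\begin{aligned}
T(N, R) &= \underbrace{\Theta(N)}_{\text{count}}
         + \underbrace{\Theta(R)}_{\text{starting indices}}
         + \underbrace{N \cdot \Theta(1)}_{\text{place items}}
         + \underbrace{\Theta(N)}_{\text{copy back}} \\
        &= \Theta(N + R)
\end{aligned}
```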

Counting Sort Memory Use Analysis
Counting Sort input:
Index  Suit  TA
0      ♠     Lucy
1      ♣     Thanika
2      ♦     Louis
3      ♣     Velocity
4      ♥     Brian
5      ♥     Elena
6      ♠     Howard

Counts:
Suit  Count
♣     2
♠     2
♥     2
♦     1

Starting indices (shown after the pass: the entries began at 0, 2, 4, 6 and were incremented once per item placed):
Suit  Index
♣     2
♠     4
♥     6
♦     7

Sorted:
Index  Suit  TA
0      ♣     Thanika
1      ♣     Velocity
2      ♠     Lucy
3      ♠     Howard
4      ♥     Brian
5      ♥     Elena
6      ♦     Louis

Θ(R) for counts + Θ(R) for starting indices + Θ(N) for the result
Overall: Θ(N + R)
We can actually use the same array for counts and starting indices, but the implementation details are not a focus for us!

Runtime and memory use of Θ(N + R)!         N = # of items, R = radix of alphabet
We are able to beat comparison sort by avoiding binary compares.
If N >= R, we expect reasonable performance.  If N is much bigger than R, then R can become negligible.
Empirical experiments are needed to compare to Quicksort on practical inputs.
Input is restricted to alphabetic (finite radix) keys → we can’t sort items with non-alphabetic keys, like Strings! 
Counting Sort
🙃
This distinction between alphabetic and non-alphabetic keys is confusing but important.  I think that examples are the easiest way to clarify.  With keys that are decimal integers within 0-5, we know the alphabet is {0, 1, 2, 3, 4, 5}.  If the keys are Strings, what would be our alphabet, the list of all possible Strings?  This is not only a large list, it’s an infinite one!  Counting sort requires that our alphabet is finite.


For sorting an array of the 100 largest cities by population, which sort do we expect to have the better worst-case runtime in seconds?  Why?
Counting sort (as described in demo)
Quicksort

Answer: Quicksort.
Counting sort requires building a counts array as large as the largest population (~30,000,000) to record how many cities have each possible population.
The counts array is very large and very sparse: most indices are unused because we are sorting only 100 cities.
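A rough back-of-the-envelope calculation shows how lopsided this is. The sketch below only counts abstract steps (N + R for counting sort, N log2 N and N^2 for a comparison sort's typical and worst cases); the class name and the step model are assumptions for illustration, not measured runtimes.

```java
final class CityCostEstimate {
    // Rough step count for counting sort: passes over the items plus a pass over the alphabet.
    static long countingSortSteps(long n, long r) {
        return n + r;
    }

    // Rough comparison count for an N log N comparison sort.
    static long linearithmicSteps(long n) {
        return Math.round(n * (Math.log(n) / Math.log(2)));
    }

    // Rough comparison count for quicksort's quadratic worst case.
    static long quadraticSteps(long n) {
        return n * n;
    }

    public static void main(String[] args) {
        long n = 100;          // number of cities
        long r = 30_000_000L;  // approximate size of the population "alphabet"
        System.out.println("counting sort, N + R:     " + countingSortSteps(n, r));
        System.out.println("comparison sort, N log N: " + linearithmicSteps(n));
        System.out.println("quicksort worst case N^2: " + quadraticSteps(n));
    }
}
```

With N = 100 and R = 30,000,000, counting sort does on the order of 30,000,100 steps, while even the quadratic worst case of quicksort is only 10,000.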

We want to be able to sort keys that do not belong to a finite alphabet (such as Strings).
Strings consist of characters from a finite alphabet (e.g. “it’s friday”)
Numbers do too! (e.g. 525600)
Much more flexible!
→ Idea: Sort each digit (index) independently using counting sort.

Radix sort: a sort that works one character at a time by grouping items with the same digit in the same position.
Radix Sort in General
Digit and character are commonly used interchangeably in this context.

Idea: Sort each digit independently, from rightmost to leftmost
Example - Alphabet: {1, 2, 3}
Least Significant Digit (LSD) Radix Sort
Original:
Index  Key  Name
0      22   Stitch
1      12   Gantu
2      31   Nani
3      23   Lilo
4      11   David

After sorting on the rightmost digit:
Index  Key  Name
0      31   Nani
1      11   David
2      22   Stitch
3      12   Gantu
4      23   Lilo

After sorting on the leftmost digit:
Index  Key  Name
0      11   David
1      12   Gantu
2      22   Stitch
3      23   Lilo
4      31   Nani

LSD Radix Sort
Why is it important for the correctness of LSD radix sort that counting sort is stable?  Give an example of what could go wrong if it were not stable.

LSD Radix Sort

LSD sort only works if counting sort is stable because otherwise relationships revealed by intermediate sorts are lost!
Original:
Index  Key  Name
0      22   Stitch
1      12   Gantu
2      31   Nani
3      23   Lilo
4      11   David

After sorting on the rightmost digit:
Index  Key  Name
0      31   Nani
1      11   David
2      22   Stitch
3      12   Gantu
4      23   Lilo

After sorting on the leftmost digit with an unstable sort:
Index  Key  Name
0      11   David
1      12   Gantu
2      23   Lilo
3      22   Stitch
4      31   Nani
In the intermediate step, Stitch was correctly placed before Lilo because Stitch’s key has 2 at the rightmost digit and Lilo has 3.  This relative information must be preserved, which is why LSD sort relies on counting sort being stable!
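To see the effect concretely, here is a small experiment on the same five keys. It is only a sketch: `StabilityDemo` and its methods are made-up names, and the "unstable" variant is one arbitrary way to break stability (filling each bucket back to front while still scanning the input front to back), so its wrong answer differs from the one in the table above.

```java
import java.util.Arrays;

final class StabilityDemo {
    // Extracts a decimal digit: 0 = ones, 1 = tens, and so on.
    static int digitOf(int key, int digit) {
        for (int i = 0; i < digit; i++) {
            key /= 10;
        }
        return key % 10;
    }

    // One counting-sort pass on the given digit.
    // stable = true fills each bucket front to back (equal digits keep their order);
    // stable = false fills each bucket back to front (equal digits get reversed).
    static int[] pass(int[] keys, int digit, boolean stable) {
        int radix = 10;
        int[] counts = new int[radix + 1];
        for (int key : keys) {
            counts[digitOf(key, digit) + 1]++;
        }
        for (int d = 0; d < radix; d++) {
            counts[d + 1] += counts[d];
        }
        // Now counts[d] is the start of bucket d and counts[d + 1] is its end (exclusive).
        int[] next = stable
                ? Arrays.copyOfRange(counts, 0, radix)
                : Arrays.copyOfRange(counts, 1, radix + 1);
        int[] result = new int[keys.length];
        for (int key : keys) {
            int d = digitOf(key, digit);
            if (stable) {
                result[next[d]++] = key;
            } else {
                result[--next[d]] = key;
            }
        }
        return result;
    }

    // LSD radix sort for two-digit keys: ones digit first, then tens digit.
    static int[] lsdSort(int[] keys, boolean stable) {
        return pass(pass(keys, 0, stable), 1, stable);
    }

    public static void main(String[] args) {
        int[] keys = {22, 12, 31, 23, 11};
        System.out.println("stable:   " + Arrays.toString(lsdSort(keys, true)));
        System.out.println("unstable: " + Arrays.toString(lsdSort(keys, false)));
    }
}
```

The stable version yields `[11, 12, 22, 23, 31]`; the unstable version yields `[12, 11, 23, 22, 31]`, where ties on the tens digit have lost the ordering established by the ones-digit pass.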

What is the intuition for why LSD radix sort works?
If the digits not yet examined are different, the digits already examined don’t matter! 
→ later pass will sort correctly on more significant digits.
If the digits not yet examined are identical, the keys are already properly ordered.
→ because the sort is stable, they will remain so
I’m still not convinced...
[Figure: keys drawn as digit columns C0 to C4, split into "Examined" and "Not yet examined" regions; concrete example keys 32670, 32800, 11999]
Both visualizations explain the same idea; the bottom one is just a concrete example!


LSD Radix Sort Runtime

N = # of items, R = radix of alphabet, W = width of each item in # digits

We have to run counting sort for each digit in the width.
Counting sort has runtime on the order of Θ(N + R).

→ LSD Radix Sort runtime: Θ(WN + WR)
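The bound follows directly from the loop structure: W passes, each a Θ(N + R) counting sort. A minimal sketch for fixed-width strings, written in the general style of the Sedgewick and Wayne textbook, might look like this. The class name is invented, and all keys are assumed to have exactly `width` characters with character codes below `radix`.

```java
import java.util.Arrays;

final class LsdRadixSort {
    // Sorts keys in place. Assumes every key has exactly `width` characters
    // and every character code is in [0, radix).
    static void sort(String[] keys, int width, int radix) {
        int n = keys.length;
        String[] aux = new String[n];
        for (int d = width - 1; d >= 0; d--) {           // W passes, rightmost digit first
            int[] counts = new int[radix + 1];
            for (String key : keys) {                    // Θ(N): count occurrences
                counts[key.charAt(d) + 1]++;
            }
            for (int r = 0; r < radix; r++) {            // Θ(R): counts -> starting indices
                counts[r + 1] += counts[r];
            }
            for (String key : keys) {                    // Θ(N): stable placement
                aux[counts[key.charAt(d)]++] = key;
            }
            System.arraycopy(aux, 0, keys, 0, n);        // Θ(N): copy back
        }
    }

    public static void main(String[] args) {
        String[] keys = {"22", "12", "31", "23", "11"};
        sort(keys, 2, 128);
        System.out.println(Arrays.toString(keys));
    }
}
```

On the example keys this prints `[11, 12, 22, 23, 31]`.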

Non-equal key lengths
Example - Alphabet: {1, 2, 3}
Original:
Index  Key  Value
0      3    is
1      31   fun!
2      23   duper
3      12   super
4      1    sorting

After sorting on the rightmost digit:
Index  Key  Value
0      31   fun!
1      1    sorting
2      12   super
3      3    is
4      23   duper

🤔

Non-equal key lengths
Example - Alphabet: {1, 2, 3}
→ Treat empty spaces as less than all other characters
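Original (· marks an empty space):
Index  Key  Value
0      ·3   is
1      31   fun!
2      23   duper
3      12   super
4      ·1   sorting

After sorting on the rightmost digit:
Index  Key  Value
0      31   fun!
1      ·1   sorting
2      12   super
3      ·3   is
4      23   duper

After sorting on the leftmost digit:
Index  Key  Value
0      ·1   sorting
1      ·3   is
2      12   super
3      23   duper
4      31   fun!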

Use counting sort on each index, right to left.  Now we can sort non-alphabetic keys (such as Strings) that are made up of alphabetic characters!
Runtime: Θ(WN + WR), Memory use: Θ(N + R)              N = # of items, R = radix, W = width
If R is very small compared to N and W we can think of it as negligible.

It’s annoying that the runtime depends on the length of the longest key. 🤔
LSD Radix Sort Summary
The runtime depends on the length of the longest key because we pad every shorter key to have length equal to the longest key.
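The padding does not have to be materialized. One way to sketch it is a helper that reports a special "blank" digit, smaller than every real character, whenever a key is too short. The names below are hypothetical, and the sketch assumes keys are right-aligned as in the numeric example (blanks on the left); left-aligned strings would index from the other end.

```java
import java.util.Arrays;

final class PaddedLsdSort {
    // Digit at posFromRight (0 = rightmost), shifted up by one so that 0 can
    // stand for "blank": a key that is too short sorts before every real character.
    static int digitAt(String key, int posFromRight) {
        int i = key.length() - 1 - posFromRight;
        return i < 0 ? 0 : key.charAt(i) + 1;
    }

    // Sorts right-aligned keys of unequal length in place.
    // Assumes every character code is in [0, radix).
    static void sort(String[] keys, int radix) {
        int width = 0;
        for (String key : keys) {
            width = Math.max(width, key.length());   // W = length of the longest key
        }
        String[] aux = new String[keys.length];
        for (int pos = 0; pos < width; pos++) {
            int[] counts = new int[radix + 2];       // one extra slot for the blank
            for (String key : keys) {
                counts[digitAt(key, pos) + 1]++;
            }
            for (int r = 0; r <= radix; r++) {
                counts[r + 1] += counts[r];
            }
            for (String key : keys) {
                aux[counts[digitAt(key, pos)]++] = key;
            }
            System.arraycopy(aux, 0, keys, 0, keys.length);
        }
    }

    public static void main(String[] args) {
        String[] keys = {"3", "31", "23", "12", "1"};
        sort(keys, 128);
        System.out.println(Arrays.toString(keys));
    }
}
```

This prints `[1, 3, 12, 23, 31]`. Every key still takes part in all W passes, which is exactly why the runtime depends on the longest key.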

Idea: similar to LSD, but sort leftmost to rightmost
Motivation: Consider sorting very large numbers that are very different.
                  99999999999999999
                        72638283948
                   3222938273827837

By definition, LSD radix sort examines the least significant digit first!

→ May do more computation than necessary

Most Significant Digit (MSD) Radix Sort
Looking at these numbers, we can very easily tell that the first is the largest and the second is the smallest.  With LSD sort, the most significant digits are (by definition) considered last.  If the leftmost digits are the most significant, why don’t we try considering them first?

Suppose we sort each digit index, left to right.  Will we arrive at the correct result?  Why? 
MSD Radix Sort
Original  After sorting on character 0
add       add
cab       ace
fad       bad
fee       bed
bad       cab
fed       fad
bed       fee
ace       fed

No!  Items that were previously ordered by most significant digit will be swapped!
MSD Radix Sort
Original  After sorting on character 0  After sorting on character 1
add       add                           bad
cab       ace                           cab
fad       bad                           fad
fee       bed                           ace
bad       cab                           add
fed       fad                           bed
bed       fee                           fee
ace       fed                           fed
No!  We won’t arrive at the correct result because items that were previously ordered correctly based on the most significant digit are later swapped when sorted on a less significant digit.

Idea: Sort each subproblem separately.
MSD Radix Sort
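Original: add, cab, fad, fee, bad, fed, bed, ace
After sorting on character 0, one subproblem per leading character: [add, ace] [bad, bed] [cab] [fad, fee, fed]
After sorting each subproblem on character 1: [ace, add] [bad, bed] [cab] [fad] [fee, fed]
Remaining subproblem, still to be sorted on character 2: [fee, fed]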

What is the best case runtime of MSD sort (in terms of N, W, R)?
What type of input leads to this best case?

What is the worst case runtime of MSD sort (in terms of N, W, R)?
What type of input leads to this worst case?

N = # of items, R = radix, W = width
MSD Radix Sort Runtime

Best case:
	We finish in one counting sort pass, looking only at the top digit: Θ(N + R)
	Every input has a unique most significant digit
Worst case:
	We have to look at every character, degenerating to LSD sort: Θ(WN + WR)
	Every input is exactly the same, or only differs on the least significant digit.

N = # of items, R = radix, W = width
MSD Radix Sort Runtime

Runtime - Best case:  Θ(N + R), Worst case: Θ(WN + WR)
Memory usage - Θ(N + WR)

Think about the runtime of MSD radix sort by considering the number of characters that must be examined.

Long strings are rarely random in practice → may need specialized algorithms

Analysis of MSD Radix Sort
From Algorithms, 4th edition by Sedgewick and Wayne
Memory usage - worst case we have W recursive levels and have to make an R-size count array at each.
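The recursive structure behind these bounds can be sketched as follows, again in the general style of the Sedgewick and Wayne textbook. The class name is invented and character codes are assumed to be below `radix`. Each recursive call allocates its own counts array of size about R, and the recursion can go as deep as the key width W, which is where the WR term in the memory bound comes from.

```java
import java.util.Arrays;

final class MsdRadixSort {
    static void sort(String[] keys, int radix) {
        sort(keys, new String[keys.length], 0, keys.length, 0, radix);
    }

    // Character code at index d, or -1 when the key has no character there,
    // so that shorter keys sort before longer keys sharing the same prefix.
    static int charAt(String key, int d) {
        return d < key.length() ? key.charAt(d) : -1;
    }

    // Sorts keys[lo, hi), which all agree on their first d characters.
    static void sort(String[] keys, String[] aux, int lo, int hi, int d, int radix) {
        if (hi - lo <= 1) {
            return;                                   // nothing left to order
        }
        int[] counts = new int[radix + 2];            // one R-sized array per recursive call
        for (int i = lo; i < hi; i++) {
            counts[charAt(keys[i], d) + 2]++;
        }
        for (int r = 0; r < radix + 1; r++) {
            counts[r + 1] += counts[r];
        }
        for (int i = lo; i < hi; i++) {
            aux[counts[charAt(keys[i], d) + 1]++] = keys[i];
        }
        for (int i = lo; i < hi; i++) {
            keys[i] = aux[i - lo];
        }
        // After placement, counts[r] and counts[r + 1] bracket the items whose
        // character d equals r. Sort each such subproblem separately on d + 1.
        for (int r = 0; r < radix; r++) {
            sort(keys, aux, lo + counts[r], lo + counts[r + 1], d + 1, radix);
        }
    }

    public static void main(String[] args) {
        String[] keys = {"add", "cab", "fad", "fee", "bad", "fed", "bed", "ace"};
        sort(keys, 128);
        System.out.println(Arrays.toString(keys));
    }
}
```

On the example from the slides this prints `[ace, add, bad, bed, cab, fad, fed, fee]`. Subproblems of size one return immediately, which is how the best case stops after a single pass.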

Long strings often have some sort of structure.  For example, ID numbers often are some combination of state code (“WA”), last name, first name, etc...

In practice...
Optimizations like caching, compiler optimizations, and others are not captured by our algorithmic analyses!
Just-In-Time Compiler: the Java Virtual Machine observes your code while it’s running, identifies the operations that repeat often, and recompiles them into optimized machine code based on what it learns!
→ Empirical analysis is needed in most cases to compare sorting algorithms
Literally timing sorting inputs of various size and distribution and deciding what is best for your application
Small changes to optimize for your specific application can make a huge difference!
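A crude timing experiment along these lines might look like the sketch below. Everything here is an assumption for illustration: the class name, the choice of a byte-at-a-time LSD sort on non-negative ints, the input sizes, and uniformly random inputs. A single un-warmed measurement like this is heavily affected by the Just-In-Time compiler, so treat the numbers as a rough indication only.

```java
import java.util.Arrays;
import java.util.Random;

final class SortTiming {
    // LSD radix sort for non-negative ints: four passes, one byte (radix 256) per pass.
    static void lsdSort(int[] a) {
        int[] aux = new int[a.length];
        for (int shift = 0; shift < 32; shift += 8) {
            int[] counts = new int[257];
            for (int x : a) {
                counts[((x >>> shift) & 0xFF) + 1]++;
            }
            for (int r = 0; r < 256; r++) {
                counts[r + 1] += counts[r];
            }
            for (int x : a) {
                aux[counts[(x >>> shift) & 0xFF]++] = x;
            }
            System.arraycopy(aux, 0, a, 0, a.length);
        }
    }

    // Wall-clock time of one run of the task, in nanoseconds.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        for (int n : new int[]{1_000, 100_000, 1_000_000}) {
            int[] input = random.ints(n, 0, Integer.MAX_VALUE).toArray();
            int[] forRadix = input.clone();
            int[] forLibrary = input.clone();
            long radixNanos = timeNanos(() -> lsdSort(forRadix));
            long libraryNanos = timeNanos(() -> Arrays.sort(forLibrary));
            System.out.printf("N=%d  LSD: %d us  Arrays.sort: %d us%n",
                    n, radixNanos / 1_000, libraryNanos / 1_000);
        }
    }
}
```

Changing the input distribution (already sorted, many duplicates, small key range) is a quick way to see how much the comparison depends on the application.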


Specifics of caching, compiler optimizations, etc are out of scope for this class.

After some practice, you should be able to:
Manually perform counting, LSD, and MSD sort
Explain why it’s important for LSD and MSD that counting sort is stable
Discuss the runtime and memory required for counting, LSD, and MSD sort
Summary
Algorithm       Runtime                              Memory use
Counting Sort   Θ(N + R)                             Θ(N + R)
LSD Radix Sort  Θ(WN + WR)                           Θ(N + R)
MSD Radix Sort  Best: Θ(N + R), Worst: Θ(WN + WR)    Θ(N + WR)
MSD sort memory use is less important than counting and LSD.

Why study sorts?
→ Sorting is a great study in algorithm iteration.  We’ve seen so many different ways to approach the same problem!
Deciding between sorts is really tricky!  → Empirical analysis
(Extra) Sounds of sorting algorithms: LSD and MSD
How many characters are in the alphabet used for the LSD sort problem?
How many digits are in the keys used for the LSD sort problem?
Summary continued