CSE 326: Data Structures
Sorting in (kind of) linear time
Zasha Weinberg in lieu of Steve Wolfman
Winter Quarter 2000

BinSort (a.k.a. BucketSort)
If all keys are 1…K
Have array of size K
Put keys into correct bin (cell) of array

BinSort example
K=5.  list=(5,1,3,4,3,2,1,1,5,4,5)

BinSort Pseudocode
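The slide's pseudocode did not survive, so here is a minimal Python sketch of the idea described above, assuming integer keys in 1…K. The function name `bin_sort` is an illustrative choice, not taken from the lecture.

```python
def bin_sort(keys, K):
    """Sort integer keys in the range 1..K using one bin per possible key."""
    bins = [[] for _ in range(K + 1)]  # index 0 is unused so that key k maps to bins[k]
    for key in keys:                   # one pass over the N input keys
        bins[key].append(key)
    result = []
    for b in bins:                     # one pass over the bins, in key order
        result.extend(b)
    return result


# The list from the BinSort example slide, with K = 5.
print(bin_sort([5, 1, 3, 4, 3, 2, 1, 1, 5, 4, 5], 5))
# [1, 1, 1, 2, 3, 3, 4, 4, 5, 5, 5]
```

The sketch touches each of the N keys once and each of the K bins once, which is where a running time of O(N + K) comes from.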

BinSort Running time

BinSort Conclusion:
K is a constant
BinSort is linear time
K is variable
Not simply linear time
K is large (e.g. 2^32)

BinSort is “stable”
Stable Sorting algorithm.
Items in input with the same key end up in the same order as when they began.
Important if keys have associated values
Critical for RadixSort
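To see what stability means when keys carry values, here is a small sketch. It assumes records are (key, value) pairs with keys in 1…K; `stable_bin_sort` and the sample records are made up for illustration.

```python
def stable_bin_sort(records, K, key=lambda rec: rec[0]):
    """BinSort records by an integer key in 1..K, keeping ties in input order."""
    bins = [[] for _ in range(K + 1)]
    for rec in records:
        bins[key(rec)].append(rec)  # appending keeps equal keys in arrival order
    return [rec for b in bins for rec in b]


records = [(3, "a"), (1, "b"), (3, "c"), (1, "d")]
print(stable_bin_sort(records, 3))
# [(1, 'b'), (1, 'd'), (3, 'a'), (3, 'c')]
```

Records with key 1 stay in the order b, d and records with key 3 stay in the order a, c, exactly as they appeared in the input.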

Radix = “The base of a number system” (Webster’s dictionary)
History: used in 1890 U.S. census by Hollerith*
Idea: BinSort on each digit, bottom up.

RadixSort – magic!  It works.
Input list:
126, 328, 636, 341, 416, 131, 328
BinSort on lower digit:
341, 131, 126, 636, 416, 328, 328
BinSort result on next-higher digit:
416, 126, 328, 328, 131, 636, 341
BinSort that result on highest digit:
126, 131, 328, 328, 341, 416, 636
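The three passes above can be reproduced with a short sketch. It assumes non-negative integer keys and base 10; `radix_sort_passes` is a hypothetical helper that returns the list after every pass so the intermediate results can be compared with the slide.

```python
def radix_sort_passes(keys, base=10):
    """Run one stable BinSort per digit, lowest digit first; return the list after each pass."""
    result = list(keys)
    passes = []
    place = 1                                # 1 = units, 10 = tens, 100 = hundreds, ...
    largest = max(result) if result else 0
    while place <= largest:
        bins = [[] for _ in range(base)]
        for key in result:
            bins[(key // place) % base].append(key)   # stable: ties keep current order
        result = [key for b in bins for key in b]
        passes.append(result)
        place *= base
    return passes


for step in radix_sort_passes([126, 328, 636, 341, 416, 131, 328]):
    print(step)
```

Each printed line matches the corresponding BinSort result listed above, and the last one is the fully sorted list.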

Not magic.  It provably works.
N-digit numbers
base B
Claim: after ith BinSort, least significant i digits are sorted.
e.g. B=10, i=3, keys are 1776 and 8234.  8234 comes before 1776 for last 3 digits.

Induction to the rescue!!!
base case:
i=0.  0 digits are sorted (that wasn’t hard!)

Induction is rescuing us…
Induction step
assume for i, prove for i+1.
consider two numbers: X, Y.  Say X_i is the ith digit of X (from the right)
If X_{i+1} < Y_{i+1}, then the (i+1)th BinSort will put them in order
If X_{i+1} > Y_{i+1}, same thing
If X_{i+1} = Y_{i+1}, order depends on the last i digits.  The induction hypothesis says they are already sorted on these digits, and a stable BinSort keeps that order.  (So be careful that your BinSort preserves order, a.k.a. is “stable”.)
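The claim being proved can also be checked mechanically on small inputs. The sketch below assumes a non-empty list of non-negative integers and uses a hypothetical helper, `claim_holds`, which verifies after every pass that the list is ordered by its least significant i digits. It is a sanity check, not a substitute for the proof.

```python
def claim_holds(keys, base=10):
    """Return True if, after the ith stable BinSort, the keys are sorted on their last i digits."""
    result = list(keys)
    place = 1
    largest = max(result)
    while place <= largest:
        bins = [[] for _ in range(base)]
        for key in result:
            bins[(key // place) % base].append(key)
        result = [key for b in bins for key in b]
        place *= base
        low_digits = [key % place for key in result]  # the least significant i digits of each key
        if low_digits != sorted(low_digits):
            return False
    return True


print(claim_holds([1776, 8234, 126, 328, 636, 341]))
```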

Paleontology fact
Early humans had to survive without induction.

Running time of Radixsort
How many passes?
How much work per pass?
Total time?
Not truly linear if K is large.
In practice
RadixSort only good for large number of items, relatively small keys
Hard on the cache, vs. MergeSort/QuickSort
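One way to answer the three questions above, written out as a short derivation. The symbols are assumptions for this sketch: N is the number of items, B is the base (the number of bins per pass), and the keys lie in 0…K.

```latex
\begin{align*}
\text{passes} &= d = \lceil \log_B (K + 1) \rceil \\
\text{work per pass} &= O(N + B) \\
\text{total time} &= O\big(d\,(N + B)\big) = O\big((N + B)\log_B K\big)
\end{align*}
```

With B fixed, the total grows like N log K, which is why the sort is linear only when the keys are small enough for the number of digits to be treated as a constant.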

What data types can you RadixSort?
Any type T that can be BinSorted
Any type T that can be broken into parts A and B,
You can reconstruct T from  A and B
A can be RadixSorted
B can be RadixSorted
A is always more significant than B, in ordering

1-digit numbers can be BinSorted
2 to 5-digit numbers can be BinSorted without using too much memory
6-digit numbers, broken up into A=first 3 digits, B=last 3 digits.
A and B can reconstruct original 6-digits
A and B each RadixSortable as above
A more significant than B
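The A/B decomposition for 6-digit numbers can be sketched as two stable BinSort passes with 1000 bins each: first on B, then on A. The names `bin_pass` and `sort_six_digit` are illustrative, and the keys are assumed to be integers in 0…999999.

```python
def bin_pass(items, part, bins_needed):
    """One stable BinSort pass, binning each item by part(item)."""
    bins = [[] for _ in range(bins_needed)]
    for item in items:
        bins[part(item)].append(item)
    return [item for b in bins for item in b]


def sort_six_digit(keys):
    """Sort integers in 0..999999 by splitting each into A (first 3 digits) and B (last 3)."""
    by_b = bin_pass(keys, lambda k: k % 1000, 1000)    # less significant part first
    return bin_pass(by_b, lambda k: k // 1000, 1000)   # then the more significant part


print(sort_six_digit([123456, 123001, 999000, 1, 500500]))
# [1, 123001, 123456, 500500, 999000]
```

The original key is recoverable as A * 1000 + B, and sorting on B before A mirrors the "bottom up" order used for digits.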

RadixSorting Strings
1 Character can be BinSorted
Break strings into characters
Need to know length of biggest string (or calculate this on the fly).

RadixSorting Strings example

RadixSorting Strings running time
N is number of strings
L is length of longest string
RadixSort takes O(N*L)
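A sketch of the string case, under two assumptions that are not in the slides: every character has a code point below 256 (e.g. ASCII), and a string that is too short to have a character at some position is treated as having a "blank" there that sorts before every real character. `radix_sort_strings` is a hypothetical name.

```python
def radix_sort_strings(strings):
    """Sort strings by one stable BinSort per character position, last position first."""
    if not strings:
        return []
    longest = max(len(s) for s in strings)    # L: length of the longest string
    result = list(strings)
    for pos in range(longest - 1, -1, -1):    # L passes in total
        bins = [[] for _ in range(257)]       # bin 0 = no character here; bins 1..256 = characters
        for s in result:                      # N strings per pass
            index = ord(s[pos]) + 1 if pos < len(s) else 0
            bins[index].append(s)
        result = [s for b in bins for s in b]
    return result


print(radix_sort_strings(["banana", "apple", "app", "cherry", "b"]))
# ['app', 'apple', 'b', 'banana', 'cherry']
```

There are L passes and each one handles N strings plus a fixed number of bins, which is consistent with the O(N*L) bound stated above.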

RadixSorting IEEE floats/doubles
You can RadixSort real numbers, in most representations
We do IEEE floats/doubles, which are used in C/C++.
Some people say you can’t RadixSort reals.  In practice (like IEEE reals) you can.

Anatomy of a real number

IEEE floats in binary*
Sign: 1 bit
Significand: always 1.fraction.  fraction uses 23 bits
Biased exponent: 8 bits.
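Bias: store the exponent plus 127.  Normal numbers use exponents -126 to +127 (stored as 1-254); stored values 0 and 255 are reserved for zero/denormals and infinity/NaN.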
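To make the field layout concrete, here is a small sketch that pulls the three fields out of a 32-bit float using Python's standard `struct` module. The helper name `float_fields` is made up for this example.

```python
import struct


def float_fields(x):
    """Return (sign, biased_exponent, fraction) of x encoded as an IEEE 32-bit float."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # reinterpret the 4 bytes as an unsigned int
    sign = bits >> 31                      # 1 bit
    biased_exponent = (bits >> 23) & 0xFF  # 8 bits
    fraction = bits & 0x7FFFFF             # 23 bits
    return sign, biased_exponent, fraction


print(float_fields(1.0))    # (0, 127, 0): 1.0 = 1.0 * 2^0, and 0 + 127 = 127
print(float_fields(-2.5))   # (1, 128, 2097152): 2.5 = 1.25 * 2^1, fraction 0.25 = 2097152 / 2^23
```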

significand always starts with 1
→ only one way to represent any number
Exponent always more significant than significand
Sign is most significant, but in a weird way
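One common way to handle the "weird" sign is to turn each float's bit pattern into an unsigned integer key whose ordering matches the float ordering, and then RadixSort those keys. This is a sketch of that trick, not necessarily the approach this lecture goes on to take. It assumes 32-bit floats and no NaNs, and the names `float_key` and `radix_sort_floats` are hypothetical.

```python
import struct


def float_key(x):
    """Map x (as an IEEE 32-bit float) to an unsigned int that sorts in the same order."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    if bits & 0x80000000:            # negative: flip every bit, so larger magnitudes sort first
        return bits ^ 0xFFFFFFFF
    return bits | 0x80000000         # non-negative: set the top bit so it sorts after all negatives


def radix_sort_floats(values):
    """RadixSort floats with four stable BinSort passes over the bytes of float_key."""
    items = [(float_key(v), v) for v in values]
    for shift in (0, 8, 16, 24):     # least significant byte first
        bins = [[] for _ in range(256)]
        for item in items:
            bins[(item[0] >> shift) & 0xFF].append(item)
        items = [item for b in bins for item in b]
    return [v for _, v in items]


print(radix_sort_floats([3.5, -1.25, 0.0, -100.0, 2.0, -0.5]))
# [-100.0, -1.25, -0.5, 0.0, 2.0, 3.5]
```

For non-negative floats the raw bits already compare correctly because the exponent sits above the significand; for negative floats the comparison is reversed, which is what the bit flip corrects.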
