CSE 326: Data Structures
Sorting in (kind of) linear time
Zasha Weinberg in lieu of Steve Wolfman
Winter Quarter 2000
BinSort (a.k.a. BucketSort)
- If all keys are 1…K:
  - Use an array of size K
  - Put each key into the correct bin (cell) of the array
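A minimal Python sketch of the idea above. It assumes the keys are bare integers in 1…K with no attached data, so each bin only has to count how many copies of its key it received. The name `bin_sort_keys` is hypothetical, not from the slides.

```python
def bin_sort_keys(keys, K):
    """BinSort for bare integer keys in 1..K (assumes every key is in range)."""
    bins = [0] * (K + 1)             # bins[k] counts occurrences of key k; bins[0] is unused
    for k in keys:                   # one pass over the input: drop each key into its bin
        bins[k] += 1
    result = []
    for k in range(1, K + 1):        # one pass over the bins, in key order
        result.extend([k] * bins[k])
    return result
```

Counting is enough only because equal bare keys are indistinguishable. Once keys carry values, each bin has to hold the items themselves.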
BinSort example
- K = 5, list = (5, 1, 3, 4, 3, 2, 1, 1, 5, 4, 5)
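The example can be traced in Python. In this sketch the variable names are illustrative: each value goes into bin `bins[x]`, and then the bins are read back in order 1…5.

```python
K = 5
data = [5, 1, 3, 4, 3, 2, 1, 1, 5, 4, 5]

bins = [[] for _ in range(K + 1)]    # bins[1]..bins[K]; bins[0] is unused
for x in data:
    bins[x].append(x)                # put each key into its bin

for k in range(1, K + 1):
    print(k, bins[k])                # e.g. "1 [1, 1, 1]", "2 [2]", ...

result = [x for k in range(1, K + 1) for x in bins[k]]
print(result)                        # [1, 1, 1, 2, 3, 3, 4, 4, 5, 5, 5]
```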
BinSort Pseudocode
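The slide's pseudocode is not preserved. The following is a Python sketch of the usual list-of-bins formulation. It assumes each item's key is an integer in 1…K, read through a caller-supplied `key` function (this signature is an assumption). Appending to the bins keeps equal keys in their input order.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def bin_sort(items: List[T], K: int, key: Callable[[T], int]) -> List[T]:
    """Stable BinSort: key(item) must be an int in 1..K."""
    bins: List[List[T]] = [[] for _ in range(K + 1)]   # bins[0] is unused
    for item in items:                # O(N): appending keeps input order within each bin
        bins[key(item)].append(item)
    out: List[T] = []
    for b in bins[1:]:                # O(K): concatenate the bins in key order
        out.extend(b)
    return out
```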
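BinSort Running time
- O(N + K) for N keys: O(N) to put every key into its bin, plus O(K) to read the K bins back in order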
BinSort Conclusion:
- K is a constant: BinSort is linear time
- K is variable: O(N + K), so not simply linear in N
- K is large (e.g. $2^{32}$): impractical
BinSort is “stable”
- Stable sorting algorithm: items in the input with the same key end up in the same order as when they began.
- Important if keys have associated values
- Critical for RadixSort
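Stability matters once keys carry values. The small Python check below uses hypothetical (key, value) records and a compact list-of-bins BinSort: records with equal keys come out in their input order.

```python
def bin_sort(items, K, key):
    """Stable BinSort; key(item) must be an int in 1..K."""
    bins = [[] for _ in range(K + 1)]        # bins[0] is unused
    for it in items:
        bins[key(it)].append(it)             # append preserves arrival order in each bin
    return [it for b in bins for it in b]

# Hypothetical (key, value) records; keys 1 and 3 each appear twice.
records = [(3, "ann"), (1, "bob"), (3, "cy"), (2, "dee"), (1, "eve")]
by_key = bin_sort(records, 3, key=lambda r: r[0])
print(by_key)  # [(1, 'bob'), (1, 'eve'), (2, 'dee'), (3, 'ann'), (3, 'cy')]
```

In the output, bob stays ahead of eve and ann stays ahead of cy, exactly as in the input. A BinSort that only counted keys per bin would lose the values entirely.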
RadixSort
- Radix = “The base of a number system” (Webster’s dictionary)
- History: used in the 1890 U.S. census by Hollerith*
- Idea: BinSort on each digit, bottom up (least significant digit first).
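Below is a minimal Python sketch of the idea. It assumes non-negative integer keys and uses base 10 by default. The digit bins here are numbered 0 to B-1 rather than the 1…K of the BinSort slides, and the function name `radix_sort` is illustrative.

```python
def radix_sort(nums, base=10):
    """LSD RadixSort for non-negative ints: one stable BinSort per digit, lowest digit first."""
    if not nums:
        return []
    out = list(nums)
    largest = max(out)
    place = 1                                        # base**d for the current digit d
    while True:
        bins = [[] for _ in range(base)]             # one bin per digit value 0..base-1
        for x in out:
            bins[(x // place) % base].append(x)      # stable: appends keep the current order
        out = [x for b in bins for x in b]
        place *= base
        if place > largest:                          # every digit of the largest key has been used
            break
    return out
```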
RadixSort – magic! It works.
- Input list: 126, 328, 636, 341, 416, 131, 328
- BinSort on the lowest digit: 341, 131, 126, 636, 416, 328, 328
- BinSort that result on the next-higher digit: 416, 126, 328, 328, 131, 636, 341
- BinSort that result on the highest digit: 126, 131, 328, 328, 341, 416, 636
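The trace can be reproduced mechanically. This Python sketch uses a hypothetical helper, `binsort_on_digit`, and prints the list after each pass. The four printed lists match the slide line for line.

```python
def binsort_on_digit(nums, d, base=10):
    """Stable BinSort of nums on digit d, where d = 0 is the lowest digit."""
    bins = [[] for _ in range(base)]
    for x in nums:
        bins[(x // base**d) % base].append(x)
    return [x for b in bins for x in b]

passes = [[126, 328, 636, 341, 416, 131, 328]]
for d in range(3):                        # ones digit, then tens, then hundreds
    passes.append(binsort_on_digit(passes[-1], d))

for p in passes:
    print(p)
# [126, 328, 636, 341, 416, 131, 328]
# [341, 131, 126, 636, 416, 328, 328]
# [416, 126, 328, 328, 131, 636, 341]
# [126, 131, 328, 328, 341, 416, 636]
```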
Not magic. It provably works.
- Keys:
  - P-digit numbers
  - base B
- Claim: after the ith BinSort, the keys are in order on their least significant i digits.
  - e.g. B = 10, i = 3, keys are 1776 and 8234: on the last 3 digits, 8234 comes before 1776 (234 < 776).
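The claim can also be spot-checked empirically. The Python sketch below runs the passes one at a time and asserts the claim after each one. Its helper names are hypothetical, and it reads "in order on the least significant i digits" as "sorted by x mod B^i". A check like this is evidence, not a proof.

```python
import random

def binsort_on_digit(nums, d, base=10):
    """Stable BinSort of nums on digit d (d = 0 is the least significant digit)."""
    bins = [[] for _ in range(base)]
    for x in nums:
        bins[(x // base**d) % base].append(x)
    return [x for b in bins for x in b]

def check_claim(nums, digits, base=10):
    """Run `digits` passes, asserting after pass i that the keys are
    sorted by their least significant i digits, i.e. by x mod base**i."""
    cur = list(nums)
    for i in range(1, digits + 1):
        cur = binsort_on_digit(cur, i - 1, base)
        low = [x % base**i for x in cur]
        assert low == sorted(low), f"claim fails after pass {i}"
    return cur

# The slide's example: on the last 3 digits, 8234 (234) precedes 1776 (776).
assert 8234 % 10**3 < 1776 % 10**3

random.seed(0)
data = [random.randrange(10**4) for _ in range(200)]   # random keys with at most 4 digits
assert check_claim(data, 4) == sorted(data)
```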
Induction to the rescue!!!
- Base case: i = 0. 0 digits are sorted (that wasn’t hard!)
Induction is rescuing us…
- Induction step: assume the claim for i, prove it for i+1.
- Consider two numbers X, Y. Say $X_i$ is the ith digit of X (from the right).
  - $X_{i+1} < Y_{i+1}$: the (i+1)th BinSort will put them in order
  - $X_{i+1} > Y_{i+1}$: same thing
  - $X_{i+1} = Y_{i+1}$: their order depends on the last i digits. The induction hypothesis says they are already in order on these digits, and the (i+1)th BinSort must keep equal keys in that order. (Careful about ensuring that your BinSort preserves order, aka is “stable”…)
Paleontology fact
- Early humans had to survive without induction.
Running time of RadixSort
- How many passes?
- How much work per pass?
- Total time?
- Conclusion:
  - Not truly linear if K is large.
- In practice:
  - RadixSort is only a good fit for a large number of items with relatively small keys
  - Harder on the cache than MergeSort/QuickSort
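The three questions can be answered with a worked derivation. Assume N keys, each written with P digits in base B, so the keys range over K = B^P values and P = log_B K. Each pass is one BinSort with B bins.

```latex
\begin{aligned}
\text{passes} &= P = \log_B K \\
\text{work per pass} &= O(N + B) \quad \text{(distribute } N \text{ keys, then scan } B \text{ bins)} \\
\text{total} &= O\big(P\,(N + B)\big) = O\big((N + B)\log_B K\big)
\end{aligned}
```

With B and K fixed this is O(N). The log_B K factor, however, grows with the key range, which is the sense in which RadixSort is not truly linear when K is large.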
What data types can you RadixSort?
- Any type T that can be BinSorted
- Any type T that can be broken into parts A and B such that:
  - You can reconstruct T from A and B
  - A can be RadixSorted
  - B can be RadixSorted
  - A is always more significant than B in the ordering
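Here is the rule for a two-part type as a Python sketch. It assumes each part is a small integer that can be BinSorted directly, with A in 0 to KA-1 and B in 0 to KB-1 (these names are hypothetical). Because A is more significant, B is sorted first and A last. The second pass is stable, so it keeps the B order among items with equal A.

```python
def bin_sort(items, K, key):
    """Stable BinSort; key(item) must be an int in 0..K-1."""
    bins = [[] for _ in range(K)]
    for it in items:
        bins[key(it)].append(it)
    return [it for b in bins for it in b]

def radix_sort_pairs(pairs, KA, KB):
    """Sort (A, B) pairs with A more significant: BinSort on B first, then stably on A."""
    by_b = bin_sort(pairs, KB, key=lambda p: p[1])     # less significant part first
    return bin_sort(by_b, KA, key=lambda p: p[0])      # more significant part last

print(radix_sort_pairs([(2, 1), (0, 3), (2, 0), (1, 3), (0, 1)], 3, 4))
# [(0, 1), (0, 3), (1, 3), (2, 0), (2, 1)]
```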
Example:
- 1-digit numbers can be BinSorted
- 2- to 5-digit numbers can be BinSorted without using too much memory
- 6-digit numbers, broken up into A = first 3 digits, B = last 3 digits:
  - A and B can reconstruct the original 6 digits
  - A and B are each RadixSortable as above
  - A is more significant than B
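The 6-digit example works as a Python sketch (function names hypothetical). Two BinSort passes of 1000 bins each replace a single BinSort that would need a million bins.

```python
def bin_sort(items, K, key):
    """Stable BinSort; key(item) must be an int in 0..K-1."""
    bins = [[] for _ in range(K)]
    for it in items:
        bins[key(it)].append(it)
    return [it for b in bins for it in b]

def radix_sort_6digit(nums):
    """Sort ints in 0..999999 as two 3-digit parts: A = x // 1000, B = x % 1000.
    Each part is BinSorted with 1000 bins, and x == 1000 * A + B rebuilds the key."""
    by_b = bin_sort(nums, 1000, key=lambda x: x % 1000)    # B (less significant) first
    return bin_sort(by_b, 1000, key=lambda x: x // 1000)   # A (more significant) last

print(radix_sort_6digit([123456, 999, 123001, 500000, 999999]))
# [999, 123001, 123456, 500000, 999999]
```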
RadixSorting Strings
- 1 character can be BinSorted
- Break strings into characters
- Need to know the length of the longest string (or calculate this on the fly).
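The following Python sketch of LSD RadixSort on strings makes two assumptions not spelled out on the slide. First, each character is BinSorted by its code point. Second, a string shorter than the current position is treated as padded with a blank that sorts before every real character (bin 0). With those choices the result matches ordinary lexicographic order. The number of bins per pass, R, is the largest code point present plus two, so each pass costs O(N + R).

```python
def radix_sort_strings(strs):
    """LSD RadixSort on strings; missing positions act as a blank that sorts first."""
    if not strs:
        return []
    L = max(len(s) for s in strs)                      # length of the longest string
    R = max((ord(c) for s in strs for c in s), default=0) + 2   # bin 0 = padding, ord(c)+1 = char
    out = list(strs)
    for pos in range(L - 1, -1, -1):                   # rightmost character position first
        bins = [[] for _ in range(R)]
        for s in out:
            b = ord(s[pos]) + 1 if pos < len(s) else 0
            bins[b].append(s)                          # stable within each bin
        out = [s for bucket in bins for s in bucket]
    return out

print(radix_sort_strings(["cab", "ab", "abc", "b", "", "ba", "abc", "a"]))
# ['', 'a', 'ab', 'abc', 'abc', 'b', 'ba', 'cab']
```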
RadixSorting Strings example
RadixSorting Strings running time
- N is the number of strings
- L is the length of the longest string
- RadixSort takes O(N*L): L passes, each a BinSort over N strings (treating the alphabet size as a constant)
RadixSorting IEEE floats/doubles
- You can RadixSort real numbers in most representations
- We cover IEEE floats/doubles, the representation C/C++ use on most machines.
- Some people say you can’t RadixSort reals. In practice (e.g. with IEEE reals) you can.
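Anatomy of a real number
IEEE floats in binary*
- Sign: 1 bit
- Significand: always 1.fraction (for normal numbers); the fraction uses 23 bits
- Biased exponent: 8 bits
- Bias: add 127 to the exponent, so normal exponents -126 to +127 are stored as 1 to 254 (stored values 0 and 255 are reserved for zero/denormals and infinities/NaNs)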
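This Python sketch pulls the three fields out of a single-precision float using the standard `struct` module: it packs the value as big-endian `f` and reinterprets the bytes as unsigned `I`. The name `float_fields` is hypothetical. Python floats are doubles, so `struct.pack` first rounds the value to the nearest float32.

```python
import struct

def float_fields(x):
    """Split a float32 into (sign, biased exponent, fraction) bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))   # reinterpret the 4 bytes as a uint32
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF         # 23 bits; significand is 1.fraction for normal numbers
    return sign, exponent, fraction

print(float_fields(1.0))    # (0, 127, 0): 1.0 = 1.0 x 2^0
print(float_fields(-0.75))  # (1, 126, 4194304): -0.75 = -1.5 x 2^-1
```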
Observations
- Significand always starts with 1 → only one way to represent any number
- Exponent always more significant than significand
- Sign is most significant, but in a weird way
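The three observations suggest an order-preserving bit trick, sketched here in Python. The name `sortable_key` is hypothetical, and NaNs are assumed absent. For non-negative floats, the raw bit pattern read as an unsigned integer already orders correctly because the exponent sits above the fraction. Setting the sign bit then lifts them above all negatives. For negative floats, a larger magnitude means a smaller value, so flipping every bit reverses their order.

```python
import struct

def sortable_key(x):
    """Map a float32 to a uint32 whose unsigned order matches the float order (no NaNs)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    if bits >> 31:                     # negative: flip all bits to reverse magnitude order
        return bits ^ 0xFFFFFFFF
    return bits | 0x80000000           # non-negative: set the sign bit to sort above negatives

values = [3.5, -1.25, 0.0, -100.0, 2.0, -0.5, 0.125, 1024.0]
print(sorted(values, key=sortable_key))  # [-100.0, -1.25, -0.5, 0.0, 0.125, 2.0, 3.5, 1024.0]
```

One quirk: -0.0 maps just below +0.0, which is harmless since the two compare equal.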
Pseudocode
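The slide's pseudocode is not preserved. This is one possible Python sketch of the whole method, assuming float32 inputs without NaNs; the names are hypothetical. Each value is mapped to an order-preserving unsigned key, and then four stable BinSort passes of 256 bins run over the key's bytes, least significant byte first. The same scheme extends to doubles with 64-bit keys and eight passes.

```python
import struct

def float_radix_sort(values):
    """LSD RadixSort of float32 values (no NaNs) via an order-preserving uint32 key."""
    def key(x):
        (bits,) = struct.unpack(">I", struct.pack(">f", x))
        # negative: flip every bit; non-negative: set the sign bit
        return bits ^ 0xFFFFFFFF if bits >> 31 else bits | 0x80000000

    items = [(key(v), v) for v in values]
    for shift in (0, 8, 16, 24):                      # one BinSort per byte, low byte first
        bins = [[] for _ in range(256)]
        for k, v in items:
            bins[(k >> shift) & 0xFF].append((k, v))  # append keeps the pass stable
        items = [kv for b in bins for kv in b]
    return [v for _, v in items]

print(float_radix_sort([3.5, -1.25, 0.0, -100.0, 2.0, -0.5, 0.125]))
# [-100.0, -1.25, -0.5, 0.0, 0.125, 2.0, 3.5]
```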