Sorting and Algorithm Bounds
Analyzing the best-possible comparison sorting algorithm, decision tree sort, and runtime bounds for algorithm design.
Kevin Lin, with thanks to many others.
1
Ask questions anonymously on Piazza. Look for the pinned Lecture Questions thread.
Feedback from the Reading Quiz
2
3
Sort
Best-Case
Worst-Case
Space
Stable
Notes
Selection Sort
Θ(N2)
Θ(N2)
Θ(1)
No
Heapsort
Θ(N)
Θ(N log N)
Θ(1)
No
Slow in practice.
Naive Merge Sort
Θ(N log N)
Θ(N log N)
Θ(N)
Yes
Java Merge Sort
Θ(N)
Θ(N log ρ)
Θ(N)
Yes
Fastest stable sort, ρ = number of runs.
Insertion Sort
Θ(N)
Θ(N2)
Θ(1)
Yes
Best for small or almost sorted inputs.
Naive Quicksort
Θ(N log N)
Θ(N2)
Θ(N)
Yes
2x or more slower than merge sort.
Java Quicksort
Θ(N)
Θ(N2)
O(log N)
No
Fastest comparison sort.
On the Worst-Case Complexity of TimSort (Auger et al./Dagstuhl Research Online Publication Server)
4
Median-of-3 Decision Tree
a ⩻ b
b ⩻ c
No
Yes
a ⩻ c
a c b
Yes
c a b
No
a ⩻ c
b ⩻ c
b a c
b c a
c b a
Yes
No
No
Yes
No
Yes
a b c
5
Sorting with Decision Trees
int[] decisionTreeSort(int a, int b, int c) {
if (a < b) {
if (b < c) return {a, b, c};
else if (a < c) return {a, c, b};
else return {c, a, b};
} else {
if (c < b) return {b, a, c};
else if (c < a) return {b, c, a};
else return {c, b, a};
}
}
In the worst case, how many questions would you need to ask to definitively sort {a, b, c, d}?
6
Decision Tree Sorting
In the worst case, how many questions would you need to ask to definitively sort {a, b, c, d}?
The decision tree needs to decide between all possible permutations of the input.
3 items. Count 6 unique permutations.
4 items. Insert the new key d in 4 positions for each of the 6 permutations of 3 items.
N items. N! unique permutations.
How many levels minimum in a binary tree with 24 leaves? log 24 = 4.58, round up to 5.
7
A
a b c
8
A036604: Sorting numbers: minimal number of comparisons needed to sort n elements (Neil Sloane/OEIS); Towards Optimal Sorting of 16 Elements (Marcin Peczarski/arXiv)
Optimal Comparison Sort
Reductions set upper bounds on runtime.
Theoretical optimality sets lower bounds.
On random arrays, decision tree sorting is optimal in the number of comparisons.
Cost model. Number of comparisons.
Optimal decision tree sort doesn’t exist. Provably optimal for N < 16 and N = 22.
Decision Tree Sorting Analysis
9
Upper Bound for N!
Goal. Find an asymptotic complexity bound for the function log(N!).
Subgoal. Find an upper bound for the function N!
10
Q
Upper Bound for N!
Goal. Find an asymptotic complexity bound for the function log(N!).
Subgoal. Find an upper bound for the function N!
11
A
Upper Bound for log(N!)
Goal. Find an asymptotic complexity bound for the function log(N!).
Subgoal. Find an upper bound for the function log(N!)
12
Q
Upper Bound for log(N!)
Goal. Find an asymptotic complexity bound for the function log(N!).
Subgoal. Find an upper bound for the function log(N!)
13
A
Demo
Lower Bound for log(N!)
Goal. Find an asymptotic complexity bound for the function log(N!).
Subgoal. Find a lower bound for the function log(N!)
14
Q
Lower Bound for log(N!)
Goal. Find an asymptotic complexity bound for the function log(N!).
Subgoal. Find a lower bound for the function log(N!)
15
A
What can we say about decision tree sorting?
16
Sorting Algorithm Bound
Decision tree sorting is in Θ(N log N) on random arrays.
Lower bound on worst case. Any possible comparison sort on a random array must be in Ω(N log N).
17
Demo
A
18
Sort
Best-Case
Worst-Case
Space
Stable
Notes
Selection Sort
Θ(N2)
Θ(N2)
Θ(1)
No
Heapsort
Θ(N)
Θ(N log N)
Θ(1)
No
Slow in practice.
Naive Merge Sort
Θ(N log N)
Θ(N log N)
Θ(N)
Yes
Java Merge Sort
Θ(N)
Θ(N log ρ)
Θ(N)
Yes
Fastest stable sort, ρ = number of runs.
Insertion Sort
Θ(N)
Θ(N2)
Θ(1)
Yes
Best for small or almost sorted inputs.
Naive Quicksort
Θ(N log N)
Θ(N2)
Θ(N)
Yes
2x or more slower than merge sort.
Java Quicksort
Θ(N)
Θ(N2)
O(log N)
No
Fastest comparison sort.
Optimal Sort
Θ(N)
Θ(N log N)
Θ(1)
Yes
Doesn’t exist (yet).
On the Worst-Case Complexity of TimSort (Auger et al./Dagstuhl Research Online Publication Server)
Algorithm Design Tips
19
Algorithm Design Paradigms
Greedy Algorithms. Consider each option in order of lowest-cost.
Prim’s Algorithm.
Kruskal’s Algorithm.
Dijkstra’s Algorithm.
Caveat. Can lead to suboptimal solutions.
Dijkstra’s algorithm on negative edge weighted graphs.
Divide-and-Conquer Algorithms. Solve two or more subproblems recursively, and then combine the results.
Merge sort.
Quicksort.
Prototypical usage. Turn brute-force N2 runtime algorithm into N log N algorithm.
20
Algorithms (Robert Sedgewick, Kevin Wayne/Princeton)
Algorithm Design Process
Hypothesize. How do invariants affect the behavior for each operation?
Identify. What strategies have we used before? What examples can we apply?
Plan. Propose a new way from findings.
Analyze. Does the plan do the job? What are potential problems with the plan?
Create. Implement the plan.
Evaluate. Check implemented plan.
Find a lower and upper bound. Define a slow but totally correct solution. Build a mental model: identify key properties.
Consider each algorithm that you know. Which ones might work? How do the existing algorithms break down?
Apply an algorithm design idea. Perform a reduction: transform input and output. Or modify the data structures used.
Use an algorithm design paradigm.
21
Counting Inversions
Given a permutation of length N, count the number of inversions.
3 inversions: 2–1, 3–1, 7–6
Lower bound? Upper bound? Desired runtime? Algorithm paradigm?
22
Q
Algorithms (Robert Sedgewick, Kevin Wayne/Princeton)
0
2
3
1
4
5
7
6
Counting Inversions: Your Ideas
Given a permutation of length N, count the number of inversions.
3 inversions: 2–1, 3–1, 7–6
23
A
Algorithms (Robert Sedgewick, Kevin Wayne/Princeton)
0
2
3
1
4
5
7
6
Optical Character Recognition
Suppose we’re building an optical character recognition system.
We want to separate lines of text. There is some white space between the lines but problems like noise and the tilt of the page makes it hard to find.
How can we do line segmentation?
24
Q
The Algorithm Design Manual (Steven Skiena/Springer)
Optical Character Recognition: Your Ideas
How can we do line segmentation?
25
A
The Algorithm Design Manual (Steven Skiena/Springer)