Iterative Algorithm Analysis
In the previous lecture, we learned the fundamental principles and a process for analyzing the runtime of a program. In this reading, we will apply the runtime analysis procedure to analyze a sorting algorithm called selection sort. The goal of a sorting algorithm is to rearrange an array of N items into ascending (sorted) order.
Selection sort
- Selection sort: Repeatedly select the smallest remaining item and swap it to its proper index.
  - Find the smallest item in the array, and swap it with the first item.
  - Find the next smallest item in the array, and swap it with the next item.
  - Continue until all items in the array are sorted.
In this example, the array begins unsorted with the items 6 3 7 2 8 1. To find the smallest item in the array, selection sort scans across the entire array and swaps the value 1 with the first item in the array. To find the next smallest item in the array, selection sort scans across all but the first item in the array (since we know it’s the smallest) and swaps the value 2 with the next item in the array.
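The steps above can be sketched in Java as follows. This is a minimal sketch rather than a reference implementation: the class name SelectionSortSketch and the choice to sort an int[] in place are assumptions made for illustration.

```java
import java.util.Arrays;

public class SelectionSortSketch {
    // Sorts the array in place into ascending order.
    public static void selectionSort(int[] a) {
        for (int i = 0; i < a.length - 1; i += 1) {
            // Scan the unsorted part a[i..] for the index of its smallest item.
            int min = i;
            for (int j = i + 1; j < a.length; j += 1) {
                if (a[j] < a[min]) {
                    min = j;
                }
            }
            // Swap the smallest remaining item into its proper index.
            int tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] items = {6, 3, 7, 2, 8, 1};
        selectionSort(items);
        System.out.println(Arrays.toString(items)); // prints [1, 2, 3, 6, 7, 8]
    }
}
```

The outer loop stops one short of the end because once the first N - 1 items are in place, the last item must already be the largest.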
Selection sort does not have different runtime cases.
Why doesn't selection sort have different best or worst case runtime analyses?
There are no shortcuts to finding the smallest item. We always need to check all of the unsorted items in the array to be sure that we’ve found the smallest item.
Let’s use the simplified modeling process to analyze selection sort.
- Choose our cost model (representative operation).
- Figure out the order of growth for the count of the cost model by either:
  - Making an exact count and then discarding the unnecessary pieces.
  - Or, using intuition/inspection to determine order of growth. (Needs practice!)
First, we need to choose a cost model.
Why is the number of "scans to find the next-smallest element" hard to use as a cost model?
The time it takes to find the next-smallest item depends on how many items are left! We can count the number of scans, but each scan does a different amount of work. We end up needing to count the number of comparisons and swaps anyway to determine how much time selection sort will take to execute.
We can use the number of comparisons and swaps as our cost model, so we need to figure out the order of growth for comparisons and for swaps. Let’s consider comparisons first.
How many comparisons does it take to find the (first) smallest item in the array in terms of N, the number of items in the array?
It takes N - 1 comparisons to find the smallest item. We can keep track of the smallest item we’ve seen so far in the array and iterate across, comparing each item against the smallest seen so far, updating it as necessary. Initially, the smallest item seen so far is the first item, so we don’t need to compare it against itself.
By the same logic, it takes N - 2 comparisons to find the next smallest item.
The pattern continues until all N items in the array are sorted.
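To make the per-scan counts concrete, here is a small sketch that counts the comparisons in a single scan. The helper name comparisonsToFindSmallest and its start parameter are hypothetical, introduced only for this illustration.

```java
public class SmallestScan {
    // Hypothetical helper: counts the comparisons used to find the
    // smallest item in a[start..], tracking the smallest seen so far.
    public static int comparisonsToFindSmallest(int[] a, int start) {
        int comparisons = 0;
        int smallest = a[start]; // the first item is never compared against itself
        for (int j = start + 1; j < a.length; j += 1) {
            comparisons += 1;
            if (a[j] < smallest) {
                smallest = a[j];
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] items = {6, 3, 7, 2, 8, 1}; // N = 6
        System.out.println(comparisonsToFindSmallest(items, 0)); // prints 5, which is N - 1
        System.out.println(comparisonsToFindSmallest(items, 1)); // prints 4, which is N - 2
    }
}
```

The count depends only on how many items are scanned, not on their values, which is why selection sort has no distinct best or worst case.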
What is the order of growth for the number of comparisons in selection sort?
The exact count is N(N - 1) / 2, following the same logic as our analysis for dup1. The green-shaded cells from the example form a right triangle whose area roughly represents the number of comparisons needed. The area of a right triangle with side length N - 1 is roughly N^2 / 2.
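Written out as a sum, the exact count can be restated as below. This is a worked restatement of the argument, and the summation index k is introduced here only for illustration.

```latex
\[
(N - 1) + (N - 2) + \cdots + 2 + 1
  \;=\; \sum_{k=1}^{N-1} k
  \;=\; \frac{N(N - 1)}{2}
  \;=\; \frac{N^2}{2} - \frac{N}{2}
\]
```

For the six-item example, this gives 5 + 4 + 3 + 2 + 1 = 15 = 6 · 5 / 2. Discarding the lower-order term N / 2 and the constant factor 1/2 leaves an order of growth of N^2.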
Try following the same steps and analyze the number of swaps on your own.
What is the overall order of growth of the runtime for selection sort?
The number of comparisons has an order of growth of N^2 while the number of swaps has an order of growth of just N.
Therefore, the overall order of growth of the runtime for selection sort is in Theta(N^2).
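One way to check both counts is to instrument the sort. The sketch below assumes a variant that runs N - 1 passes and swaps once on every pass, even when the smallest item is already in place. Other variants may skip such swaps, but the swap count still grows no faster than N. The class and method names are hypothetical.

```java
public class SelectionSortCounts {
    // Returns {comparisons, swaps} performed while sorting a copy of the input.
    public static long[] count(int[] input) {
        int[] a = input.clone(); // leave the caller's array untouched
        long comparisons = 0;
        long swaps = 0;
        for (int i = 0; i < a.length - 1; i += 1) {
            int min = i;
            for (int j = i + 1; j < a.length; j += 1) {
                comparisons += 1; // one comparison per scanned item
                if (a[j] < a[min]) {
                    min = j;
                }
            }
            // Assumption: swap on every pass, even if min == i.
            int tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
            swaps += 1;
        }
        return new long[] {comparisons, swaps};
    }

    public static void main(String[] args) {
        long[] c = count(new int[] {6, 3, 7, 2, 8, 1});
        // N = 6, so N(N - 1) / 2 = 15 comparisons and N - 1 = 5 swaps.
        System.out.println("comparisons=" + c[0] + " swaps=" + c[1]); // prints comparisons=15 swaps=5
    }
}
```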
Print party
Consider the following example. (You can visualize it in the Online Java Tutor.)
    void printParty(int N) {
        for (int i = 1; i <= N; i *= 2) {
            for (int j = 0; j < i; j += 1) {
                System.out.println("hello");
            }
        }
    }
The outer loop updates the value of i by doubling it on each iteration. The inner loop increments j from 0 up to (but not including) i, so it runs i times for each value of i. We can choose the print statement as the cost model, C(N).
Let’s continue to use visualizations to analyze printParty as well. For each value of i in the grid, we plot the number of iterations for j to reach that value of i. At the bottom, for each N, we compute C(N) by summing up all of the shaded cells in the rows of i less than or equal to the value of N.
Since the outer loop updates the value of i by doubling on each iteration, there are holes in the grid because i will never be set to 3, 5, 6, 7, 9, 10, 11, …
If N is a power of 2, the number of print statements is given by C(N) = 1 + 2 + 4 + … + N. Try to find an upper and lower bound for the function.
What is the overall order of growth for C(N)?
C(N) is bounded above and below by the family of linear-growth functions, so C(N) is in Theta(N)! We can bound C(N) above by 4N and below by 0.5N.
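The bounds can be checked empirically by counting instead of printing. The sketch below keeps the same loop structure as printParty. The class name PrintPartyCount and the method count are hypothetical, and N is kept small so that doubling i cannot overflow an int.

```java
public class PrintPartyCount {
    // Same loops as printParty, but counts the print statements instead of printing.
    public static long count(int n) {
        long c = 0;
        for (int i = 1; i <= n; i *= 2) {
            for (int j = 0; j < i; j += 1) {
                c += 1;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 16; n += 1) {
            long c = count(n);
            // Check the linear bounds 0.5N <= C(N) <= 4N without using fractions.
            boolean withinBounds = 2 * c >= n && c <= 4L * n;
            System.out.println("N=" + n + " C(N)=" + c + " withinBounds=" + withinBounds);
        }
    }
}
```

For example, count(8) is 1 + 2 + 4 + 8 = 15, and count(9) through count(15) are also 15, since i next reaches 16 only once N is at least 16.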
Runtime analysis requires careful thought. But there are a few useful strategies.
- Find the exact count of steps.
- Write out examples.
- Use a geometric argument with visualizations!
The following two summations frequently occur in runtime analysis.
- 1 + 2 + 3 + 4 + … + (Q - 1) + Q = Q(Q + 1) / 2, which is in Theta(Q^2).
- 1 + 2 + 4 + 8 + … + (Q / 2) + Q = 2Q - 1 for Q a power of 2, which is in Theta(Q).
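The second summation can be derived with a short doubling argument. The following is a sketch of one standard derivation, with S introduced here as a name for the sum and Q assumed to be a power of 2.

```latex
\begin{align*}
S      &= 1 + 2 + 4 + \cdots + \tfrac{Q}{2} + Q \\
2S     &= 2 + 4 + 8 + \cdots + Q + 2Q \\
2S - S &= 2Q - 1 \quad\text{(all middle terms cancel)} \\
S      &= 2Q - 1
\end{align*}
```

With Q = N, this is the count for printParty when N is a power of 2: C(N) = 2N - 1, which is linear in N.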