Sorting Algorithms

Asymptotic analysis, iterative sorts, and merge sort.

  1. Asymptotic Analysis
    1. Model the number of steps
    2. Formalize the runtime model
    3. Asymptotic notation
  2. Iterative Sorts
    1. Sorting definitions
    2. Selection sort
    3. Insertion sort
  3. Merge Sort
    1. Recurrences
    2. Recursive merge sort
    3. Recurrence diagrams

Asymptotic Analysis

  1. Compare and contrast runtime analysis, asymptotic analysis, and case analysis.
  2. Analyze the order of growth of a function as constant, linear, or quadratic.
  3. Identify big-theta asymptotic notation for the order of growth of a function.

In your previous programming experience, we focused on correctness and maintainability—code that produces the expected output given a particular input, and code that is easy to improve or extend in the future. But, as we’ll see throughout this course, it’s also important that we design programs with a focus on efficiency too.

Experimental runtime analysis is a method for empirically measuring how long a program takes to run by timing it on different inputs. Experimental analysis is particularly helpful for identifying and addressing performance issues because it indicates exactly which operations contribute to a high runtime. This data can be used to create graphs that plot the runtime as the input to the program grows larger and larger.

But what if we want to predict the runtime of a program before writing any code? What other methods can we use to help us design the specification to a program so that it meets our performance needs?

Asymptotic runtime analysis is the process of predicting the amount of time an algorithm will take to run on large inputs. Asymptotic analysis focuses on analyzing algorithms, the concepts underlying how programs work. Compared to experimental analysis, which focuses on empirical measurement, asymptotic analysis focuses on reasoning and logic to analyze the runtime of an algorithm.

Computer scientists specifically focus on large inputs (as the size of the input approaches infinity, or its asymptote) rather than small inputs because larger inputs more clearly demonstrate differences in efficiency. For example, the performance issue in ArrayListDeque only appears when a large number of web pages have been stored in browser history; ArrayListDeque is quite fast when only a few web pages have been stored. The differences in runtime become more appreciable when we’re working with a very large number of elements.

The goal of asymptotic runtime analysis is to produce a (1) predictive and (2) easy-to-compare description of the running time of an algorithm. Consider the indexOf method, which returns the index of a target number in an int[] A or -1 if target is not in the array.

static int indexOf(int[] A, int target) {
    for (int i = 0; i < A.length; i += 1) {
        if (A[i] == target) {
            return i;
        }
    }
    return -1;
}

The behavior of the indexOf algorithm depends on the input values.

  1. Initialize an integer index counter, i.
  2. Loop over the length of the array. On each iteration, if A[i] is the same as the given target, return the current index. Otherwise, increment i.
  3. If no elements in the array are the same as the given target, return -1.

Model the number of steps

How many steps (or operations) are needed to run the algorithm? Operations include assignments, boolean evaluations, arithmetic operations, method calls, etc.

Let’s take a look at these two assignment statements to illustrate how to count steps.

int a = 4 * 6;
int b = 9 * (a - 24) / (8 - 7);
  1. The first assignment to a takes one step (1) to compute 4 * 6 and another step (2) to assign the result to the variable a.
  2. The second assignment to b takes one step (1) to compute a - 24, another step (2) to compute 9 * (a - 24), another step (3) to compute 8 - 7, another step (4) to compute 9 * (a - 24) / (8 - 7), and a final step (5) to assign the result to the variable b.

In total, these two lines of code take 7 steps for the computer to run. The first goal of runtime analysis was to produce a predictive description of the runtime of an algorithm, but in this example, there’s not much to predict: the two assignment statements always take precisely 7 steps.

indexOf is different: it has a loop and an if statement whose behavior depends on the values in the array A compared to the target. How do we model the number of steps it takes to run an algorithm in general? Here are two factors that can affect the number of steps to compute indexOf.

  1. The length of the array A.
  2. The value of each int in A compared to the value of the target.

The runtime of many algorithms is greatly affected by the size of the input. A small array typically takes little time to process regardless of whether the algorithm is fast or slow. Where we really start to appreciate the differences in runtime is when the array is very large. Therefore, when computer scientists talk about runtime analysis, the default assumption is asymptotic analysis.

Asymptotic analysis is a way of evaluating the efficiency of an algorithm on large inputs. The choice for the “large input” is called the asymptotic variable. For indexOf, the length of the array A can be the asymptotic variable.

In this course, we’ll usually tell you the asymptotic variable using the phrases, “with respect to” or “in terms of”. Asymptotic analysis is defined in terms of the asymptotic variable, so make sure to confirm the asymptotic variable before analyzing a program.

But we also know that the runtime depends on other factors, such as the relationship between the target and the numbers in the array A. Case analysis is a way to account for variation in the model based on factors other than the asymptotic variable.

Best case indexOf
The best case (most efficient or fastest) occurs when the target is the same as A[0], even when the array is very large.
Worst case indexOf
The worst case (least efficient or slowest) occurs when the target is the same as the last element in the array, or when the target is not in the array at all.
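
For example, with a small array (the same reasoning scales to a very large one):

int[] A = {42, 7, 13, 99};
indexOf(A, 42);  // best case: the target matches A[0], so the loop returns on the first iteration
indexOf(A, 99);  // worst case: the target is the last element, so the loop runs N times
indexOf(A, 5);   // also worst case: the target is absent, so the loop runs N times and returns -1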

All other factors not covered by the asymptotic variable are considered in case analysis. In this course, we’ll focus only on the best and worst case: take all the remaining factors that you have at your disposal and choose a situation that produces the fastest (best case) or the slowest (worst case) runtime.

We now have a predictive model for the number of steps (a proxy for the time) it takes to run an algorithm. But this model is not easy to compare. To communicate our model, we’ll introduce some new vocabulary used to express asymptotic analysis.

Formalize the runtime model

How do we express our model to others so that we can communicate results?

The order of growth relates the size of the input to the time that it takes to run the algorithm on that input. Often, we’ll use a mathematical variable like N to refer to the asymptotic variable.

Best case indexOf orders of growth
In the best case, the order of growth for the runtime of indexOf is constant.
Worst case indexOf orders of growth
In the worst case, the order of growth for the runtime of indexOf is linear with respect to N, the length of A.

When the order of growth is constant, we don’t specify “with respect to N” because constant order of growth implies that the runtime does not scale with N.

Orders of growth help us compare runtime in terms of words like constant, linear, and quadratic. In practice, though, you’ll rarely see computer scientists writing out the full orders of growth sentence. Instead, computer scientists communicate orders of growth using asymptotic notation.

Asymptotic notation

Asymptotic notation provides precise, mathematical shorthand for orders of growth.

Consider this graph that depicts three functions of N.

What do the x-axis and the y-axis represent in the graph?

The x-axis represents the size of the input, N. The y-axis represents the model’s prediction for the number of steps that the algorithm will require for an input of a given size.

The three functions depicted in the graph are:

  1. The top blue function depicts 5N².
  2. The middle red function depicts 40 sin(N) + 4N².
  3. The bottom green function depicts 3N².

The shaded area represents the range between the top and bottom functions.

This graph provides a visual demonstration of the big-theta definition: a function f(N) is in Θ(g(N)) if it is bounded below by k1 · g(N) and bounded above by k2 · g(N) for some positive constants k1 and k2, for all sufficiently large N. We can say that the function 40 sin(N) + 4N² is in Θ(N²) because it is bounded below by the green function 3N² (k1 = 3) and bounded above by the blue function 5N² (k2 = 5).

The red function is only “squeezed” between the blue and green functions for N > 6. This is why big-theta is an asymptotic notation: its logic applies to all values of N greater than some initial N0. When we make a claim about asymptotic runtime, we must make sure that our analysis holds true for all possible “large inputs” N.

Now that we’ve learned the steps to perform asymptotic analysis, we can finally answer the question, “What is the asymptotic runtime of indexOf with respect to N, the length of the input array A?”

Best case indexOf orders of growth
In the best case, the order of growth for the runtime of indexOf is constant.
In other words, in the best case, the asymptotic runtime of indexOf is in Θ(1).
Worst case indexOf orders of growth
In the worst case, the order of growth for the runtime of indexOf is linear with respect to N, the length of A.
In other words, in the worst case, the asymptotic runtime of indexOf is in Θ(N) with respect to N, the length of A.

Iterative Sorts

Insertion.java · Selection.java

  1. Describe the problem of sorting, ordering relations, and stability.
  2. Trace each iteration of selection sort and insertion sort.
  3. Identify a run of selection sort and insertion sort on an array.

There are different ways to find duplicates in an array. dup1 represents one way to solve the problem: exhaustively check all possible pairs of elements and return whether a duplicate exists among them. In the worst case, dup1 requires quadratic time to return an answer.
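
The code for dup1 isn’t shown in this section, but a minimal sketch of the exhaustive pair-checking approach it describes might look like this (the actual dup1 may differ in details):

static boolean dup1(int[] A) {
    // Check every pair (i, j) with i < j for a duplicate.
    for (int i = 0; i < A.length; i += 1) {
        for (int j = i + 1; j < A.length; j += 1) {
            if (A[i] == A[j]) {
                return true;
            }
        }
    }
    // No pair matched, so there are no duplicates.
    return false;
}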

How long might a quadratic time algorithm need to run?

  • When N = 10, less than 1 second.
  • When N = 100, less than 1 second.
  • When N = 1,000, about 1 second.
  • When N = 10,000, about 2 minutes.
  • When N = 100,000, about 3 hours.
  • When N = 1,000,000, about 12 days.
How long might a linear time algorithm need to run?
  • When N = 10, less than 1 second.
  • When N = 100, less than 1 second.
  • When N = 1,000, less than 1 second.
  • When N = 10,000, less than 1 second.
  • When N = 100,000, less than 1 second.
  • When N = 1,000,000, about 1 second.

What can we learn from these findings?

Many algorithms (including all algorithms in this course) are fast on tiny inputs when N < 1000.
While there may be situations where we care about the efficiency of algorithms processing tiny inputs, these differences are often unnoticeable. When they are noticeable, they may be caused by unanticipated changes in your computer’s workload.
Differences become more appreciable when inefficient algorithms run on inputs of size N > 1000.
Whether the runtime is tolerable depends on the problem. An inefficient algorithm may be acceptable if we only need to run it infrequently, the size of the input is not particularly big, or the amount of processing power can make up for the time.
Therefore, efficient algorithms are often needed to frequently process even moderately large data.

However, if our array is sorted, then returning whether there are duplicates in the array takes much less time in the worst case. In a sorted array, all duplicate elements must be stored right beside each other. It turns out that a significant number of problems in computer science get a lot easier and a lot more efficient to solve if we can first sort our data. Without sorting, many of the systems and software that we have today would not be possible purely because things (like duplicate finding) would take far too long to run.
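
For instance, here is a sketch of a duplicate check on a sorted array (the method name dupSorted is just for illustration). Since duplicates must be adjacent, a single linear scan suffices:

static boolean dupSorted(int[] sorted) {
    // Any duplicates in a sorted array must be stored right beside each other.
    for (int i = 0; i < sorted.length - 1; i += 1) {
        if (sorted[i] == sorted[i + 1]) {
            return true;
        }
    }
    return false;
}

In the worst case, this scan is linear with respect to the length of the array, a big improvement over the quadratic dup1.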

But data doesn’t always come to us in a pre-sorted form. How do we sort—rearrange into order—an array of elements?

Sorting definitions

Sorting is the problem of rearranging a sequence of elements into non-decreasing order. How do we define the order of elements? For numbers, smaller values appear before larger values in non-decreasing order. But sorting objects is a bit more complicated.

For objects, the implementer of the class is responsible for defining the ordering relation. Just as numbers have a natural definition for less-than, equal-to, and greater-than relationships, the class’s compareTo method provides a way to specify the relative order of any two elements of that class. For example, if we have emails where each Email object has a date, sender, and text, the compareTo method could be defined like an inbox where emails are sorted only by date.

public int compareTo(Email other) {
    // Compare the date of this email to the date of the other email.
    return this.date.compareTo(other.date);
}

In other situations like searching for all the emails from a particular sender (ignoring date), it might be more helpful to sort by the sender primarily and the text of their message secondarily.

public int compareTo(Email other) {
    // Compare the sender of this email to the sender of the other email.
    int cmp = this.sender.compareTo(other.sender);
    if (cmp != 0) {
        // If the senders are different, return the difference.
        return cmp;
    }
    // Otherwise, the senders are the same, so compare by text content.
    return this.text.compareTo(other.text);
}
In the email example, where is the compareTo method defined?

The compareTo method for the Email class is defined inside the Email class by the people who wrote the class. When we talk about sorting algorithms, the author of the sorting algorithm doesn’t know the exact objects being sorted or their ordering relation. A sorting algorithm just calls compareTo to determine whether an object should appear before or after another object. Sorting involves repeatedly asking, “Is this object less-than, equal-to, or greater-than this other object?” by repeatedly calling the compareTo method.

A sort is stable if it preserves the original order of equivalent keys. Stability can affect the final sorting output when there are two or more elements considered equal according to the ordering relation. For basic data types like numbers, stability doesn’t matter: any two numbers that share the same numeric value are just the same either way. But for objects like emails, stability can make a big difference. Imagine sorting emails by sender name: you’ll often receive many emails from the same sender, so a stable sort for emails will sort by sender name and, for each sender name, maintain the original order of emails that you received from that sender. An unstable sort doesn’t guarantee the original order of emails for each person.
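
Java’s built-in Arrays.sort for objects is documented to be stable, so we can observe this behavior directly. In this sketch, the Email record and the StabilityDemo class are simplified, hypothetical stand-ins for the classes described above:

import java.util.Arrays;

// A simplified stand-in for the Email class: the ordering relation considers
// only the sender, so emails from the same sender compare as equal.
record Email(String sender, String text) implements Comparable<Email> {
    public int compareTo(Email other) {
        return this.sender.compareTo(other.sender);
    }
}

public class StabilityDemo {
    public static void main(String[] args) {
        // The two emails from "alice" arrive in this original order.
        Email[] inbox = {
            new Email("alice", "first message"),
            new Email("bob", "hello"),
            new Email("alice", "second message"),
        };
        // Arrays.sort on objects is guaranteed to be stable, so after sorting
        // by sender, "first message" still appears before "second message".
        Arrays.sort(inbox);
        for (Email email : inbox) {
            System.out.println(email.sender() + ": " + email.text());
        }
    }
}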

So how do we sort an array? In this lesson, we’ll introduce two iterative sorting algorithms: selection sort and insertion sort.

Selection sort

In each iteration, selection sort selects the smallest unsorted element and swaps it into its sorted place.

Selection sort maintains an important invariant and is an example of an algorithm design idea called iterative improvement: we keep track of a sorted elements portion at the front of the array and gradually grow the number of sorted elements. Once the sorted elements portion reaches the full length of the array, we’re done sorting the array!
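
A minimal sketch of selection sort on an int[] might look like this (the accompanying Selection.java may differ in details):

static void selectionSort(int[] A) {
    // Invariant: the elements at indices [0, i) are sorted and in their final positions.
    for (int i = 0; i < A.length; i += 1) {
        // Scan the unsorted portion for the index of the smallest element.
        int smallest = i;
        for (int j = i + 1; j < A.length; j += 1) {
            if (A[j] < A[smallest]) {
                smallest = j;
            }
        }
        // Swap the smallest unsorted element into its sorted place.
        int temp = A[i];
        A[i] = A[smallest];
        A[smallest] = temp;
    }
}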

Give an asymptotic runtime analysis of selection sort with respect to the number of elements.

First, consider whether there are any other factors besides the number of elements (the asymptotic variable) that could require further case analysis. In selection sort, there are no other factors that affect the runtime because we always need to scan across the entire unsorted elements portion of the array to find the smallest unsorted element.

The runtime for selection sort is similar to the runtime for dup1: both are described by the summation (N - 1) + (N - 2) + … + 3 + 2 + 1, which has a quadratic order of growth. In other words, selection sort is in Θ(N²).

Insertion sort

In each iteration, insertion sort inserts the next unsorted element into the sorted elements portion at the front of the array by swapping it left one index at a time until it is in its correct position. Like selection sort, insertion sort is also an iterative improvement sorting algorithm.
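
A minimal sketch of insertion sort on an int[] might look like this (the accompanying Insertion.java may differ in details):

static void insertionSort(int[] A) {
    // Invariant: the elements at indices [0, i) are sorted relative to each other.
    for (int i = 1; i < A.length; i += 1) {
        // Swap A[i] left one index at a time until it reaches its correct position.
        for (int j = i; j > 0 && A[j] < A[j - 1]; j -= 1) {
            int temp = A[j];
            A[j] = A[j - 1];
            A[j - 1] = temp;
        }
    }
}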

Give an asymptotic runtime analysis of insertion sort with respect to the number of elements.

Unlike selection sort, insertion sort is affected by the order of the input elements. When the input is already sorted, insertion sort has a linear order of growth because no elements need to be swapped to the left. On the other hand, given a reverse-sorted input, insertion sort needs to perform a very large number of swaps to move each next unsorted element into its correct position. The summation is exactly the same as for dup1 and selection sort, which we determined to have a quadratic order of growth, so insertion sort is in Θ(N²) in the worst case.

Open the VisuAlgo module to visualize sorting algorithms. Press Esc to exit the e-Lecture Mode, and choose either SEL or INS from the top navigation bar to switch to selection sort or insertion sort respectively. Run the sorting algorithm using Sort from the bottom left menu.

Sorting Visualization

Merge Sort

BinarySearch.java · Merge.java

  1. Analyze the runtime of a recursive algorithm using recurrences.
  2. Trace the recursive execution of merge sort.
  3. Identify a run of merge sort on an array.

Sequential search returns the index of an element in an array in worst case linear time by scanning across the array and comparing each element to the target. Although linear time algorithms are much more efficient than quadratic time algorithms, there are many situations where we need to make a large number of searches on a large amount of data.

A linear time algorithm like worst case sequential search might have the following real-world runtimes.

  • When N = 1 million, about 1 second.
  • When N = 10 million, about 10 seconds.
  • When N = 100 million, about 100 seconds.

Imagine how differently we would interact with technologies if search results took 100 or more seconds to process. Sorting elements is one way to enable more efficient searching.

Binary search returns the index of an element in a sorted array in worst case logarithmic time by using the sorted order to discard half the remaining elements after each comparison. Instead of checking each element one-by-one from left-to-right, binary search starts at the middle of the remaining elements and compares the middle element to the target to decide whether to proceed left or right.

public static int binarySearch(int[] sorted, int target) {
    // Search the half-open index range [0, sorted.length).
    return binarySearch(sorted, target, 0, sorted.length);
}

private static int binarySearch(int[] sorted, int target, int low, int high) {
    if (low >= high) {
        // The range [low, high) is empty, so the target is not in the array.
        return -1;
    }
    // N is the size of the current subproblem: the number of elements in [low, high).
    int N = high - low;
    int mid = low + (high - low) / 2;
    if (sorted[mid] < target) {
        // The target can only be to the right of mid.
        return binarySearch(sorted, target, mid + 1, high);
    } else if (sorted[mid] > target) {
        // The target can only be to the left of mid.
        return binarySearch(sorted, target, low, mid);
    } else {
        return mid;
    }
}
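
For example, calling binary search on a small sorted array:

int[] sorted = {1, 3, 5, 7, 9};
binarySearch(sorted, 7);  // returns 3
binarySearch(sorted, 4);  // returns -1 because 4 is not in the array
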
When does the best case runtime occur for binary search?

In the best case, the target is the exact middle element in the sorted array, which can be found in constant time.

How about the worst case? In the worst case, we don’t find the target immediately. In fact, we might not even find the target at all in the array, and instead return -1. In each recursive call, half the remaining elements under consideration can be discarded. In other words, the number of recursive calls is given by the answer to the question:

How many times do we need to divide N by 2 until only 1 element remains?

This is the definition of the base-2 logarithm, often written as either log2 or the shorthand lg. Therefore, the worst case order of growth for the runtime for binarySearch is logarithmic with respect to N, the number of elements that still need to be examined. Initially, all the elements need to be examined. But as the recursion proceeds, fewer and fewer items remain until we find the target or report that it’s not found by returning -1.

How long would a logarithmic time algorithm take to process large amounts of data? Let’s compute the base-2 logarithm for big data.

  • When N = 1 million, log2 1 million is about 20.
  • When N = 10 million, log2 10 million is about 23.
  • When N = 100 million, log2 100 million is about 27.

Recurrences

The runtime of a recursive program like binary search can be more effectively modeled using recurrence relations. Recurrence relations (aka recurrences) are recursive equations that represent the order of growth for a function in two parts: (1) non-recursive work and (2) recursive work. A recurrence relation describing the worst-case asymptotic runtime for binary search is T(N) = T(N / 2) + 1.

T(N) =
On the left hand side of the equals sign, T(N) refers to the runtime for binary search in terms of the size of the current subproblem, N. In the code above, note that N = high - low.
T(N / 2) + 1
On the right hand side of the equals sign, there are two components. The recursive work is given by the expression T(N / 2) because binary search makes a single recursive call to itself on a subproblem of half the size. The non-recursive work is given by the expression 1 because, ignoring the recursive calls, binary search spends a constant amount of time comparing sorted[mid] < target.

One way to solve a recurrence is by unrolling the recurrence, an approach that works by plugging the recurrence back into itself.

  1. T(N) = T(N / 2) + 1
  2. T(N) = [T(N / 4) + 1] + 1
  3. T(N) = [[T(N / 8) + 1] + 1] + 1
  4. T(N) = [[[T(N / 16) + 1] + 1] + 1] + 1

We can start to see a pattern emerge. The recursive work will eventually go down to T(1) when it reaches the base case. Along the way to the base case, a lot of 1’s are added together. How many times are we comparing sorted[mid] to the target? Put another way: how many times do we need to divide N by 2 in order to reduce it to the number 1? This is the base-2 logarithm again! After unrolling k times, we have T(N) = T(N / 2^k) + k, and the base case is reached when N / 2^k = 1, that is, when k = log2 N. So the worst case runtime of binary search is in Θ(log N).

Recursive merge sort

Both selection sort and insertion sort have worst case quadratic orders of growth and rely on iterative improvement, maintaining a sorted elements portion at the front of the array at all times. If recursion was able to speed up searching, perhaps it can speed up sorting too.

Merge sort represents a different approach to sorting based on the idea of recursion. Merge sort’s name derives from the merge operation, which takes two sorted arrays and returns a sorted result containing all the elements in both arrays.
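
A sketch of the merge operation on two sorted int[] arrays might look like this (the accompanying Merge.java may organize this differently):

static int[] merge(int[] left, int[] right) {
    int[] result = new int[left.length + right.length];
    int i = 0;  // index of the next unused element in left
    int j = 0;  // index of the next unused element in right
    for (int k = 0; k < result.length; k += 1) {
        // Take from left if right is exhausted, or if left's next element is smaller or equal.
        if (j >= right.length || (i < left.length && left[i] <= right[j])) {
            result[k] = left[i];
            i += 1;
        } else {
            result[k] = right[j];
            j += 1;
        }
    }
    return result;
}

Because each element is copied into result exactly once, merging two sorted halves of a length-N array takes about N steps. Taking from left on ties also keeps the merge stable.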

Merge sort is a recursive sorting algorithm that can be described as follows; a code sketch follows the steps.

  1. If the array is of size 1, return.
  2. Recursively merge sort the left half.
  3. Recursively merge sort the right half.
  4. Merge the two sorted halves.
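
Putting these steps together, a sketch of recursive merge sort might look like this. This version returns a new sorted array and assumes the merge method from the previous sketch is available; the accompanying Merge.java may instead sort in place.

static int[] mergeSort(int[] A) {
    // 1. An array of size 1 (or 0) is already sorted.
    if (A.length <= 1) {
        return A;
    }
    int mid = A.length / 2;
    // 2. Recursively merge sort the left half.
    int[] left = mergeSort(java.util.Arrays.copyOfRange(A, 0, mid));
    // 3. Recursively merge sort the right half.
    int[] right = mergeSort(java.util.Arrays.copyOfRange(A, mid, A.length));
    // 4. Merge the two sorted halves.
    return merge(left, right);
}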

The recurrence relation for the runtime of merge sort can be given as T(N) = T(N / 2) + T(N / 2) + N + 1. The first T(N / 2) represents the time it takes to merge sort the left half, while the second represents the time it takes to merge sort the right half; the N + 1 represents the non-recursive work of checking the base case and merging the two sorted halves. We can also state it more simply as T(N) = 2T(N / 2) + N, because the runtime is the same regardless of which half we’re discussing and the lower-order + 1 term doesn’t change the order of growth.

Recurrence diagrams

Unrolling this equation ends up producing a mess of parentheses and brackets, so let’s try drawing a recurrence diagram to explain what’s going on and use visuals to help organize our analysis. In a recurrence diagram, the recursive work is drawn as nodes (with the current subproblem depicted inside the node) while the non-recursive work is drawn beside the node. If we can add up all of the non-recursive work (numbers outside the node), then we’ve computed the total work done by the algorithm.

Here’s an example of a recurrence diagram for merge sort on a 64-element array.

  • The top layer takes about 64 units of time merging 2 sorted halves of 32 elements each.
  • The second layer also takes about 64 units of time merging 4 sorted halves of 16 elements each.
  • The third layer also takes about 64 units of time merging 8 sorted halves of 8 elements each.
  • And on and on until the algorithm reaches its base case.

Merge sort recurrence diagram

By identifying the pattern in the recurrence diagram, we can see that all the nodes on the same layer take about 64 units of time combined. Since the entire runtime of merge sort is represented by this diagram, we can find the total time spent by multiplying the number of layers by the time spent on each layer: for the 64-element example, that’s log2 64 = 6 layers of about 64 units each, or about 384 units in total. If we think about the problem more generally in terms of the size of the current subproblem N, then:

  • Each layer divides N in half until the base case of 1 element. Therefore, there are log2 N layers.
  • Each layer takes about N total non-recursive time.
  • Therefore, the runtime of merge sort is in Θ(N log N).

We call this N log N order of growth linearithmic (a portmanteau of “linear” and “logarithmic”).