# Runtime and Sorting

Asymptotic analysis, iterative sorts, and merge sort.

## Asymptotic Analysis

- Compare and contrast runtime analysis, asymptotic analysis, and case analysis.
- Analyze the order of growth of a function as constant, linear, or quadratic.
- Identify big-theta asymptotic notation for the order of growth of a function.

In your previous programming experience, we focused on **correctness** and **maintainability**—code that produces the expected output given a particular input, and code that is easy to improve or extend in the future. But, as we’ll see throughout this course, it’s also important that we design programs with a focus on **efficiency** too.

**Experimental runtime analysis** is a method for empirically measuring how long a *program* takes to run by recording exactly how long it takes to run on different inputs. Experimental analysis is particularly helpful for identifying and addressing performance issues by indicating exactly which operations are contributing to high runtime. This data can be used to create graphs that plot the runtime as the input to the program grows larger and larger.

But what if we want to *predict* the runtime of a program *before* writing any code? What other methods can we use to help us design the specification to a program so that it meets our performance needs?

**Asymptotic runtime analysis** is the process of predicting the amount of time an *algorithm* will take to run on large inputs. Asymptotic analysis focuses on analyzing **algorithms**, the concepts underlying how programs work. Compared to experimental analysis, which focuses on empirical measurement, asymptotic analysis focuses on reasoning and logic to analyze the runtime of an algorithm.

Computer scientists specifically focus on large inputs (as the size of the input approaches infinity, or its *asymptote*) rather than small inputs because larger inputs more clearly demonstrate differences in efficiency. For example, the performance issue in `ArrayListDeque`

only appears when a large number of web pages have been stored in browser history; `ArrayListDeque`

is quite fast when only a few web pages have been stored. The differences in runtime become more appreciable when we’re working with a very large number of elements.

The goal of asymptotic runtime analysis is to produce a (1) *predictive* and (2) *easy-to-compare* description of the running time of an algorithm. Consider the `indexOf`

method, which returns the index of a `target`

number in an `int[] A`

or -1 if `target`

is not in the array.

```
static int indexOf(int[] A, int target) {
for (int i = 0; i < A.length; i += 1) {
if (A[i] == target) {
return i;
}
}
return -1;
}
```

The behavior of the `indexOf`

algorithm depends on the input values.

- Initialize an integer index counter,
`i`

. - Loop over the length of the array. On each iteration, if the
`A[i]`

is the same as the given`target`

, return the current index. Otherwise, increment`i`

. - If no elements in the array are the same as the given
`target`

, return -1.

### Model the number of steps

**How many steps (or operations) are needed to run the algorithm?** Operations include assignments, boolean evaluations, arithmetic operations, method calls, etc.

Let’s take a look at these two assignment statements to illustrate how to count steps.

```
int a = 4 * 6;
int b = 9 * (a - 24) / (8 - 7);
```

- The first assignment to
`a`

takes one step (1) to compute`4 * 6`

and another step (2) to assign the result to the variable`a`

. - The second assignment to
`b`

takes one step (1) to compute`a - 24`

, another step (2) to compute`9 * (a - 24)`

, another step (3) to compute`8 - 7`

, another step (4) to compute`9 * (a - 24) / (8 - 7)`

, and a final step (5) to assign the result to the variable`b`

.

In total, these two lines of code take **7 steps** for the computer to run. The first goal of runtime analysis was to produce a *predictive* description of the runtime of an algorithm, but in this example, there’s not much to predict: the two assignment statements always take precisely 7 steps.

`indexOf`

is different: it has a loop and an `if`

statement that depends on the values in the `A`

compared to the `target`

. **How do we model the number of steps it takes to run an algorithm in general?** Here are two factors that can affect the number of steps to compute

`indexOf`

.- The length of the
`A`

. - The values of each
`int`

in the`A`

compared to the value of the`target`

.

The runtime of many algorithms is greatly affected by the size of the input. A small array typically takes little time to run regardless of whether the algorithm is fast or slow. Where we really start to appreciate the differences in runtime is when the array is very large. Therefore, when computer scientists talk about runtime analysis, the default assumption is asymptotic analysis.

**Asymptotic analysis** is a way of evaluating the efficiency of an algorithm on large inputs. The choice for the “large input” is called the **asymptotic variable**. For `indexOf`

, the length of the `A`

can be the asymptotic variable.

In this course, we’ll usually tell you the asymptotic variable using the phrases, “with respect to” or “in terms of”. Asymptotic analysis is defined in terms of the asymptotic variable, so make sure to confirm the asymptotic variable before analyzing a program.

But we also know that the runtime depends on other factors such as the relationship between the `target`

and the numbers in the `A`

. **Case analysis** is a way to account for variation in the model based on interactions between other factors besides the asymptotic variable.

- Best case
`indexOf`

- The
**best case**(most efficient or fastest) occurs when the`target`

is the same as the`A[0]`

—even when the array is very large. - Worst case
`indexOf`

- The
**worst case**(least efficient or slowest) occurs when the`target`

is the same as the last element in the`A`

or not even in the`A`

at all.

All other factors not covered by the asymptotic variable are considered in case analysis. In this course, we’ll focus only on the best and worst case: take all the remaining factors that you have at your disposal and choose a situation that produces the fastest (best case) or the slowest (worst case) runtime.

We now have a predictive model for the number of steps (a proxy for the time) it takes to run an algorithm. But this model is not easy to compare. To communicate our model, we’ll introduce some new vocabulary used to express asymptotic analysis.

### Formalize the runtime model

**How do we express our model to others so that we can communicate results?**

The **order of growth** relates the size of the input to the time that it takes to run the algorithm on that input. Often, we’ll use a mathematical variable like *N* to refer to the asymptotic variable.

- Best case
`indexOf`

orders of growth - In the
**best case**, the order of growth for the runtime of`indexOf`

is**constant**. - Worst case
`indexOf`

orders of growth - In the
**worst case**, the order of growth for the runtime of`indexOf`

is**linear**with respect to*N*, the length of`A`

.

When the order of growth is constant, we don’t specify “with respect to *N*” because constant order of growth implies that the runtime does not scale with *N*.

Orders of growth help us compare runtime in terms of words like *constant*, *linear*, and *quadratic*. In practice, though, you’ll rarely see computer scientists writing out the full orders of growth sentence. Instead, computer scientists communicate orders of growth using asymptotic notation.

### Asymptotic notation

Asymptotic notation provides precise, mathematical shorthand for orders of growth.

Consider this graph that depicts three functions of *N*.

## What do the x-axis and the y-axis represent in the graph?

The x-axis represents the size of the input, *N*. The y-axis represents the model’s prediction for the number of steps that the algorithm will require for an input of a given size.

The three functions depicted in the graph are:

- The top blue function depicts 5
*N*^{2}. - The middle red function depicts 40 sin(
*N*) + 4*N*^{2}. - The bottom green function depicts 3
*N*^{2}.

The shaded area represents the range between the top and bottom functions.

This graph provides a visual demonstration of the big-theta definition. We can say that the function 40 sin(*N*) + 4*N*^{2} *is in* Θ(*N*^{2}) because because it is bounded below by the green function 3*N*^{2} (*k*_{1} = 3) and bounded above by the blue function 5*N*^{2} (*k*_{2} = 5).

The red function is only “squeezed” between the blue and green functions for *N* > 6. This is why big-theta is an *asymptotic notation*: its logic applies to all values of *N* greater than some initial *N*_{0}. When we make a claim about asymptotic runtime, we must make sure that our analysis holds true for all possible “large inputs” *N*.

Now that we’ve learned the steps to perform asymptotic analysis, we can finally answer the question, “What is the asymptotic runtime of `indexOf`

with respect to *N*, the length of the input array `A`

?”

- Best case
`indexOf`

orders of growth - In the best case, the order of growth for the runtime of
`indexOf`

is constant. - In other words, in the best case, the asymptotic runtime of
`indexOf`

is in Θ(1). - Worst case
`indexOf`

orders of growth - In the worst case, the order of growth for the runtime of
`indexOf`

is linear with respect to*N*, the length of`A`

. - In other words, in the worst case, the asymptotic runtime of
`indexOf`

is in Θ(*N*) with respect to*N*, the length of`A`

.

## Iterative Sorts

- Describe the problem of sorting, ordering relations, and stability.
- Trace each iteration of selection sort and insertion sort.
- Identify a run of selection sort and insertion sort on an array.

There are different ways to find duplicates in an array. `dup1`

represents one way to solve the problem: exhaustively check all possible pairs of elements and return whether a duplicate exists among them. In the worst case, `dup1`

required quadratic time to return an answer.

How long might a quadratic time algorithm need to run?

- When
*N*= 10, less than 1 second. - When
*N*= 100, less than 1 second. - When
*N*= 1,000, about 1 second. - When
*N*= 10,000,**about 2 minutes**. - When
*N*= 100,000,**about 3 hours**. - When
*N*= 1,000,000,**about 12 days**.

## How long might a linear time algorithm need to run?

- When
*N*= 10, less than 1 second. - When
*N*= 100, less than 1 second. - When
*N*= 1,000, less than 1 second. - When
*N*= 10,000,**less than 1 second**. - When
*N*= 100,000,**less than 1 second**. - When
*N*= 1,000,000,**about 1 second**.

What can we learn from these findings?

- Many algorithms (including all algorithms in this course) are fast on tiny inputs when
*N*< 1000 - While there may be situations where we care about the efficiency of algorithms processing tiny inputs, these differences are often unnoticeable. When they are noticeable, they may be caused by unanticipated changes in your computer’s workload.
- Differences become more appreciable when
*inefficient algorithms*run on inputs of size*N*> 1000 - Whether the runtime is tolerable depends on the problem. An inefficient algorithm may be acceptable if we only need to run it infrequently, the size of the input is not particularly big, or the amount of processing power can make up for the time.
- Therefore, efficient algorithms are often needed to process even moderately-large data frequently.

However, if our array is *sorted*, then returning whether there are duplicates in the array takes much less time in the worst case. In a sorted array, all duplicate elements must be stored right beside each other. It turns out that a significant number of problems in computer science get a lot easier and a lot more efficient to solve if we can first sort our data. Without sorting, many of the systems and software that we have today would not be possible purely because things (like duplicate finding) would take far too long to run.

But data doesn’t always come to us in a pre-sorted form. How do we sort—rearrange into order—an array of elements?

### Sorting definitions

**Sorting** is the problem of rearranging a sequence of elements into non-decreasing order. How do we define the order of elements? For numbers, smaller values appear before larger values in non-decreasing order. But sorting objects are a bit more complicated.

For objects, the implementer of the class is responsible for defining the ordering relation. Just as numbers have a natural definition for less-than, equal-to, and greater-than relationships, the class’s `compareTo`

method provides a way to specify the relative order of any two elements of that class. For example, if we have emails where each `Email`

object has a date, sender, and text, the `compareTo`

method could be defined like an inbox where emails are sorted only by date.

```
public int compareTo(Email other) {
// Compare the date of this email to the date of the other email.
return this.date.compareTo(other.date);
}
```

In other situations like searching for all the emails from a particular sender (ignoring date), it might be more helpful to sort by the sender primarily and the text of their message secondarily.

```
public int compareTo(Email other) {
// Compare the sender of this email to the sender of the other email.
int cmp = this.sender.compareTo(other.sender);
if (cmp != 0) {
// If the senders are different, return the difference.
return cmp;
}
// Otherwise, the senders are the same, so compare by text content.
return this.text.compareTo(other.text);
}
```

## In the email example, where is the compareTo method defined?

The `compareTo`

method for the `Email`

class is defined inside the `Email`

class by the people who wrote the class. When we talk about sorting algorithms, the author of the sorting algorithm has no idea the exact objects that they’re sorting or the ordering relation. A sorting algorithm just calls `compareTo`

to determine whether an object should appear before or after another object. Sorting involves repeatedly asking, “Is this object less-than, equal-to, or greater-than this other object?” by repeatedly calling the `compareTo`

method.

A sort is **stable** if it preserves the original order of equivalent keys. Stability can affect the final sorting output when there are two or more elements considered equal according to the ordering relation. For basic data types like numbers, stability doesn’t matter: any two numbers that share the same numeric value are just the same either way. But for objects like emails, stability can make a big difference. Imagine sorting emails by sender name: you’ll often receive many emails from the same sender, so a stable sort for emails will sort by sender name *and*, for each sender name, maintain the original order of emails that you received from that sender. An unstable sort doesn’t guarantee the original order of emails for each person.

So how do we sort an array? In this lesson, we’ll introduce two iterative sorting algorithms: selection sort and insertion sort.

### Selection sort

In each iteration, **selection sort** selects the *smallest unsorted element* and swaps it into its sorted place.

Selection sort has an important invariant that was introduced called **iterative improvement**. We keep track of a *sorted elements* portion at the front of the array and gradually grow the number of *sorted elements*. Once the *sorted elements* portion reaches the full length of the array, we’re done sorting the array!

## Give an asymptotic runtime analysis of selection sort with respect to the number of elements.

First, consider if there are any other factors besides number of elements (the asymptotic variable) that could require further case analysis. In selection sort, there are no other factors that affect the runtime because we always need to scan across the entire *unsorted elements* portion of the array to find the *smallest unsorted element*.

The runtime for selection sort is similar to the runtime for `dup1`

, which we described using the summation (*N* - 1) + (*N* - 2) + … + 3 + 2 + 1, which we determined to have a quadratic order of growth. In other words, we can say that selection sort is in Θ(*N*^{2}).

### Insertion sort

In each iteration, **insertion sort** inserts the *next unsorted element* into the *sorted elements* portion at the front of the array by swapping it left one index at a time until it is in its correct position. Like selection sort, insertion sort is also an iterative improvement sorting algorithm.

## Give an asymptotic runtime analysis of insertion sort with respect to the number of elements.

Unlike selection sort, insertion sort is affected by the order of elements. When the input is *already sorted*, insertion sort has a linear order of growth because there are no elements that need to be swapped to the left. On the other hand, given a reverse-sorted input, insertion sort needs to perform a very large number of swaps to move each *next unsorted element* into its correct position. The summation is exactly the same as `dup1`

and selection sort, which we determined to have a quadratic order of growth.

Open the VisuAlgo module to visualize sorting algorithms. Press

`Esc`

to exit the e-Lecture Mode, and choose eitherSELorINSfrom the top navigation bar to switch to selection sort or insertion sort respectively. Run the sorting algorithm usingSortfrom the bottom left menu.

## Merge Sort

- Analyze the runtime of a recursive algorithm using recurrences.
- Trace the recursive execution of merge sort.
- Identify a run of merge sort on an array.

**Sequential search** returns the index of an element in an array in worst case linear time by scanning across the array and comparing each element to the target. Although linear time algorithms are much more efficient than quadratic time algorithms, there are many situations where we need to make a large number of searches on a large amount of data.

A **linear time algorithm** like worst case sequential search might have the following real-world runtimes.

- When
*N*= 1 million, about 1 second. - When
*N*= 10 million, about 10 seconds. - When
*N*= 100 million, about 100 seconds.

Imagine how differently we would interact with technologies if search results took 100 or more seconds to process. Sorting elements is one way to enable more efficient searching.

**Binary search** returns the index of an element in a *sorted array* in worst case logarithmic time by using the sorted order to discard half the remaining elements after each comparison. Instead of checking each element one-by-one from left-to-right in the array, binary search instead starts at the middle of the current problem and compares to the middle element to decide whether the proceed left or right.

```
public static int binarySearch(int[] sorted, int target) {
return binarySearch(sorted, target, 0, sorted.length);
}
private static int binarySearch(int[] sorted, int target, int low, int high) {
if (low > high) {
return -1;
}
int N = high - low;
int mid = low + (high - low) / 2;
if (sorted[mid] < target) {
return binarySearch(sorted, target, mid + 1, high);
} else if (sorted[mid] > target) {
return binarySearch(sorted, target, low, mid - 1);
} else {
return mid;
}
}
```

## When does the best case runtime occur for binary search?

In the best case, the `target`

is the exact middle element in the `sorted`

array, which can be found in constant time.

How about the worst case? In the worst case, we don’t find the `target`

immediately. In fact, we might not even find the `target`

at all in the array, and instead return -1. In each recursive call, half the remaining elements under consideration can be discarded. In other words, the number of recursive calls is given by the answer to the question:

How many times do we need to divide

Nby 2 until only 1 element remains?

This is the definition of the **base-2 logarithm**, often written as either log_{2} or the shorthand lg. Therefore, the worst case order of growth for the runtime for `binarySearch`

is **logarithmic** with respect to *N*, the number of elements that still need to be examined. Initially, all the elements need to be examined. But as the recursion proceeds, fewer and fewer items remain until we find the `target`

or report that it’s not found by returning -1.

How long would a logarithmic time algorithm take to process large amounts of data? Let’s compute the base-2 logarithm for big data.

- When
*N*= 1 million, log_{2}1 million is about 20. - When
*N*= 10 million, log_{2}10 million is about 23. - When
*N*= 100 million, log_{2}100 million is about 27.

### Recurrences

The runtime of a recursive program like binary search can be more effectively modeled using recurrence relations. **Recurrence relations** (aka recurrences) are recursive equations that represent the order of growth for a function in two parts: (1) non-recursive work and (2) recursive work. A recurrence relation describing the worst-case asymptotic runtime for binary search is T(N) = T(N / 2) + 1.

- T(N) =
- On the left hand side of the equals sign, T(N) refers to the runtime for binary search in terms of the size of the current subproblem,
*N*. In the code above, note that`N = high - low`

. - T(N / 2) + 1
- On the right hand side of the equals sign, there are two components. The
**recursive work**is given by the expression T(N / 2) because binary search makes a single recursive call to itself on a subproblem of half the size. The**non-recursive work**is given by the expression 1 because, ignoring the recursive calls, binary search spends a constant amount of time comparing`sorted[mid] < target`

.

One way to solve a recurrence is by **unrolling the recurrence**, an approach that works by plugging the recurrence back into itself.

- T(N) = T(N / 2) + 1
- T(N) = [T(N / 4) + 1] + 1
- T(N) = [[T(N / 8) + 1] + 1] + 1
- T(N) = [[[T(N / 16) + 1] + 1] + 1] + 1

We can start to see a pattern emerge. The recursive work will eventually go down to T(1) when it calls the base case. Along the way to the base case, a lot of 1’s are added together. How many times are we comparing `sorted[mid]`

to the `target`

? Put another way, we can ask: How many times do we need to divide *N* by 2 in order to reduce it to the number 1? This is the base-2 logarithm again!

### Recursive merge sort

Both selection sort and insertion sort have worst case quadratic order of growth, and relied on iterative improvement where a *sorted elements* portion was maintained at the front of the sequence at all times. If recursion was able to speed-up searching, perhaps it can also speed up sorting too.

**Merge sort** represents a different approach to sorting based on the idea of recursion. Merge sort’s name derives from the **merge** operation, which takes two *sorted arrays* and returns a *sorted result* containing all the elements in both arrays.

Merge sort is a recursive sorting algorithm that can be described as follows.

- If the array is of size 1, return.
- Recursively merge sort the left half.
- Recursively merge sort the right half.
- Merge the two sorted halves.

The recurrence relation for the runtime of merge sort can be given as T(N) = T(N / 2) + T(N / 2) + N + 1. The first recursive call to T(N / 2) represents the time it takes to merge sort the left half while the second call represents the time it takes to merge sort the right half. But we could also state it more simply as just T(N) = 2T(N / 2) + N because the runtime is the same regardless of which half we’re discussing.

### Recurrence diagrams

Unrolling this equation ends up producing a mess of parentheses and brackets, so let’s try drawing a recurrence diagram to explain what’s going on and use visuals to help organize our analysis. In a **recurrence diagram**, the recursive work is drawn as nodes (with the current subproblem depicted inside the node) while the non-recursive work is drawn beside the node. If we can add up all of the non-recursive work (numbers outside the node), then we’ve computed the total work done by the algorithm.

Here’s an example of a recurrence diagram for merge sort on a 64-element array.

- The top layer takes about 64 units of time merging 2 sorted halves of 32 elements each.
- The second layer also takes about 64 units of time merging 4 sorted halves of 16 elements each.
- The third layer also takes about 64 units of time merging 8 sorted halves of 8 elements each.
- And on and on until the algorithm reaches its base case.

By identifying the pattern in the recurrence diagram, we can see that all the nodes that are on the same layer will take about 64 units of time. Since the entire runtime of merge sort is represented by this diagram, we can find the total time spent by multiplying the number of layers by the time spent on each layer. If we think about the problem more generally in terms of the size of the current subproblem *N*, then:

- Each layer divides
*N*in half until the base case of 1 element. Therefore, there are log_{2}N layers. - Each layer takes about
*N*total non-recursive time. - Therefore, the runtime of merge sort is in Θ(N log N).

We call this N log N order of growth **linearithmic** (a portmanteau of “linear” and “logarithmic”).