# Algorithm Analysis

## Complexity

Beyond whether or not a program is correct, computer scientists are often concerned about the **complexity** of a program. We actually already have quite a bit of experience with one particular kind of complexity that arises from how we design abstractions for our programs. For instance, pulling repeated code out into a method gives it a name and makes it easier for the reader to understand the behavior of the program. **Cognitive complexity** is about how easy it is to understand a program. In this course, we consider cognitive complexity from the perspective of code quality (internal correctness).

But code is just one way for programmers to communicate ideas to others, and writing code requires worrying about getting the code to work, which might take a lot more time than simply coming up with the idea for the program. Instead, programmers often communicate ideas to others in terms of **algorithms**, the more general list of steps behind the code. Whereas code is specific to a particular programming language, we can communicate algorithms in natural languages like English too. In fact, whenever we’re describing a program to someone else in English, what we’re really describing is an algorithm.

Algorithms are the ideas that power apps.

- In Autocomplete, we defined the `Term` **data type** that an algorithm will use to display matching terms in descending weight order.
- In Letter Inventory, we’ll define a `LetterInventory` **data type** that an algorithm will use to display the most similar suggestions.
- But next week, for Search Engine, we’ll define an **algorithm** for surfacing relevant web pages based on a given query.

When we consider algorithms, we’re not only interested in correctness. Today, we’ll study one metric for **algorithm analysis** called **time complexity**: how long it takes for an algorithm to run on an abstract (conceptual model) computer.

## Runtime analysis

**Runtime analysis** is the process of determining the time complexity of an algorithm.

Like code, complexity is all about communicating ideas about algorithms to others. This communication is most effective when it is (1) simple and (2) easy to compare. Computer scientists follow a three-step process to effectively communicate ideas about time complexity.

- Runtime analysis process
  1. **Understanding** the behavior or implementation details of the algorithm.
  2. **Modeling** the number of steps or operations needed to execute the algorithm.
  3. **Formalizing** the result in precise English or mathematical notation.

## Understanding

Let’s apply the runtime analysis process to analyze an algorithm that returns the index of `target` in an `int[] array`, or -1 if `target` is not in the array.

```
static int indexOf(int[] array, int target) {
    for (int i = 0; i < array.length; i++) {
        if (array[i] == target) {
            return i;
        }
    }
    return -1;
}
```

The behavior of the `indexOf` algorithm depends on the input values.
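To see this concretely, here is a small sketch (the class name and sample array are ours, not from the course) calling `indexOf` with different inputs:

```java
public class IndexOfDemo {
    // Same algorithm as above: scan left to right for target.
    static int indexOf(int[] array, int target) {
        for (int i = 0; i < array.length; i++) {
            if (array[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] numbers = {7, 2, 9};
        System.out.println(indexOf(numbers, 7));  // 0: match on the first element
        System.out.println(indexOf(numbers, 9));  // 2: match after scanning the whole array
        System.out.println(indexOf(numbers, 5));  // -1: no match anywhere
    }
}
```

The same method can return immediately or scan every element, depending entirely on the arguments.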

## Describe each step of the algorithm's implementation details in English.

- Initialize an integer index counter, `i`.
- Loop over the length of the array. On each iteration, if `array[i]` is the same as the given `target`, return the current index. Otherwise, increment `i`.
- If no elements in the array are the same as the given `target`, return -1.

## Modeling

Modeling is about counting the number of steps or operations, such as assignments, boolean evaluations, arithmetic operations, method calls, etc.

```
int a = 4 * 6;
int b = 9 * (a - 24) / (9 - 7);
```

The first assignment to `a` takes one step to compute `4 * 6` and one more step to assign the result, 24, to the variable `a`.

## How many steps does it take to assign a value to the variable b?

- `a - 24`
- `9 * (a - 24)`
- `9 - 7`
- `9 * (a - 24) / (9 - 7)`
- Assignment to `b`
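Annotating each line with its tally makes the model explicit (a sketch; the per-line counts follow the step model above):

```java
public class StepCount {
    public static void main(String[] args) {
        int a = 4 * 6;                   // 2 steps: one multiply, one assignment
        int b = 9 * (a - 24) / (9 - 7);  // 5 steps: subtract, multiply, subtract, divide, assignment
        System.out.println(a);           // a is 24
        System.out.println(b);           // so b is 9 * 0 / 2 = 0
    }
}
```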

How do we apply this thinking to `indexOf`?

```
for (int i = 0; i < array.length; i++) {
    if (array[i] == target) {
        return i;
    }
}
```

The goal of time complexity analysis is to make an argument about the running time of an algorithm. In most cases, we only care about **asymptotic analysis**: what happens for very large N (asymptotic behavior). We want to consider which algorithms are best suited for handling big amounts of data, such as the millions of nucleotides that make up our DNA. The running time differences between algorithms are easier to appreciate when we consider big data because even very slow algorithms will probably solve small problems quickly since computers are so fast nowadays. It’s when we have big data that running time becomes a serious concern.

- Asymptotic analysis
- Describes the step count in terms of N, the size of the input, as N becomes very large. In the case of `indexOf`, the number of steps it takes to compute `indexOf` is primarily determined by the length of the `array`, which we’ll call N. A longer `array` may result in more iterations of the for loop.

When we talk about runtime in terms of “iterations of a loop”, we’re making an important conceptual simplification.

- Cost model
- Instead of counting every single operation, the cost model is an operation used to estimate the overall order of growth. The number of iterations of the for loop is one possible cost model since it is directly correlated to the total number of steps that it takes to evaluate the algorithm.
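One way to apply the cost model is to instrument the loop with a counter. The helper below, `indexOfIterations`, is a hypothetical variant (not part of the course code) that returns the number of loop iterations instead of the index:

```java
public class CostModel {
    // Hypothetical variant of indexOf that reports the cost-model count:
    // the number of loop iterations that ran before the method returned.
    static int indexOfIterations(int[] array, int target) {
        int iterations = 0;
        for (int i = 0; i < array.length; i++) {
            iterations++;
            if (array[i] == target) {
                return iterations;
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        int[] numbers = {7, 2, 9, 4, 1};
        System.out.println(indexOfIterations(numbers, 7));  // 1: target is the first element
        System.out.println(indexOfIterations(numbers, 8));  // 5: target absent, every element checked
    }
}
```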

## What factors affect the number of iterations of the for loop?

While the length of the `array` controls the stopping condition for the loop, the exact number of iterations also depends on the values in the `array` and the `target` value because the `return` statement will cause control to leave the `indexOf` method.

- Case analysis
- Accounts for variation in step count based on interactions between factors. Even with a very long `array`, it’s possible that the algorithm exits the `for` loop in the very first iteration if `array[0] == target`. But it’s also possible that the algorithm increments the counter and loops through the entire array without finding the `target`.

## Why don't we consider the case of a zero-length array as input?

Remember that we chose the length of the `array` as the asymptotic variable, so we’re considering the number of steps it takes to evaluate the algorithm with a very long `array`, not a very small (or zero-length) one. While it’s true that `indexOf` would return quite quickly on a zero-length array, so would any reasonable algorithm, so this case is not very useful for comparing algorithms.

## Formalizing

To summarize our findings, we can say the following about the runtime of `indexOf`.

- Simple English
- In the worst case (when the `target` is not in the `array`), the number of steps that it takes to evaluate `indexOf` is directly correlated with N, the length of the `array`.
- In the best case (when `array[0] == target`), it takes about 4 steps to evaluate `indexOf` regardless of the length of the `array`.

This description succinctly captures the different factors that can potentially affect the runtime of `indexOf`. While it conveys all of the key ideas, computer scientists will typically enhance this description with mathematics to capture a more general geometric intuition about the runtime.

- Order of growth
- In the worst case, the **order of growth of the runtime** of `indexOf` is **linear** with respect to N, the length of the `array`.
- In the best case, the **order of growth of the runtime** of `indexOf` is **constant** with respect to N, the length of the `array`.

The **order of growth** relates the size of the input to the time that it takes to evaluate the algorithm on that input. Mathematicians formally defined a notation for order of growth using **big-O** (“big oh”). Computer scientists then adopted this notation for algorithm analysis in the 1970s. However, many programmers nowadays mistakenly use big-O notation when they really mean **big-Θ** (“big theta”). The mathematical notation that best represents “directly correlates” is big-Θ notation.

- Big-Θ notation
- In the worst case, the runtime of `indexOf` is in Θ(N) [the family of functions whose order of growth is linear] with respect to N, the length of the `array`.
- In the best case, the runtime of `indexOf` is in Θ(1) [the family of functions whose order of growth is constant] with respect to N, the length of the `array`.

This notation makes it easy to compare algorithms!

## Binary search

It turns out that the best-known algorithm for many important problems takes exponential time, making it impossible to solve them in practice. To put that into perspective, an input of size N = 100 (e.g. `array.length == 100`) that takes a linear-time algorithm less than 1 second to evaluate would take an exponential-time algorithm about 10^{17} **years** to compute a result.

The magnitude of improvement from exponential-time to linear-time is the same as the magnitude of improvement from linear-time to logarithmic-time. Logarithmic-time algorithms come up often in the study of computer algorithms, and they’re the key to enabling computers to work with large datasets. Earlier, we saw how `indexOf` represented a worst-case linear-time algorithm. Let’s analyze the runtime of **binary search**, a logarithmic-time algorithm for `indexOf` assuming a sorted array.

Suppose we have a `sorted` array of integers. Binary search is an efficient algorithm for returning the index of the `target` value in a `sorted` array.

```
static int binarySearch(int[] sorted, int target) {
    // Start by considering the whole sorted array
    int low = 0;
    int high = sorted.length - 1;
    // While low and high haven't met, there are still more indices to search
    while (low <= high) {
        // Compute the middle index and compare its value to the target
        int mid = low + (high - low) / 2;
        if (sorted[mid] < target) {
            low = mid + 1;
        } else if (sorted[mid] > target) {
            high = mid - 1;
        } else {
            return mid;
        }
    }
    return -1;
}
```
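The same algorithm ships in the standard library as `java.util.Arrays.binarySearch`, which also requires a sorted array. Its not-found convention differs from ours: instead of -1, it returns `-(insertionPoint) - 1`. A quick check (array values chosen for illustration):

```java
import java.util.Arrays;

public class BinarySearchDemo {
    public static void main(String[] args) {
        int[] sorted = {2, 5, 8, 12, 16, 23, 38};
        System.out.println(Arrays.binarySearch(sorted, 8));   // 2: found at index 2
        System.out.println(Arrays.binarySearch(sorted, 23));  // 5: found at index 5
        System.out.println(Arrays.binarySearch(sorted, 7));   // -3: not found; insertion point 2, so -(2) - 1
    }
}
```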

In each iteration of the loop, half of the elements under consideration can be discarded. The number of iterations of the `while` loop can be phrased as, “How many times do we need to divide N (the length of the `sorted` array) by 2 until only 1 element remains?” This is the definition of the **binary logarithm**, or log_{2}.

Alternatively, we could rephrase the question as, “How many times do we need to multiply by 2 to reach N?” Solving 2^{x} = N for x gives x = log_{2} N.
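We can sanity-check the halving count with a short sketch (the `halvings` helper is ours, written for illustration):

```java
public class Halvings {
    // Counts how many times n can be divided by 2 (integer division)
    // until only 1 remains: the floor of log base 2 of n.
    static int halvings(int n) {
        int count = 0;
        while (n > 1) {
            n /= 2;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(halvings(8));     // 3, since 2^3 = 8
        System.out.println(halvings(1024));  // 10, since 2^10 = 1024
    }
}
```

Even for an array of a million elements, binary search needs only about 20 halvings, which is why logarithmic-time algorithms scale so well.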