# Algorithm Analysis

## Complexity

Beyond whether or not a program is correct, computer scientists are often concerned about the **complexity** of a program. We actually already have quite a bit of experience with one particular kind of complexity that arises from how we design abstractions for our programs. For instance, pulling repeated code out into a method gives it a name and makes it easier for the reader to understand the behavior of the program. **Cognitive complexity** is about how easy it is to understand a program. In this course, we consider cognitive complexity from the perspective of code quality (internal correctness).

But code is just one way for programmers to communicate ideas to others, and writing code requires worrying about getting the code to work, which might take a lot more time than simply coming up with the idea for the program. Instead, programmers often communicate ideas to others in terms of **algorithms**, the more general list of steps behind the code. Whereas code is specific to a particular programming language, we can communicate algorithms in natural languages like English too. In fact, whenever we’re describing a program to someone else in English, what we’re really describing is an algorithm.

Algorithms are the ideas that power apps.

- In Autocomplete, we defined the `Term` **data type** that an algorithm will use to display matching terms in descending weight order.
- In Letter Inventory, we’ll define a `LetterInventory` **data type** that an algorithm will use to display the most similar suggestions.
- But next week, for Search Engine, we’ll define an **algorithm** for surfacing relevant web pages based on a given query.

When we consider algorithms, we’re not only interested in correctness. Today, we’ll study one metric for **algorithm analysis** called **time complexity**: how long it takes for an algorithm to run on an abstract (conceptual model) computer.

## Runtime analysis

**Runtime analysis** is the process of determining the time complexity of an algorithm.

Like code, complexity is all about communicating ideas about algorithms to others. This communication is most effective when it is (1) simple and (2) easy to compare. Computer scientists follow a three-step process to effectively communicate ideas about time complexity.

- Runtime analysis process
  1. **Understanding** the behavior or implementation details of the algorithm.
  2. **Modeling** the number of steps or operations needed to execute the algorithm.
  3. **Formalizing** the result in precise English or mathematical notation.

## Understanding

Let’s apply the runtime analysis process to analyze an algorithm that returns the index of `target` in an `int[] array`, or -1 if `target` is not in the array.

```
static int indexOf(int[] array, int target) {
    for (int i = 0; i < array.length; i++) {
        if (array[i] == target) {
            return i;
        }
    }
    return -1;
}
```

The behavior of the `indexOf` algorithm depends on the input values.
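To see this concretely, here is a small sketch (the class name and sample array are ours, not from the course) calling `indexOf` with different inputs:

```java
public class IndexOfDemo {
    // Same algorithm as above: scan left to right for target.
    static int indexOf(int[] array, int target) {
        for (int i = 0; i < array.length; i++) {
            if (array[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] numbers = {7, 2, 9};
        System.out.println(indexOf(numbers, 7));  // 0: match on the first element
        System.out.println(indexOf(numbers, 9));  // 2: match after scanning the whole array
        System.out.println(indexOf(numbers, 5));  // -1: no match anywhere
    }
}
```

The same method can return immediately or scan every element, depending entirely on the arguments.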

## Describe each step of the algorithm's implementation details in English.

- Initialize an integer index counter, `i`.
- Loop over the length of the array. On each iteration, if `array[i]` is the same as the given `target`, return the current index. Otherwise, increment `i`.
- If no elements in the array are the same as the given `target`, return -1.

## Modeling

Modeling is about counting the number of steps or operations, such as assignments, boolean evaluations, arithmetic operations, method calls, etc.

```
int a = 4 * 6;
int b = 9 * (a - 24) / (9 - 7);
```

The first assignment to `a` takes one step to compute `4 * 6` and one more step to assign the result, 24, to the variable `a`.

## How many steps does it take to assign a value to the variable b?

- `a - 24`
- `9 * (a - 24)`
- `9 - 7`
- `9 * (a - 24) / (9 - 7)`
- Assignment to `b`
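Annotating each line with its tally makes the model explicit (a sketch; the per-line counts follow the step model above):

```java
public class StepCount {
    public static void main(String[] args) {
        int a = 4 * 6;                   // 2 steps: one multiply, one assignment
        int b = 9 * (a - 24) / (9 - 7);  // 5 steps: subtract, multiply, subtract, divide, assignment
        System.out.println(a);           // a is 24
        System.out.println(b);           // so b is 9 * 0 / 2 = 0
    }
}
```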

How do we apply this thinking to `indexOf`?

```
for (int i = 0; i < array.length; i++) {
    if (array[i] == target) {
        return i;
    }
}
```

The goal of time complexity analysis is to make an argument about the running time of an algorithm. In most cases, we only care about **asymptotic analysis**: what happens for very large N (asymptotic behavior). We want to consider which algorithms are best suited for handling big amounts of data, such as the millions of nucleotides that make up our DNA. The running time differences between algorithms are easier to appreciate when we consider big data because even very slow algorithms will probably solve small problems quickly since computers are so fast nowadays. It’s when we have big data that running time becomes a serious concern.

- Asymptotic analysis
- Describes the step count in terms of N, the size of the input, as N becomes very large. In the case of `indexOf`, the number of steps it takes to compute `indexOf` is primarily determined by the length of the `array`, which we’ll call N. A longer `array` may result in more iterations of the for loop.

When we talk about runtime in terms of “iterations of a loop”, we’re making an important conceptual simplification.

- Cost model
- Instead of counting every single operation, the cost model is an operation used to estimate the overall order of growth. The number of iterations of the for loop is one possible cost model since it is directly correlated to the total number of steps that it takes to evaluate the algorithm.
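One way to apply the cost model is to instrument the loop with a counter. The helper below, `indexOfIterations`, is a hypothetical variant (not part of the course code) that returns the number of loop iterations instead of the index:

```java
public class CostModel {
    // Hypothetical variant of indexOf that reports the cost-model count:
    // the number of loop iterations that ran before the method returned.
    static int indexOfIterations(int[] array, int target) {
        int iterations = 0;
        for (int i = 0; i < array.length; i++) {
            iterations++;
            if (array[i] == target) {
                return iterations;
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        int[] numbers = {7, 2, 9, 4, 1};
        System.out.println(indexOfIterations(numbers, 7));  // 1: target is the first element
        System.out.println(indexOfIterations(numbers, 8));  // 5: target absent, every element checked
    }
}
```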

## What factors affect the number of iterations of the for loop?

While the length of the `array` controls the stopping condition for the loop, the exact number of iterations also depends on the values in the `array` and the `target` value because the `return` statement will cause control to leave the `indexOf` method.

- Case analysis
- Accounts for variation in step count based on interactions between factors. Even with a very long `array`, it’s possible that the algorithm exits the `for` loop in the very first iteration if `array[0] == target`. But it’s also possible that the algorithm increments the counter and loops through the entire array without finding the `target`.

## Why don't we consider the case of a zero-length array as input?

Remember that we chose the length of the `array` as the asymptotic variable, so we’re considering the number of steps it takes to evaluate the algorithm with a very long `array`, not a very small (or zero-length) one. While it’s true that `indexOf` would return quite quickly on a zero-length array, so would any reasonable algorithm, so this case is not very useful for comparing algorithms.

## Formalizing

To summarize our findings, we can say the following about the runtime of `indexOf`.

- Simple English
- In the worst case (when the `target` is not in the `array`), the number of steps that it takes to evaluate `indexOf` is directly correlated with N, the length of the `array`.
- In the best case (when `array[0] == target`), it takes about 4 steps to evaluate `indexOf` regardless of the length of the `array`.

This description succinctly captures the different factors that can potentially affect the runtime of `indexOf`. While it conveys all of the key ideas, computer scientists will typically enhance this description with mathematics to capture a more general geometric intuition about the runtime.

- Order of growth
- In the worst case, the **order of growth of the runtime** of `indexOf` is **linear** with respect to N, the length of the `array`.
- In the best case, the **order of growth of the runtime** of `indexOf` is **constant** with respect to N, the length of the `array`.

The **order of growth** relates the size of the input to the time that it takes to evaluate the algorithm on that input. Mathematicians formally defined a notation for order of growth using **big-O** (“big oh”). Computer scientists then adopted this notation for algorithm analysis in the 1970s. However, many programmers nowadays mistakenly use big-O notation when they really mean **big-Θ** (“big theta”). The mathematical notation that best represents “directly correlates” is big-Θ notation.

- Big-Θ notation
- In the worst case, the runtime of `indexOf` is in Θ(N) [the family of functions whose order of growth is linear] with respect to N, the length of the `array`.
- In the best case, the runtime of `indexOf` is in Θ(1) [the family of functions whose order of growth is constant] with respect to N, the length of the `array`.

This notation makes it easy to compare algorithms!

## Binary search

It turns out that the best-known algorithm for many important problems takes exponential time, making it impossible to solve them in practice. To put that into perspective, an input of size N = 100 (e.g. `array.length == 100`) that takes a linear-time algorithm less than 1 second to evaluate would take an exponential-time algorithm about 10^{17} **years** to compute a result.

The magnitude of improvement from exponential-time to linear-time is the same as the magnitude of improvement from linear-time to logarithmic-time. Logarithmic-time algorithms come up often in the study of computer algorithms, and they’re the key to enabling computers to work with large datasets. Earlier, we saw how `indexOf` represented a worst-case linear-time algorithm. Let’s analyze the runtime of **binary search**, a logarithmic-time algorithm for `indexOf` assuming a sorted array.

Suppose we have a `sorted` array of integers. Binary search is an efficient algorithm for returning the index of the `target` value in a `sorted` array.

```
static int binarySearch(int[] sorted, int target) {
    // Start by considering the whole sorted array
    int low = 0;
    int high = sorted.length - 1;
    // While low and high haven't met, there are still more indices to search
    while (low <= high) {
        // Compute the middle index and compare its value to the target
        int mid = low + (high - low) / 2;
        if (sorted[mid] < target) {
            low = mid + 1;
        } else if (sorted[mid] > target) {
            high = mid - 1;
        } else {
            return mid;
        }
    }
    return -1;
}
```
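The same algorithm ships in the standard library as `java.util.Arrays.binarySearch`, which also requires a sorted array. Its not-found convention differs from ours: instead of -1, it returns `-(insertionPoint) - 1`. A quick check (array values chosen for illustration):

```java
import java.util.Arrays;

public class BinarySearchDemo {
    public static void main(String[] args) {
        int[] sorted = {2, 5, 8, 12, 16, 23, 38};
        System.out.println(Arrays.binarySearch(sorted, 8));   // 2: found at index 2
        System.out.println(Arrays.binarySearch(sorted, 23));  // 5: found at index 5
        System.out.println(Arrays.binarySearch(sorted, 7));   // -3: not found; insertion point 2, so -(2) - 1
    }
}
```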

In each iteration of the loop, half of the elements under consideration can be discarded. The number of iterations of the `while` loop can be phrased as, “How many times do we need to divide N (the length of the `sorted` array) by 2 until only 1 element remains?” This is the definition of the **binary logarithm**, or log_{2}.

Alternatively, we could rephrase the question as, “How many times do we need to multiply by 2 to reach N?” Solving 2^{x} = N for x gives x = log_{2} N.
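We can sanity-check the halving count with a short sketch (the `halvings` helper is ours, written for illustration):

```java
public class Halvings {
    // Counts how many times n can be divided by 2 (integer division)
    // until only 1 remains: the floor of log base 2 of n.
    static int halvings(int n) {
        int count = 0;
        while (n > 1) {
            n /= 2;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(halvings(8));     // 3, since 2^3 = 8
        System.out.println(halvings(1024));  // 10, since 2^10 = 1024
    }
}
```

Even for an array of a million elements, binary search needs only about 20 halvings, which is why logarithmic-time algorithms scale so well.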