# Algorithm Analysis

## Table of contents

- Characterizing Runtime
- Counting Steps
- Why Scaling Matters
- Asymptotic Analysis
- Simplified Modeling Process

We can analyze algorithms in many different ways.

- Time complexity
  - How much time does it take for your algorithm to execute?
- Space complexity
  - How much memory does your algorithm require?
- Societal impact
  - How does your algorithm affect the rest of the world?

We will investigate all of these different costs in this course, starting with time complexity. Time complexity analysis is also known as running time (or runtime) analysis.

## Characterizing Runtime

Suppose we’re trying to determine if a sorted array contains duplicate values. Here are two ways to solve the problem.

`dup1`

- Consider every pair of items, returning true if any match!

`dup2`

- Take advantage of the sorted nature of our array.
- We know that if there are duplicates, they must be next to each other.
- Compare neighbors: return true the first time you see a match! If there are no more items, return false.

We can see that `dup1` seems like it’s doing a lot more unnecessary, redundant work than `dup2`. But how much more work? Ideally, we want our characterization to be **simple and mathematically rigorous** while also clearly **demonstrating the superiority** of `dup2` over `dup1`.

## Counting Steps

One characterization of runtime is by counting steps, or the number of operations executed by a program.

- Look at your code and the various operations that it uses (e.g. assignments, increments, and comparisons).
- Count the number of times each operation is performed.

### dup1

Let’s count the number of steps executed as a result of calling `dup1` on an array of size N = 10000.

```
public static boolean dup1(int[] A) {
    // Compare every pair of items (i, j) with i < j.
    for (int i = 0; i < A.length; i += 1) {
        for (int j = i + 1; j < A.length; j += 1) {
            if (A[i] == A[j]) {
                return true;
            }
        }
    }
    return false;
}
```

## How many times is the operation i = 0 executed?

`i = 0` is executed only once, at the start of the outer `for` loop.

The analysis gets more complicated due to the `if` statement. In the **best case**, the program could exit early if a duplicate is found near the beginning of the array. In the **worst case**, the program could continue until the `return false` statement at the end if the array does not contain any duplicates.

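## What are the fewest and most times that the operation j = i + 1 is executed?

Between 1 and 10,000 times. In the best case, a duplicate is found during the first pass of the outer loop; in the worst case, `j = i + 1` runs once for each of the N = 10,000 iterations of the outer loop.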

This process gets tedious very quickly. Double-check that the counts in the table below match what you expect.

| Operation | Count (N = 10,000) |
|---|---|
| less-than `<` | 2 to 50,015,001 |
| increment `+= 1` | 0 to 50,005,000 |
| equal-to `==` | 1 to 49,995,000 |
| array accesses | 2 to 99,990,000 |

Not only is computing these counts tedious, but it doesn’t tell us about how the algorithm **scales** as N, the size of the array, increases. Rather than setting N = 10000, we can instead determine the count in terms of N.

| Operation | Count (N = 10,000) | Symbolic Count |
|---|---|---|
| `i = 0` | 1 | 1 |
| `j = i + 1` | 1 to 10,000 | 1 to N |
| less-than `<` | 2 to 50,015,001 | 2 to (N^2 + 3N + 2) / 2 |
| increment `+= 1` | 0 to 50,005,000 | 0 to (N^2 + N) / 2 |
| equal-to `==` | 1 to 49,995,000 | 1 to (N^2 - N) / 2 |
| array accesses | 2 to 99,990,000 | 2 to N^2 - N |
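One way to double-check these counts is to instrument the code. The following is a minimal sketch, not part of the original notes: the class name `Dup1StepCounter`, its counter fields, and the helpers `init` and `lt` are hypothetical names introduced only for illustration. It assumes the same counting rules as the table (each `A[i] == A[j]` check costs one equality test and two array accesses).

```java
// Illustrative sketch: dup1 with a counter added for each operation
// in the step count table. All names here are assumptions.
class Dup1StepCounter {
    long jInit, lessThan, increment, equalTo, arrayAccess;

    // Counts one execution of `j = i + 1` and passes the value through.
    private int init(int value) {
        jInit += 1;
        return value;
    }

    // Counts one `<` comparison and returns its result.
    private boolean lt(int a, int b) {
        lessThan += 1;
        return a < b;
    }

    boolean dup1(int[] A) {
        for (int i = 0; lt(i, A.length); i += 1, increment += 1) {
            for (int j = init(i + 1); lt(j, A.length); j += 1, increment += 1) {
                equalTo += 1;      // one == comparison
                arrayAccess += 2;  // reads A[i] and A[j]
                if (A[i] == A[j]) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int n = 10000;
        int[] distinct = new int[n];        // worst case: sorted, no duplicates
        for (int k = 0; k < n; k += 1) {
            distinct[k] = k;
        }
        Dup1StepCounter worst = new Dup1StepCounter();
        worst.dup1(distinct);
        System.out.println("worst case: < " + worst.lessThan
                + ", += " + worst.increment
                + ", == " + worst.equalTo
                + ", accesses " + worst.arrayAccess);

        int[] frontDup = distinct.clone();  // best case: A[0] == A[1]
        frontDup[1] = frontDup[0];
        Dup1StepCounter best = new Dup1StepCounter();
        best.dup1(frontDup);
        System.out.println("best case: < " + best.lessThan
                + ", += " + best.increment
                + ", == " + best.equalTo
                + ", accesses " + best.arrayAccess);
    }
}
```

With N = 10000, the worst-case run should reproduce the upper ends of the ranges in the table, and the best-case run the lower ends.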

### dup2

Try to come up with rough estimates for the symbolic and exact counts for at least one of the operations for `dup2` and check that the rest of the counts match what you expect.

| Operation | Count (N = 10,000) | Symbolic Count |
|---|---|---|
| `i = 0` | 1 | 1 |
| less-than `<` | | |
| increment `+= 1` | | |
| equal-to `==` | | |
| array accesses | | |

```
public static boolean dup2(int[] A) {
    // In a sorted array, duplicates must be neighbors.
    for (int i = 0; i < A.length - 1; i += 1) {
        if (A[i] == A[i + 1]) {
            return true;
        }
    }
    return false;
}
```

## Solution for dup2

| Operation | Count (N = 10,000) | Symbolic Count |
|---|---|---|
| `i = 0` | 1 | 1 |
| less-than `<` | 1 to 10,000 | 1 to N |
| increment `+= 1` | 0 to 9,999 | 0 to N - 1 |
| equal-to `==` | 1 to 9,999 | 1 to N - 1 |
| array accesses | 2 to 19,998 | 2 to 2N - 2 |
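The same kind of check can be sketched for `dup2`. As before, this is an illustrative sketch rather than part of the original notes: `Dup2StepCounter`, its fields, and the helper `lt` are hypothetical names, and each `A[i] == A[i + 1]` check is assumed to cost one equality test and two array accesses.

```java
// Illustrative sketch: dup2 with a counter added for each operation
// in the solution table. All names here are assumptions.
class Dup2StepCounter {
    long lessThan, increment, equalTo, arrayAccess;

    // Counts one `<` comparison and returns its result.
    private boolean lt(int a, int b) {
        lessThan += 1;
        return a < b;
    }

    boolean dup2(int[] A) {
        for (int i = 0; lt(i, A.length - 1); i += 1, increment += 1) {
            equalTo += 1;      // one == comparison
            arrayAccess += 2;  // reads A[i] and A[i + 1]
            if (A[i] == A[i + 1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int n = 10000;
        int[] distinct = new int[n];  // worst case: sorted, no duplicates
        for (int k = 0; k < n; k += 1) {
            distinct[k] = k;
        }
        Dup2StepCounter worst = new Dup2StepCounter();
        worst.dup2(distinct);
        System.out.println("worst case: < " + worst.lessThan
                + ", += " + worst.increment
                + ", == " + worst.equalTo
                + ", accesses " + worst.arrayAccess);
    }
}
```

For a sorted array of 10000 distinct values, the printed counts should match the upper ends of the ranges in the solution table. An input whose first two items are equal should give the lower ends.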

## Why Scaling Matters

`dup2` is better! But why?

- An answer
  - It takes fewer operations to accomplish the same goal.
- Better answer
  - Algorithm **scales better** in the worst case: (N^2 + 3N + 2) / 2 vs. N.

The better answer provides the start of a mathematical argument for the superiority of `dup2` over `dup1`.

- Even better answer
  - Parabolas grow faster than lines.

Computer scientists are interested in communicating ideas about algorithms. The even better answer here conveys a more general geometric intuition about the **order of growth** of the runtime for `dup2` compared to `dup1`. As the size of the array (N) grows, the parabolic N^2-time algorithm will take much longer to execute than the linear N-time algorithm.
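To see how quickly the gap opens up, one can tabulate the worst-case less-than counts for both methods as N grows. This is a rough sketch using the symbolic counts from the tables; the class name `GrowthComparison` and its method names are hypothetical, chosen only for this illustration.

```java
// Illustrative sketch: compare the worst-case `<` counts of dup1 and dup2.
class GrowthComparison {
    // Worst-case `<` count for dup1: (N^2 + 3N + 2) / 2.
    static long dup1WorstCase(long n) {
        return (n * n + 3 * n + 2) / 2;
    }

    // Worst-case `<` count for dup2: N.
    static long dup2WorstCase(long n) {
        return n;
    }

    public static void main(String[] args) {
        // N stays at or below one million, so n * n fits comfortably in a long.
        for (long n = 10; n <= 1_000_000; n *= 10) {
            long quadratic = dup1WorstCase(n);
            long linear = dup2WorstCase(n);
            System.out.println("N = " + n
                    + ": dup1 " + quadratic
                    + ", dup2 " + linear
                    + ", ratio " + (quadratic / linear));
        }
    }
}
```

Each time N grows by a factor of 10, the `dup2` count grows by a factor of 10 while the `dup1` count grows by roughly a factor of 100, so the ratio between them keeps widening.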

## Asymptotic Analysis

The goal of time complexity analysis is to make an argument about the running time of an algorithm. In most cases, we only care about what happens for very large N (asymptotic behavior). We want to consider what types of algorithms would best handle large amounts of data, such as in the following examples.

- Simulation of billions of interacting particles
- Social network with billions of users
- Encoding billions of bytes of video data

Algorithms that scale well (modeled by lines) have better asymptotic runtime behavior than algorithms that scale relatively poorly (modeled by parabolas). While the idea of modeling runtime with parabolas and lines is simple, it’s not mathematically rigorous. Let’s develop the idea of order of growth and formalize it with mathematics.

Suppose we have an algorithm with the following step counts.

| Operation | Symbolic Count |
|---|---|
| less-than `<` | 100N^2 + 3N |
| greater-than `>` | 2N^3 + 1 |
| and `&&` | 5000 |

## What do you expect will be the overall order of growth of the runtime for the algorithm?

N^3 (cubic), since the majority of the runtime will be spent on greater-than operations (assuming a large input N). Adding on the less-than and `&&` operations doesn’t affect the overall cubic order of growth.

The “dominating” operation in the step count table is what ultimately determines the overall order of growth for large inputs.

- Cost model
  - A representative operation that models the overall order of growth.

For example, the greater-than operation is a good cost model for the overall order of growth of the algorithm above. When considering order of growth analysis, we can also ignore lower order terms and multiplicative constants. By choosing a cost model, we already discard information about these less-significant factors.
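The claim that one operation dominates can be checked numerically. The sketch below evaluates the three symbolic counts from the example table and reports what fraction of all steps are greater-than operations. The class name `DominantTerm` and its method names are hypothetical, and the counts are computed as `double` values purely for convenience.

```java
// Illustrative sketch: how much of the total step count comes from the
// greater-than operation as N grows. All names here are assumptions.
class DominantTerm {
    static double lessThanCount(double n) {
        return 100 * n * n + 3 * n;   // 100N^2 + 3N
    }

    static double greaterThanCount(double n) {
        return 2 * n * n * n + 1;     // 2N^3 + 1
    }

    static double andCount(double n) {
        return 5000;                  // constant, independent of N
    }

    // Fraction of all counted steps that are greater-than operations.
    static double greaterThanShare(double n) {
        double total = lessThanCount(n) + greaterThanCount(n) + andCount(n);
        return greaterThanCount(n) / total;
    }

    public static void main(String[] args) {
        for (double n = 10; n <= 1_000_000; n *= 10) {
            System.out.printf("N = %.0f: greater-than share = %.5f%n",
                    n, greaterThanShare(n));
        }
    }
}
```

For small N the greater-than operation is not yet the largest contributor (the constant 5000 and the 100N^2 term outweigh it at N = 10), which is why the analysis assumes a large input. As N grows, its share approaches 1.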

Applying this order of growth analysis back to `dup1` and `dup2`, we can make the following statements about their runtime.

- The worst case order of growth of the runtime for `dup1` is N^2.
- The worst case order of growth of the runtime for `dup2` is N.
- The best case order of growth of the runtime for `dup1` and `dup2` is constant.

## Simplified Modeling Process

If we only want the simplified order of growth, rather than building the entire step count table, we can instead:

- Choose our **cost model** (representative operation).
- Figure out the order of growth for the count of the cost model by either:
  - Making an exact count and then discarding the unnecessary pieces
  - Or, using intuition/inspection to determine orders of growth. (Needs practice!)
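As a rough illustration of the process, the sketch below picks the `==` comparison as the cost model for `dup1` and counts it in the worst case, where no duplicate is ever found. It then doubles N and compares the counts. The class name `CostModelDoubling` and the doubling experiment are assumptions made for this example; they are not necessarily how the analysis is presented in lecture.

```java
// Illustrative sketch: count only the cost model (the == comparison in dup1)
// in the worst case, then see how the count changes when N doubles.
class CostModelDoubling {
    // Runs the same nested loops as dup1 and counts one == per inner iteration.
    static long worstCaseEqualsCount(int n) {
        long count = 0;
        for (int i = 0; i < n; i += 1) {
            for (int j = i + 1; j < n; j += 1) {
                count += 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 8000; n *= 2) {
            long small = worstCaseEqualsCount(n);
            long large = worstCaseEqualsCount(2 * n);
            System.out.println("N = " + n
                    + ": count " + small
                    + ", count at 2N " + large
                    + ", ratio " + ((double) large / small));
        }
    }
}
```

The exact count is (N^2 - N) / 2. Discarding the lower order term and the multiplicative constant leaves N^2, which is consistent with the experiment: doubling N makes the count roughly four times larger.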

In lecture, we’ll redo our analysis of `dup1` using this Simplified Modeling Process. We’ll also introduce mathematical notation (Big-Theta, Big-O, and Big-Omega) to formally define this idea of **order of growth**.