
Algorithm Analysis

Table of contents

  1. Characterizing Runtime
  2. Counting Steps
    1. dup1
    2. dup2
  3. Why Scaling Matters
  4. Asymptotic Analysis
  5. Simplified Modeling Process

We can analyze algorithms in many different ways.

Time complexity
How much time does it take for your algorithm to execute?
Space complexity
How much memory does your algorithm require?
Societal impact
How does your algorithm affect the rest of the world?

We will investigate all of these different costs in this course, starting with time complexity. Time complexity analysis is also known as running time (or runtime) analysis.

Characterizing Runtime

Suppose we’re trying to determine if a sorted array contains duplicate values. Here are two ways to solve the problem.

dup1
Consider every pair of items, returning true if any match!
dup2
Take advantage of the sorted nature of our array.
  • We know that if there are duplicates, they must be next to each other.
  • Compare neighbors: return true the first time you see a match! If you run out of items, return false.

We can see that dup1 seems like it’s doing a lot more unnecessary, redundant work than dup2. But how much more work? Ideally, we want our characterization to be simple and mathematically rigorous while also clearly demonstrating the superiority of dup2 over dup1.

Counting Steps

One characterization of runtime is by counting steps, or the number of operations executed by a program.

  1. Look at your code and the various operations that it uses (e.g. assignments, increments, comparisons).
  2. Count the number of times each operation is performed.

dup1

Let’s count the number of steps executed as a result of calling dup1 on an array of size N = 10000.

public static boolean dup1(int[] A) {
  // Consider every pair (i, j) with i < j, returning true on the first match.
  for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
      if (A[i] == A[j]) {
        return true;
      }
    }
  }
  return false;
}
How many times is the operation i = 0 executed?

The assignment i = 0 is executed only once, at the very beginning of the nested for loops.

The analysis gets more complicated due to the if statement. In the best case, the program could exit early if a duplicate is found near the beginning of the array. In the worst case, when the array contains no duplicates, the program continues all the way to the return false statement at the end.

What is the minimum and maximum number of times that the operation j = i + 1 is executed?

1 to 10,000 times: just once if the first two items match (the method returns immediately), and once per outer loop iteration, or N = 10,000 times, if the array contains no duplicates.

This process gets tedious very quickly. Double-check that the counts in the table below match what you expect.

Operation      | Count, N = 10000
less-than <    | 2 to 50,015,001
increment += 1 | 0 to 50,005,000
equals to ==   | 1 to 49,995,000
array accesses | 2 to 99,990,000
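
One way to double-check counts like these is to make the program count for you. The sketch below is our own restructured version of dup1 (not course code): each loop condition and increment is unrolled so a counter can tick alongside it, and the class and counter names are ours. Running it on a sorted, duplicate-free array of size 10,000 should reproduce the top of each range in the table above.

public class StepCounter {
  // Hypothetical counters for a few of dup1's operations.
  static long lessThan, increments, equalsChecks, arrayAccesses;

  static boolean dup1Counted(int[] A) {
    for (int i = 0; ; i += 1) {
      lessThan += 1;                      // outer check: i < A.length
      if (!(i < A.length)) {
        break;
      }
      for (int j = i + 1; ; j += 1) {
        lessThan += 1;                    // inner check: j < A.length
        if (!(j < A.length)) {
          break;
        }
        equalsChecks += 1;                // A[i] == A[j]
        arrayAccesses += 2;
        if (A[i] == A[j]) {
          return true;
        }
        increments += 1;                  // j += 1 runs next
      }
      increments += 1;                    // i += 1 runs next
    }
    return false;
  }

  public static void main(String[] args) {
    int[] A = new int[10000];
    for (int k = 0; k < A.length; k += 1) {
      A[k] = k;                           // no duplicates: the worst case
    }
    dup1Counted(A);
    System.out.println("less-than:      " + lessThan);       // 50,015,001
    System.out.println("increments:     " + increments);     // 50,005,000
    System.out.println("equals:         " + equalsChecks);   // 49,995,000
    System.out.println("array accesses: " + arrayAccesses);  // 99,990,000
  }
}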

Not only is computing these counts tedious, but it doesn’t tell us about how the algorithm scales as N, the size of the array, increases. Rather than setting N = 10000, we can instead determine the count in terms of N.

Operation      | Count, N = 10000 | Symbolic Count
i = 0          | 1                | 1
j = i + 1      | 1 to 10,000      | 1 to N
less-than <    | 2 to 50,015,001  | 2 to (N^2 + 3N + 2) / 2
increment += 1 | 0 to 50,005,000  | 0 to (N^2 + N) / 2
equals to ==   | 1 to 49,995,000  | 1 to (N^2 - N) / 2
array accesses | 2 to 99,990,000  | 2 to N^2 - N
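
To see where a symbolic count like (N^2 + 3N + 2) / 2 comes from, consider the worst case for the less-than operation: the outer loop's condition i < A.length is evaluated N + 1 times, and for each i from 0 to N - 1 the inner condition j < A.length is evaluated N - i times, which sums to N(N + 1) / 2. The total is (N + 1) + N(N + 1) / 2 = (N^2 + 3N + 2) / 2, which works out to 50,015,001 for N = 10000.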

dup2

Try to come up with rough estimates of the exact and symbolic counts for at least one of the operations in dup2, then check that the rest of the counts in the solution match what you expect.

Operation      | Count, N = 10000 | Symbolic Count
i = 0          | 1                | 1
less-than <    |                  |
increment += 1 |                  |
equals to ==   |                  |
array accesses |                  |
public static boolean dup2(int[] A) {
  // Compare each element to its right neighbor, returning true on the first match.
  for (int i = 0; i < A.length - 1; i += 1) {
    if (A[i] == A[i + 1]) {
      return true;
    }
  }
  return false;
}
Solution for dup2
Operation      | Count, N = 10000 | Symbolic Count
i = 0          | 1                | 1
less-than <    | 1 to 10,000      | 1 to N
increment += 1 | 0 to 9,999       | 0 to N - 1
equals to ==   | 1 to 9,999       | 1 to N - 1
array accesses | 2 to 19,998      | 2 to 2N - 2
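
These counts follow directly from the shape of dup2: the loop body runs at most N - 1 times (once per neighboring pair), and each iteration performs one == comparison and two array accesses, giving at most N - 1 comparisons and 2N - 2 accesses. There is no N^2 term anywhere.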

Why Scaling Matters

dup2 is better! But why?

An answer
It takes fewer operations to accomplish the same goal.
Better answer
dup2 scales better in the worst case: (N^2 + 3N + 2) / 2 steps vs. N steps.

The better answer provides the start of a mathematical argument for the superiority of dup2 over dup1.

Even better answer
Parabolas grow faster than lines.

Computer scientists are interested in communicating ideas about algorithms. The even better answer here conveys a more general geometric intuition about the order of growth of the runtime for dup2 compared to dup1. As the size of the array (N) grows, the parabolic N^2-time algorithm will take much longer to execute than the linear N-time algorithm.
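
We can also observe this difference empirically. Below is a minimal timing harness of our own devising (the class name and choice of sizes are arbitrary, not part of the course materials) that runs both methods on sorted, duplicate-free arrays, their shared worst case. Exact times vary by machine and JIT warmup, but dup1's time should roughly quadruple each time N doubles, while dup2's should roughly double.

public class TimingDemo {
  public static void main(String[] args) {
    for (int N = 1000; N <= 64000; N *= 2) {
      // Build a sorted array with no duplicates: the worst case for both.
      int[] A = new int[N];
      for (int k = 0; k < N; k += 1) {
        A[k] = k;
      }

      long start = System.nanoTime();
      dup1(A);
      long dup1Nanos = System.nanoTime() - start;

      start = System.nanoTime();
      dup2(A);
      long dup2Nanos = System.nanoTime() - start;

      System.out.println("N = " + N + "  dup1: " + dup1Nanos
          + " ns  dup2: " + dup2Nanos + " ns");
    }
  }

  // Same implementations as above, repeated so this file runs on its own.
  public static boolean dup1(int[] A) {
    for (int i = 0; i < A.length; i += 1) {
      for (int j = i + 1; j < A.length; j += 1) {
        if (A[i] == A[j]) {
          return true;
        }
      }
    }
    return false;
  }

  public static boolean dup2(int[] A) {
    for (int i = 0; i < A.length - 1; i += 1) {
      if (A[i] == A[i + 1]) {
        return true;
      }
    }
    return false;
  }
}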

Asymptotic Analysis

The goal of time complexity analysis is to make an argument about the running time of an algorithm. In most cases, we only care about what happens for very large N (asymptotic behavior). We want to consider which types of algorithms would best handle large amounts of data, such as in the following examples.

  • Simulation of billions of interacting particles
  • Social network with billions of users
  • Encoding billions of bytes of video data

Algorithms that scale well (modeled by lines) have better asymptotic runtime behavior than algorithms that scale relatively poorly (modeled by parabolas). While the idea of modeling runtime with parabolas and lines is simple, it’s not mathematically rigorous. Let’s develop the idea of order of growth and formalize it with mathematics.

Suppose we have an algorithm with the following step counts.

Operation      | Symbolic Count
less-than <    | 100N^2 + 3N
greater-than > | 2N^3 + 1
and &&         | 5000
What do you expect will be the overall order of growth of the runtime for the algorithm?

N^3 (cubic), since for large N the majority of the runtime will be spent on greater-than operations. Adding in the less-than operations and the and operations doesn't affect the overall cubic order of growth.
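
To make the domination concrete: at N = 1000, the greater-than count is 2(1000)^3 + 1, just over 2 billion, while the less-than count is 100(1000)^2 + 3(1000), about 100 million, and the and count is still a flat 5000. The cubic term is already about 20 times larger than everything else combined, and the gap only widens as N grows.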

The “dominating” operation in the step count table is what ultimately determines the overall order of growth for large inputs.

Cost model
A representative operation that models the overall order of growth.

For example, the greater-than operation is a good cost model for the overall order of growth of the algorithm above. In order of growth analysis, we can also ignore lower-order terms and multiplicative constants: 100N^2 + 3N, for instance, has the same order of growth as N^2. By choosing a cost model, we have already discarded information about these less-significant factors.

Applying this order of growth analysis back to dup1 and dup2, we can make the following statements about their runtime.

  • The worst case order of growth of the runtime for dup1 is N^2.
  • The worst case order of growth of the runtime for dup2 is N.
  • The best case order of growth of the runtime for both dup1 and dup2 is constant, since each can return true after a single comparison if the first two items match.

Simplified Modeling Process

If we only want the simplified order of growth, rather than building the entire step count table, we can instead:

  1. Choose our cost model (representative operation).
  2. Figure out the order of growth for the count of the cost model by either:
    • Making an exact count and then discarding the unnecessary pieces
    • Or, using intuition/inspection to determine orders of growth. (Needs practice!)
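
As a quick example of this process applied to dup2: choose the equality comparison A[i] == A[i + 1] as the cost model, since it runs exactly once per loop iteration. By inspection, the loop body executes at most N - 1 times, so the worst case order of growth of the runtime is N, and no full step count table is needed.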

In lecture, we'll redo our analysis of dup1 using this Simplified Modeling Process. We'll also introduce the mathematical notations Big-Theta, Big-O, and Big-Omega to formally define this idea of order of growth.