Algorithm Analysis

Complete the Reading Quiz by 3:00pm before lecture.

Table of contents

  1. Motivation: Characterizing Runtime
  2. One Way to Describe “Work”: Counting Steps
    1. dup1
    2. dup2

In the lecture on Stacks and Queues, we briefly introduced some concepts for comparing and contrasting the running time of six data structures. Much of this course relies on comparing and contrasting different data structures and their implementation details, so we need to develop a more rigorous foundation for evaluating a program’s execution cost. Execution cost can be broken down into two categories:

Time complexity
How much time does it take for your program to execute?
Space complexity
How much memory does your program require to execute?

In this course, we’ll mainly be determining the time complexity of different algorithms, which is also known as running time (or runtime) analysis. However, analyses of space complexity use the same concepts.

Motivation: Characterizing Runtime

Suppose we’re trying to determine if a sorted array contains duplicate values. Here are two ways to solve the problem:

dup1
Consider every possible pair, returning true if any match.
dup2
Take advantage of the sorted nature of our array.
  • We know that if there are duplicates, they must be next to each other.
  • Compare neighbors, returning true the first time you see a match.

Intuitively, dup1 seems to do a lot more redundant work than dup2; it ought to “take more time”. But how much more time, exactly? To answer that question, we need to find a way to express the “amount of work” that an algorithm performs. Ideally, we want this characterization to be simple and mathematically rigorous while also clearly demonstrating the superiority of dup2 over dup1.

One Way to Describe “Work”: Counting Steps

One way to characterize runtime is to count steps: the number of operations a program executes.

When we say “operation”, we’re talking about the basic actions a programming language is built on: adding two numbers together, assigning a value to a variable, and incrementing a variable are all examples of the type of operations we’re referring to. Don’t worry too much about the exact definition; as we’ll see, algorithmic analysis is most useful when it describes overall patterns in the code and not actual counts of individual operations.

dup1

Let’s count the number of steps executed as a result of calling dup1 on an array of size N = 10000.

public static boolean dup1(int[] A) {
  for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
      if (A[i] == A[j]) {
         return true;
      }
    }
  }
  return false;
}
How many times is the operation i = 0 executed?

The assignment i = 0 is executed only once, when the outer for loop begins.

The analysis gets more complicated due to the if statement. In the best case, the program could exit early if a duplicate is found near the beginning of the array. In the worst case, the program could continue until the return false statement at the end if the array does not contain any duplicates.

What are the fewest and the most times that the operation j = i + 1 is executed?

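It may be executed as few as 1 time (a duplicate is found during the first pass of the outer loop) and as many as 10,000 times (once per pass of the outer loop when there are no duplicates).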

Let’s try the same analysis with other operations performed by this algorithm. Double check that the counts in the table below match what you expect.

Operation       | Number of executions (N = 10,000)
less-than <     | 2 to 50,015,001
increment += 1  | 0 to 50,005,000
equals to ==    | 1 to 49,995,000
array accesses  | 2 to 99,990,000
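
One way to double check these counts is to instrument dup1 so that each kind of operation bumps a counter. The sketch below is only an illustration: the class name Dup1Counter, its counter fields, and the helpers lt, inc, and eq are hypothetical names that are not part of the reading, and reads of A.length are assumed not to count as array accesses.

```java
// Hypothetical instrumented copy of dup1: same control flow as the original,
// but every counted operation goes through a helper that bumps a counter.
class Dup1Counter {
    long assignJ, lessThan, increment, equalsTo, arrayAccess;

    // Counts one "<" test and returns its result.
    private boolean lt(int a, int b) { lessThan += 1; return a < b; }

    // Counts one "+= 1" and returns the incremented value.
    private int inc(int x) { increment += 1; return x + 1; }

    // Counts one "==" test and the two array reads it needs.
    private boolean eq(int[] A, int i, int j) {
        equalsTo += 1;
        arrayAccess += 2;
        return A[i] == A[j];
    }

    boolean dup1(int[] A) {
        for (int i = 0; lt(i, A.length); i = inc(i)) {
            assignJ += 1; // the assignment j = i + 1 runs once per outer pass
            for (int j = i + 1; lt(j, A.length); j = inc(j)) {
                if (eq(A, i, j)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Worst case: a sorted array with no duplicates.
        int[] worst = new int[10000];
        for (int k = 0; k < worst.length; k += 1) {
            worst[k] = k;
        }
        Dup1Counter c = new Dup1Counter();
        c.dup1(worst);
        System.out.println("less-than <:    " + c.lessThan);
        System.out.println("increment += 1: " + c.increment);
        System.out.println("equals to ==:   " + c.equalsTo);
        System.out.println("array accesses: " + c.arrayAccess);
    }
}
```

Running dup1 on an array whose first two entries are equal gives the low end of each range, and the duplicate-free array in main gives the high end.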

Not only is computing these counts tedious, it also doesn’t tell us how the number of executions changes as N (the size of the array) increases. In other words, it doesn’t tell us how the algorithm scales. To see that relationship, we can instead express the count in terms of N:

Operation       | Number of executions (N = 10,000) | Symbolic expression
i = 0           | 1                                 | 1
j = i + 1       | 1 to 10,000                       | 1 to N
less-than <     | 2 to 50,015,001                   | 2 to (N^2 + 3N + 2) / 2
increment += 1  | 0 to 50,005,000                   | 0 to (N^2 + N) / 2
equals to ==    | 1 to 49,995,000                   | 1 to (N^2 - N) / 2
array accesses  | 2 to 99,990,000                   | 2 to N^2 - N
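
As a worked check, the worst-case entries for less-than and equals-to can be derived by summing over the passes of the outer loop. This is a sketch of one possible derivation, assuming the array has no duplicates so that both loops run to completion:

```latex
\begin{align*}
\text{outer tests } (i < N) &: \; N + 1 \\
\text{inner tests } (j < N) &: \; \sum_{i=0}^{N-1} (N - i) = \frac{N(N+1)}{2} \\
\text{total less-than tests} &= \frac{N(N+1)}{2} + N + 1 = \frac{N^2 + 3N + 2}{2} \\
\text{equals-to tests} &= \sum_{i=0}^{N-1} (N - 1 - i) = \frac{N(N-1)}{2} = \frac{N^2 - N}{2}
\end{align*}
```

Each equals-to test reads two array entries, which gives the N^2 - N array accesses in the last row. Plugging in N = 10,000 reproduces 50,015,001 and 49,995,000.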

dup2

Try to come up with rough estimates for the symbolic and exact counts for at least one of the operations for dup2, and check that the rest of the counts match what you expect.

public static boolean dup2(int[] A) {
  for (int i = 0; i < A.length - 1; i += 1) {
    if (A[i] == A[i + 1]) {
      return true;
    }
  }
  return false;
}
Operation       | Number of executions (N = 10,000) | Symbolic expression
i = 0           | 1                                 | 1
less-than <     |                                   |
increment += 1  |                                   |
equals to ==    |                                   |
array accesses  |                                   |
Solution for dup2
Operation       | Number of executions (N = 10,000) | Symbolic expression
i = 0           | 1                                 | 1
less-than <     | 1 to 10,000                       | 1 to N
increment += 1  | 0 to 9,999                        | 0 to N - 1
equals to ==    | 1 to 9,999                        | 1 to N - 1
array accesses  | 2 to 19,998                       | 2 to 2N - 2
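
The dup2 solution can be checked by instrumenting the method with counters. The sketch below is illustrative only: Dup2Counter, its fields, and the helpers lt, inc, and eq are hypothetical names, and reads of A.length are assumed not to count as array accesses.

```java
// Hypothetical instrumented copy of dup2 that counts each kind of operation.
class Dup2Counter {
    long lessThan, increment, equalsTo, arrayAccess;

    // Counts one "<" test and returns its result.
    private boolean lt(int a, int b) { lessThan += 1; return a < b; }

    // Counts one "+= 1" and returns the incremented value.
    private int inc(int x) { increment += 1; return x + 1; }

    // Counts one "==" test and the two array reads it needs.
    private boolean eq(int[] A, int i, int j) {
        equalsTo += 1;
        arrayAccess += 2;
        return A[i] == A[j];
    }

    boolean dup2(int[] A) {
        for (int i = 0; lt(i, A.length - 1); i = inc(i)) {
            if (eq(A, i, i + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Worst case: a sorted array with no duplicates.
        int[] worst = new int[10000];
        for (int k = 0; k < worst.length; k += 1) {
            worst[k] = k;
        }
        Dup2Counter c = new Dup2Counter();
        c.dup2(worst);
        System.out.println("less-than <:    " + c.lessThan);
        System.out.println("increment += 1: " + c.increment);
        System.out.println("equals to ==:   " + c.equalsTo);
        System.out.println("array accesses: " + c.arrayAccess);
    }
}
```

For the duplicate-free array in main, the counters should land on the high end of each range in the solution table.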

We know dup2 is better. But why?

An answer
In the worst case, dup2 takes fewer operations than dup1 to accomplish the same goal.
A better answer
In the worst case, dup2 scales better than dup1: the number of less-than comparisons grows as N for dup2 versus (N^2 + 3N + 2) / 2 for dup1.
An even better answer
Parabolas grow faster than lines

When we express the number of operations for dup2 as a function of N, we see that it grows as a line; in contrast, expressing dup1 as a function of N would show a parabolic shape. The “even better answer” describes the same idea as the “better answer”, but is preferred because it is 1) simpler (it uses fewer words and fewer concepts) and 2) more general (as the size of the array, N, grows, the parabolic N^2-time algorithm will take much longer to execute than the linear N-time algorithm).

By characterizing the growth of dup2 as linear and the growth of dup1 as parabolic, we’ve met our stated goal of creating a characterization that:

  • is simple (for instance, we don’t care about the actual parabola formula)
  • is mathematically rigorous (it models the number of steps using mathematical functions)
  • clearly demonstrates the superiority of dup2 over dup1 (as N keeps growing, a parabola will grow much faster than a line)
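
To make the gap between the parabola and the line concrete, the worst-case less-than counts from the tables above can be evaluated for a few array sizes. This is a minimal sketch; the class name GrowthSketch and its two methods are hypothetical and simply encode the symbolic expressions (N^2 + 3N + 2) / 2 for dup1 and N for dup2.

```java
// Hypothetical helper that evaluates the worst-case less-than counts.
class GrowthSketch {
    // Worst-case number of "<" tests in dup1: (N^2 + 3N + 2) / 2.
    static long dup1LessThan(long n) {
        return (n * n + 3 * n + 2) / 2;
    }

    // Worst-case number of "<" tests in dup2: N.
    static long dup2LessThan(long n) {
        return n;
    }

    public static void main(String[] args) {
        // Each row multiplies N by 10 to show how the two counts diverge.
        for (long n = 10; n <= 100_000; n *= 10) {
            long quadratic = dup1LessThan(n);
            long linear = dup2LessThan(n);
            System.out.println("N = " + n
                + ": dup1 " + quadratic
                + ", dup2 " + linear
                + ", ratio about " + (quadratic / linear));
        }
    }
}
```

Each tenfold increase in N multiplies the dup2 count by 10 but the dup1 count by roughly 100, which is the sense in which the parabola grows much faster than the line.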

Reading Quiz