Algorithm Analysis
Complete the Reading Quiz by 3:00pm before lecture.
In the lecture on Stacks and Queues, we briefly introduced some concepts for comparing and contrasting the running time of six data structures. Much of this course relies on comparing and contrasting different data structures and their implementation details, so we need to develop a more rigorous foundation for evaluating a program’s execution cost. Execution cost can be broken down into two categories:
- Time complexity
- How much time does it take for your program to execute?
- Space complexity
- How much memory does your program require to execute?
In this course, we’ll mainly be determining the time complexity of different algorithms, which is also known as running time (or runtime) analysis. However, analyses of space complexity use the same concepts.
Motivation: Characterizing Runtime
Suppose we’re trying to determine if a sorted array contains duplicate values. Here are two ways to solve the problem:
`dup1`

- Consider every possible pair, returning true if any match.

`dup2`

- Take advantage of the sorted nature of our array.
- We know that if there are duplicates, they must be next to each other.
- Compare neighbors, returning true the first time you see a match.
Intuitively, `dup1` seems like it's doing a lot more unnecessary, redundant work than `dup2`; it ought to "take more time". But how much more time, exactly? To answer that question, we need to find a way to express the "amount of work" that an algorithm performs. Ideally, we want this characterization to be simple and mathematically rigorous while also clearly demonstrating the superiority of `dup2` over `dup1`.
One Way to Describe “Work”: Counting Steps
One characterization of runtime is by counting steps, or the number of operations executed by a program.
When we say “operation”, we’re talking about the basic actions a programming language is built on: adding two numbers together, assigning a value to a variable, and incrementing a variable are all examples of the type of operations we’re referring to. Don’t worry too much about the exact definition; as we’ll see, algorithmic analysis is most useful when it describes overall patterns in the code and not actual counts of individual operations.
`dup1`

Let's count the number of steps executed as a result of calling `dup1` on an array of size N = 10,000.
```java
public static boolean dup1(int[] A) {
    for (int i = 0; i < A.length; i += 1) {
        for (int j = i + 1; j < A.length; j += 1) {
            if (A[i] == A[j]) {
                return true;
            }
        }
    }
    return false;
}
```
How many times is the operation `i = 0` executed?

`i = 0` is executed only once, at the beginning of the nested `for` loops.
The analysis gets more complicated due to the `if` statement. In the best case, the program could exit early if a duplicate is found near the beginning of the array. In the worst case, the program could continue until the `return false` statement at the end if the array does not contain any duplicates.
What are the fewest and greatest numbers of times that the operation `j = i + 1` is executed?

It may be executed as few as 1 time and as many as 10,000 times, since it runs once per iteration of the outer loop.
Let’s try the same analysis with other operations performed by this algorithm. Double check that the counts in the table below match what you expect.
Operation | Number of executions (N = 10,000)
---|---
less-than `<` | 2 to 50,015,001
increment `+= 1` | 0 to 50,005,000
equality `==` | 1 to 49,995,000
array accesses | 2 to 99,990,000
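These worst-case counts can be sanity-checked empirically. The sketch below (the class and counter names are my own, not part of the reading) instruments `dup1`'s equality comparisons and array accesses, then runs it on a strictly increasing array, which has no duplicates and therefore forces the worst case:

```java
// Counts the == comparisons and array accesses performed by dup1
// in the worst case (an array with no duplicate values).
public class Dup1Counter {
    static long equalsCount;
    static long accessCount;

    static boolean dup1(int[] A) {
        for (int i = 0; i < A.length; i += 1) {
            for (int j = i + 1; j < A.length; j += 1) {
                equalsCount += 1;
                accessCount += 2;  // A[i] and A[j]
                if (A[i] == A[j]) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int N = 10000;
        int[] A = new int[N];
        for (int i = 0; i < N; i += 1) {
            A[i] = i;  // strictly increasing, so no duplicates
        }
        dup1(A);
        // Matches the table: (N^2 - N) / 2 comparisons, N^2 - N accesses.
        System.out.println(equalsCount);  // 49995000
        System.out.println(accessCount);  // 99990000
    }
}
```

The same trick, with counters placed next to the other operations, reproduces the remaining rows of the table.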
Not only is computing these counts tedious, it also doesn't tell us how the number of executions changes as N (the size of the array) increases. In other words, it doesn't tell us how the algorithm scales. To see that relationship, we can instead express the count in terms of N:
Operation | Number of executions (N = 10,000) | Symbolic expression
---|---|---
`i = 0` | 1 | 1
`j = i + 1` | 1 to 10,000 | 1 to N
less-than `<` | 2 to 50,015,001 | 2 to (N² + 3N + 2) / 2
increment `+= 1` | 0 to 50,005,000 | 0 to (N² + N) / 2
equality `==` | 1 to 49,995,000 | 1 to (N² - N) / 2
array accesses | 2 to 99,990,000 | 2 to N² - N
`dup2`

Try to come up with rough estimates for the symbolic and exact counts for at least one of the operations for `dup2`, and check that the rest of the counts match what you expect.
```java
public static boolean dup2(int[] A) {
    for (int i = 0; i < A.length - 1; i += 1) {
        if (A[i] == A[i + 1]) {
            return true;
        }
    }
    return false;
}
```
Operation | Number of executions (N = 10,000) | Symbolic expression
---|---|---
`i = 0` | 1 | 1
less-than `<` | |
increment `+= 1` | |
equality `==` | |
array accesses | |
Solution for `dup2`
Operation | Number of executions (N = 10,000) | Symbolic expression
---|---|---
`i = 0` | 1 | 1
less-than `<` | 1 to 10,000 | 1 to N
increment `+= 1` | 0 to 9,999 | 0 to N - 1
equality `==` | 1 to 9,999 | 1 to N - 1
array accesses | 2 to 19,998 | 2 to 2N - 2
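The `dup2` counts can be checked the same way. In this sketch (again, the class and counter names are my own), a duplicate-free array forces the worst case:

```java
// Counts the == comparisons and array accesses performed by dup2
// in the worst case (an array with no duplicate values).
public class Dup2Counter {
    static long equalsCount;
    static long accessCount;

    static boolean dup2(int[] A) {
        for (int i = 0; i < A.length - 1; i += 1) {
            equalsCount += 1;
            accessCount += 2;  // A[i] and A[i + 1]
            if (A[i] == A[i + 1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int N = 10000;
        int[] A = new int[N];
        for (int i = 0; i < N; i += 1) {
            A[i] = i;  // strictly increasing, so no duplicates
        }
        dup2(A);
        // Matches the table: N - 1 comparisons, 2N - 2 accesses.
        System.out.println(equalsCount);  // 9999
        System.out.println(accessCount);  // 19998
    }
}
```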
We know `dup2` is better. But why?

- An answer: In the worst case, `dup2` takes fewer operations than `dup1` to accomplish the same goal.
- A better answer: In the worst case, `dup2` scales better than `dup1`: (N² + 3N + 2) / 2 vs. N.
- An even better answer: Parabolas grow faster than lines.

When we express the number of operations for `dup2` as a function of N, we see that it grows as a line; in contrast, expressing `dup1` as a function of N would show a parabolic shape. The "even better answer" describes the same idea as the "better answer", but is preferred because it is 1) more concise (it uses fewer words and fewer concepts) and 2) more general (as the size of the array N grows, the parabolic N²-time algorithm will take much longer to execute than the linear N-time algorithm).
By characterizing the growth of `dup2` as linear and the growth of `dup1` as parabolic, we've met our stated goal of creating a characterization that is:

- simple (for instance, we don't care about the actual parabola formula)
- mathematically rigorous (models the number of steps using mathematical functions)
- clearly demonstrates the superiority of `dup2` over `dup1` (as N keeps growing, a parabola will grow much faster than a line)
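The scaling difference is easy to see numerically: each doubling of N roughly quadruples a parabolic count but only doubles a linear one. A small sketch (the class and method names are my own; the formulas are the worst-case symbolic expressions from the tables above, using `dup1`'s and `dup2`'s equality comparisons):

```java
// Compares how worst-case operation counts grow as N doubles:
// dup1's equality comparisons, (N^2 - N) / 2, versus dup2's, N - 1.
public class GrowthDemo {
    static long dup1Worst(long n) {
        return (n * n - n) / 2;  // parabolic in n
    }

    static long dup2Worst(long n) {
        return n - 1;            // linear in n
    }

    public static void main(String[] args) {
        for (long n = 1000; n <= 16000; n *= 2) {
            System.out.println(n + ": dup1 = " + dup1Worst(n)
                    + ", dup2 = " + dup2Worst(n));
        }
        // Each doubling of N multiplies dup1's count by ~4
        // and dup2's count by ~2.
    }
}
```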