Algorithm Analysis I
Complete the Reading Quiz by noon before lecture.
In the lecture on Stacks and Queues, we briefly reviewed big-O notation to compare the running times of different data structures. Much of this course relies on comparing data structures and their implementation details, which is why we need a more rigorous foundation for evaluating a program’s execution cost. Execution cost can be broken down into two categories.
- Time complexity
  - How much time does it take for your program to execute?
- Space complexity
  - How much memory does your program require?
In this course, we’ll mainly be determining the time complexity of different algorithms, which is also known as running time (or runtime) analysis.
Characterizing Runtime
Suppose we’re trying to determine if a sorted array contains duplicate values. Here are two ways to solve the problem.
dup1
- Consider every pair, returning true if any match!
dup2
- Take advantage of the sorted nature of our array.
- We know that if there are duplicates, they must be next to each other.
- Compare neighbors: return true first time you see a match! If no more items, return false.
We can see that dup1 seems like it’s doing a lot more unnecessary, redundant work than dup2. But how much more work? Ideally, we want our characterization to be simple and mathematically rigorous while also clearly demonstrating the superiority of dup2 over dup1.
Counting Steps
One characterization of runtime is by counting steps, or the number of operations executed by a program.
- Look at your code and the various operations that it uses (e.g. assignments, increments, comparisons).
- Count the number of times each operation is performed.
dup1
Let’s count the number of steps executed as a result of calling dup1 on an array of size N = 10000.
public static boolean dup1(int[] A) {
    // Compare every pair (i, j) with i < j.
    for (int i = 0; i < A.length; i += 1) {
        for (int j = i + 1; j < A.length; j += 1) {
            if (A[i] == A[j]) {
                return true;
            }
        }
    }
    return false;
}
How many times is the operation i = 0 executed?
i = 0 is executed only once, at the beginning of the nested for loops.
The analysis gets more complicated due to the if statement. In the best case, the program could exit early if a duplicate is found near the beginning of the array. In the worst case, the program could continue until the return false statement at the end if the array does not contain any duplicates.
What are the fewest and the most times that the operation j = i + 1 is executed?
1 to 10000 times.
This process gets tedious very quickly. Double check that the counts in the table below match what you expect.
Operation | Count (N = 10000) |
---|---|
less-than < | 2 to 50,015,001 |
increment += 1 | 0 to 50,005,000 |
equals to == | 1 to 49,995,000 |
array accesses | 2 to 99,990,000 |
Not only is computing these counts tedious, but it doesn’t tell us about how the algorithm scales as N, the size of the array, increases. Rather than setting N = 10000, we can instead determine the count in terms of N.
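As a sketch of where such a symbolic count comes from, consider the == comparison in dup1 in the worst case (no duplicates). For each value of i, the inner loop body runs once for every j from i + 1 to N - 1, which is N - 1 - i times. Summing over all values of i (the index k below is just a relabeling of N - 1 - i) gives:

```latex
\sum_{i=0}^{N-1} (N - 1 - i) \;=\; \sum_{k=0}^{N-1} k \;=\; \frac{N(N-1)}{2} \;=\; \frac{N^2 - N}{2}
```

With N = 10000 this evaluates to 49,995,000. Each comparison reads both A[i] and A[j], so the worst-case number of array accesses is twice that, N^2 - N.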
Operation | Count (N = 10000) | Symbolic Count |
---|---|---|
i = 0 | 1 | 1 |
j = i + 1 | 1 to 10,000 | 1 to N |
less-than < | 2 to 50,015,001 | 2 to (N^2 + 3N + 2) / 2 |
increment += 1 | 0 to 50,005,000 | 0 to (N^2 + N) / 2 |
equals to == | 1 to 49,995,000 | 1 to (N^2 - N) / 2 |
array accesses | 2 to 99,990,000 | 2 to N^2 - N |
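One way to double-check the counts in the table above is to let the program do the counting. The sketch below is not part of the original lecture code: Dup1StepCounter and its helpers lt, inc, and eq are hypothetical names introduced here for illustration. Each helper performs one operation from dup1 and bumps the matching counter, so the loop structure stays the same as in dup1.

```java
// Hypothetical instrumented version of dup1 (illustration only).
class Dup1StepCounter {
    long lessThan = 0;      // evaluations of i < A.length and j < A.length
    long increments = 0;    // executions of i += 1 and j += 1
    long equalsTo = 0;      // evaluations of A[i] == A[j]
    long arrayAccesses = 0; // reads of A[i] and A[j]

    // Counted stand-in for the "<" operator.
    private boolean lt(int a, int b) {
        lessThan += 1;
        return a < b;
    }

    // Counted stand-in for "+= 1".
    private int inc(int x) {
        increments += 1;
        return x + 1;
    }

    // Counted stand-in for A[i] == A[j]: one comparison, two array reads.
    private boolean eq(int[] A, int i, int j) {
        equalsTo += 1;
        arrayAccesses += 2;
        return A[i] == A[j];
    }

    boolean dup1(int[] A) {
        for (int i = 0; lt(i, A.length); i = inc(i)) {
            for (int j = i + 1; lt(j, A.length); j = inc(j)) {
                if (eq(A, i, j)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int N = 10000;
        int[] worst = new int[N]; // sorted, no duplicates: the worst case
        for (int k = 0; k < N; k += 1) {
            worst[k] = k;
        }
        Dup1StepCounter c = new Dup1StepCounter();
        c.dup1(worst);
        System.out.println("less-than:      " + c.lessThan);
        System.out.println("increment:      " + c.increments);
        System.out.println("equals to:      " + c.equalsTo);
        System.out.println("array accesses: " + c.arrayAccesses);
    }
}
```

Running main on a sorted array with no duplicates exercises the worst case, so the printed values can be compared against the upper ends of the ranges in the table. An array whose first two entries are equal exercises the best case.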
dup2
Try to come up with rough estimates for the symbolic and exact counts for at least one of the operations for dup2, and check that the rest of the counts match what you expect.
Operation | Count (N = 10000) | Symbolic Count |
---|---|---|
i = 0 | 1 | 1 |
less-than < | | |
increment += 1 | | |
equals to == | | |
array accesses | | |
public static boolean dup2(int[] A) {
    // In a sorted array, duplicates must be adjacent, so compare neighbors only.
    for (int i = 0; i < A.length - 1; i += 1) {
        if (A[i] == A[i + 1]) {
            return true;
        }
    }
    return false;
}
Solution for dup2
Operation | Count (N = 10000) | Symbolic Count |
---|---|---|
i = 0 | 1 | 1 |
less-than < | 1 to 10,000 | 1 to N |
increment += 1 | 0 to 9,999 | 0 to N - 1 |
equals to == | 1 to 9,999 | 1 to N - 1 |
array accesses | 2 to 19,998 | 2 to 2N - 2 |
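The dup2 solution can be checked empirically by counting operations while the method runs. This is a minimal sketch rather than lecture code: Dup2StepCounter and its helpers lt, inc, and eqNeighbors are hypothetical names introduced for illustration, and each helper stands in for one operation of dup2 while incrementing the corresponding counter.

```java
// Hypothetical instrumented version of dup2 (illustration only).
class Dup2StepCounter {
    long lessThan = 0;      // evaluations of i < A.length - 1
    long increments = 0;    // executions of i += 1
    long equalsTo = 0;      // evaluations of A[i] == A[i + 1]
    long arrayAccesses = 0; // reads of A[i] and A[i + 1]

    // Counted stand-in for the "<" operator.
    private boolean lt(int a, int b) {
        lessThan += 1;
        return a < b;
    }

    // Counted stand-in for "+= 1".
    private int inc(int x) {
        increments += 1;
        return x + 1;
    }

    // Counted stand-in for A[i] == A[i + 1]: one comparison, two array reads.
    private boolean eqNeighbors(int[] A, int i) {
        equalsTo += 1;
        arrayAccesses += 2;
        return A[i] == A[i + 1];
    }

    boolean dup2(int[] A) {
        for (int i = 0; lt(i, A.length - 1); i = inc(i)) {
            if (eqNeighbors(A, i)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int N = 10000;
        int[] worst = new int[N]; // sorted, no duplicates: the worst case
        for (int k = 0; k < N; k += 1) {
            worst[k] = k;
        }
        Dup2StepCounter c = new Dup2StepCounter();
        c.dup2(worst);
        System.out.println("less-than:      " + c.lessThan);
        System.out.println("increment:      " + c.increments);
        System.out.println("equals to:      " + c.equalsTo);
        System.out.println("array accesses: " + c.arrayAccesses);
    }
}
```

As with any such check, the worst-case input (sorted, no duplicates) should reproduce the upper ends of the ranges in the solution table, and an input whose first two entries match should reproduce the lower ends.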
dup2 is better! But why?
- An answer
  - It takes fewer operations to accomplish the same goal.
- Better answer
  - Algorithm scales better in the worst case: (N^2 + 3N + 2) / 2 vs. N.
- Even better answer
  - Parabolas grow faster than lines.
While the even better answer expresses the same idea as the better answer, it provides a more general geometric intuition. As the size of the array (N) grows, the parabolic N^2-time algorithm will take much longer to execute than the linear N-time algorithm.
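To make the gap concrete, the sketch below evaluates the two worst-case less-than counts from the tables, (N^2 + 3N + 2) / 2 for dup1 and N for dup2, at a few array sizes. The class GrowthComparison and its method names are hypothetical, chosen only for this illustration, and the sizes are arbitrary.

```java
// Hypothetical helper for comparing the two worst-case formulas (illustration only).
class GrowthComparison {
    // Worst-case number of "<" evaluations in dup1: (N^2 + 3N + 2) / 2.
    static long dup1WorstLessThan(long n) {
        return (n * n + 3 * n + 2) / 2;
    }

    // Worst-case number of "<" evaluations in dup2: N.
    static long dup2WorstLessThan(long n) {
        return n;
    }

    public static void main(String[] args) {
        long[] sizes = {10, 100, 1000, 10000, 100000};
        System.out.println("N, dup1 worst case, dup2 worst case, ratio (rounded down)");
        for (long n : sizes) {
            long quadratic = dup1WorstLessThan(n);
            long linear = dup2WorstLessThan(n);
            System.out.println(n + ", " + quadratic + ", " + linear + ", " + (quadratic / linear));
        }
    }
}
```

The ratio column grows roughly in proportion to N, which is the numeric counterpart of a parabola pulling away from a line.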