Algorithm Analysis
Table of contents
- Characterizing Runtime
- Counting Steps
- Why Scaling Matters
- Asymptotic Analysis
- Simplified Modeling Process
We can analyze algorithms in many different ways.
- Time complexity
- How much time does it take for your algorithm to execute?
- Space complexity
- How much memory does your algorithm require?
- Societal impact
- How does your algorithm affect the rest of the world?
We will investigate all of these different costs in this course, starting with time complexity. Time complexity analysis is also known as running time (or runtime) analysis.
Characterizing Runtime
Suppose we’re trying to determine if a sorted array contains duplicate values. Here are two ways to solve the problem.
dup1
- Consider every pair of items, returning true if any match!
dup2
- Take advantage of the sorted nature of our array.
- We know that if there are duplicates, they must be next to each other.
- Compare neighbors: return true the first time you see a match! If there are no more items, return false.
We can see that dup1 seems like it’s doing a lot more unnecessary, redundant work than dup2. But how much more work? Ideally, we want our characterization to be simple and mathematically rigorous while also clearly demonstrating the superiority of dup2 over dup1.
Counting Steps
One characterization of runtime is by counting steps, or the number of operations executed by a program.
- Look at your code and the various operations that it uses (e.g. assignments, incrementations, etc.).
- Count the number of times each operation is performed.
dup1
Let’s count the number of steps executed as a result of calling dup1 on an array of size N = 10000.
public static boolean dup1(int[] A) {
    // Compare every pair (i, j) with i < j.
    for (int i = 0; i < A.length; i += 1) {
        for (int j = i + 1; j < A.length; j += 1) {
            if (A[i] == A[j]) {
                return true;  // Found a duplicate: exit early.
            }
        }
    }
    return false;  // No pair matched.
}
How many times is the operation i = 0 executed?
i = 0 is executed only once, at the beginning of the nested for loops.
The analysis gets more complicated due to the if statement. In the best case, the program could exit early if a duplicate is found near the beginning of the array. In the worst case, the program could continue until the return false statement at the end if the array does not contain any duplicates.
What are the fewest and the most times that the operation j = i + 1 is executed?
1 to 10000 times.
This process gets tedious very quickly. Double check that the counts in the table below match what you expect.
Operation | Count N = 10000 |
---|---|
less-than < | 2 to 50,015,001 |
increment += 1 | 0 to 50,005,000 |
equals to == | 1 to 49,995,000 |
array accesses | 2 to 99,990,000 |
Not only is computing these counts tedious, but it doesn’t tell us about how the algorithm scales as N, the size of the array, increases. Rather than setting N = 10000, we can instead determine the count in terms of N.
Operation | Count N = 10000 | Symbolic Count |
---|---|---|
i = 0 | 1 | 1 |
j = i + 1 | 1 to 10,000 | 1 to N |
less-than < | 2 to 50,015,001 | 2 to (N^2 + 3N + 2) / 2 |
increment += 1 | 0 to 50,005,000 | 0 to (N^2 + N) / 2 |
equals to == | 1 to 49,995,000 | 1 to (N^2 - N) / 2 |
array accesses | 2 to 99,990,000 | 2 to N^2 - N |
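One way to double check the table is to let the computer do the counting. The sketch below is an instrumented copy of dup1 with the for loops rewritten as while loops so that each operation can be tallied where it happens. The class name Dup1StepCounter, its fields, and the helpers count and worstCase are assumptions made for illustration, not part of the course code.

```java
// Hypothetical instrumented copy of dup1 (names are illustrative assumptions).
class Dup1StepCounter {
    long initI, initJ, lessThan, increments, equalsTo, arrayAccesses;

    // Same logic as dup1, with each counted operation tallied as it runs.
    boolean dup1(int[] A) {
        int i = 0;
        initI += 1;                          // i = 0
        while (true) {
            lessThan += 1;                   // i < A.length
            if (!(i < A.length)) {
                break;
            }
            int j = i + 1;
            initJ += 1;                      // j = i + 1
            while (true) {
                lessThan += 1;               // j < A.length
                if (!(j < A.length)) {
                    break;
                }
                arrayAccesses += 2;          // A[i] and A[j]
                equalsTo += 1;               // A[i] == A[j]
                if (A[i] == A[j]) {
                    return true;
                }
                j += 1;
                increments += 1;             // j += 1
            }
            i += 1;
            increments += 1;                 // i += 1
        }
        return false;
    }

    // Runs the instrumented dup1 on A and returns the filled-in counter.
    static Dup1StepCounter count(int[] A) {
        Dup1StepCounter counter = new Dup1StepCounter();
        counter.dup1(A);
        return counter;
    }

    // Worst case: a sorted array of n distinct values, so there is no early exit.
    static Dup1StepCounter worstCase(int n) {
        int[] A = new int[n];
        for (int k = 0; k < n; k += 1) {
            A[k] = k;
        }
        return count(A);
    }

    public static void main(String[] args) {
        long n = 10000;
        Dup1StepCounter c = worstCase((int) n);
        // Measured count, followed by the symbolic count from the table.
        System.out.println("j = i + 1:      " + c.initJ + " vs " + n);
        System.out.println("less-than:      " + c.lessThan + " vs " + (n * n + 3 * n + 2) / 2);
        System.out.println("increment:      " + c.increments + " vs " + (n * n + n) / 2);
        System.out.println("equals to:      " + c.equalsTo + " vs " + (n * n - n) / 2);
        System.out.println("array accesses: " + c.arrayAccesses + " vs " + (n * n - n));
    }
}
```

Calling count on an array whose first two items match, such as {1, 1, 2}, gives the best-case end of each range instead.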
dup2
Try to come up with rough estimates for the symbolic and exact counts for at least one of the operations for dup2 and check that the rest of the counts match what you expect.
Operation | Count N = 10000 | Symbolic Count |
---|---|---|
i = 0 | 1 | 1 |
less-than < | | |
increment += 1 | | |
equals to == | | |
array accesses | | |
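public static boolean dup2(int[] A) {
    // In a sorted array, duplicates must be neighbors.
    for (int i = 0; i < A.length - 1; i += 1) {
        if (A[i] == A[i + 1]) {
            return true;  // Found a duplicate: exit early.
        }
    }
    return false;  // No neighboring pair matched.
}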
Solution for dup2
Operation | Count N = 10000 | Symbolic Count |
---|---|---|
i = 0 | 1 | 1 |
less-than < | 1 to 10000 | 1 to N |
increment += 1 | 0 to 9999 | 0 to N - 1 |
equals to == | 1 to 9999 | 1 to N - 1 |
array accesses | 2 to 19998 | 2 to 2N - 2 |
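The solution table can be checked the same way as a hand count, by tallying operations in an instrumented copy of dup2. This is only a sketch: the class name Dup2StepCounter, its fields, and the helpers count and worstCase are hypothetical names introduced here, and the counts assume an array with at least two items.

```java
// Hypothetical instrumented copy of dup2 (names are illustrative assumptions).
class Dup2StepCounter {
    long initI, lessThan, increments, equalsTo, arrayAccesses;

    // Same logic as dup2, with each counted operation tallied as it runs.
    boolean dup2(int[] A) {
        int i = 0;
        initI += 1;                          // i = 0
        while (true) {
            lessThan += 1;                   // i < A.length - 1
            if (!(i < A.length - 1)) {
                break;
            }
            arrayAccesses += 2;              // A[i] and A[i + 1]
            equalsTo += 1;                   // A[i] == A[i + 1]
            if (A[i] == A[i + 1]) {
                return true;
            }
            i += 1;
            increments += 1;                 // i += 1
        }
        return false;
    }

    // Runs the instrumented dup2 on A and returns the filled-in counter.
    static Dup2StepCounter count(int[] A) {
        Dup2StepCounter counter = new Dup2StepCounter();
        counter.dup2(A);
        return counter;
    }

    // Worst case: a sorted array of n distinct values, so there is no early exit.
    static Dup2StepCounter worstCase(int n) {
        int[] A = new int[n];
        for (int k = 0; k < n; k += 1) {
            A[k] = k;
        }
        return count(A);
    }

    public static void main(String[] args) {
        int n = 10000;
        Dup2StepCounter c = worstCase(n);
        // Measured count, followed by the symbolic count from the table.
        System.out.println("less-than:      " + c.lessThan + " vs " + n);
        System.out.println("increment:      " + c.increments + " vs " + (n - 1));
        System.out.println("equals to:      " + c.equalsTo + " vs " + (n - 1));
        System.out.println("array accesses: " + c.arrayAccesses + " vs " + (2 * n - 2));
    }
}
```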
Why Scaling Matters
dup2 is better! But why?
- An answer
- It takes fewer operations to accomplish the same goal.
- Better answer
- Algorithm scales better in the worst case: (N^2 + 3N + 2) / 2 vs. N.
The better answer provides the start of a mathematical argument for the superiority of dup2 over dup1.
- Even better answer
- Parabolas grow faster than lines.
Computer scientists are interested in communicating ideas about algorithms. The even better answer here conveys a more general geometric intuition about the order of growth of the runtime for dup2 compared to dup1. As the size of the array (N) grows, the parabolic N^2-time algorithm will take much longer to execute than the linear N-time algorithm.
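To put numbers on the parabola-versus-line intuition, the sketch below evaluates the worst-case less-than counts from the tables, (N^2 + 3N + 2) / 2 for dup1 and N for dup2, as N grows by factors of ten. The class and method names are hypothetical, and the less-than count is just one possible choice of operation to compare.

```java
// Hypothetical helper for comparing the two worst-case less-than counts.
class GrowthComparison {
    // Worst-case less-than count for dup1: (N^2 + 3N + 2) / 2.
    static long dup1WorstCase(long n) {
        return (n * n + 3 * n + 2) / 2;
    }

    // Worst-case less-than count for dup2: N.
    static long dup2WorstCase(long n) {
        return n;
    }

    public static void main(String[] args) {
        for (long n = 10; n <= 1_000_000; n *= 10) {
            long quadratic = dup1WorstCase(n);
            long linear = dup2WorstCase(n);
            // The ratio shows how many times more comparisons dup1 makes.
            System.out.printf("N = %d  dup1: %d  dup2: %d  ratio: %.1f%n",
                    n, quadratic, linear, (double) quadratic / linear);
        }
    }
}
```

Each tenfold increase in N multiplies the dup2 count by ten but the dup1 count by roughly one hundred, so the gap between the two keeps widening.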
Asymptotic Analysis
The goal of time complexity analysis is to make an argument about the running time of an algorithm. In most cases, we only care about what happens for very large N (asymptotic behavior). We want to consider what types of algorithms would best handle big amounts of data, such as in the following examples.
- Simulation of billions of interacting particles
- Social network with billions of users
- Encoding billions of bytes of video data
Algorithms that scale well (modeled by lines) have better asymptotic runtime behavior than algorithms that scale relatively poorly (modeled by parabolas). While the idea of modeling runtime with parabolas and lines is simple, it’s not mathematically rigorous. Let’s develop the idea of order of growth and formalize it with mathematics.
Suppose we have an algorithm with the following step counts.
Operation | Symbolic Count |
---|---|
less-than < | 100N^2 + 3N |
greater-than > | 2N^3 + 1 |
and && | 5000 |
What do you expect will be the overall order of growth of the runtime for the algorithm?
N^3 (cubic), since the majority of the runtime will be spent on greater-than operations (assuming a large input N). Adding on the less-than operations and the && operations doesn’t affect the overall cubic order of growth.
The “dominating” operation in the step count table is what ultimately determines the overall order of growth for large inputs.
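A quick numeric check makes the idea of a dominating operation concrete. The sketch below plugs values of N into the three symbolic counts from the table and reports what share of all steps comes from greater-than. The class and method names are hypothetical, and treating every operation as one equally expensive step is a simplifying assumption.

```java
// Hypothetical helper that evaluates the step counts from the table.
class DominantTerm {
    static long lessThanCount(long n) {
        return 100 * n * n + 3 * n;          // 100N^2 + 3N
    }

    static long greaterThanCount(long n) {
        return 2 * n * n * n + 1;            // 2N^3 + 1
    }

    static long andCount(long n) {
        return 5000;                         // constant, independent of N
    }

    // Fraction of all counted steps that are greater-than operations.
    static double greaterThanShare(long n) {
        long total = lessThanCount(n) + greaterThanCount(n) + andCount(n);
        return (double) greaterThanCount(n) / total;
    }

    public static void main(String[] args) {
        for (long n = 10; n <= 100_000; n *= 10) {
            System.out.printf("N = %d  greater-than share: %.4f%n",
                    n, greaterThanShare(n));
        }
    }
}
```

For small N the less-than count is actually the larger one because of its constant factor of 100, but as N grows the greater-than share approaches 1, which is why only large inputs matter for the order of growth.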
- Cost model
- A representative operation that models the overall order of growth.
For example, the greater-than operation is a good cost model for the overall order of growth of the algorithm above. When considering order of growth analysis, we can also ignore lower order terms and multiplicative constants. By choosing a cost model, we already discard information about these less-significant factors.
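As an informal sketch of why lower order terms and multiplicative constants can be ignored, add up the three counts from the table and divide by N^3. The name T(N) for the total is an assumption introduced here, and this is intuition rather than a formal definition.

```latex
\begin{align*}
T(N) &= (2N^3 + 1) + (100N^2 + 3N) + 5000 = 2N^3 + 100N^2 + 3N + 5001 \\
\frac{T(N)}{N^3} &= 2 + \frac{100}{N} + \frac{3}{N^2} + \frac{5001}{N^3}
  \longrightarrow 2 \quad \text{as } N \to \infty
\end{align*}
```

Every term except the one from the greater-than count shrinks toward zero, and the constant 2 that remains only scales the curve without changing its cubic shape.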
Applying this order of growth analysis back to dup1 and dup2, we can make the following statements about their runtime.
- The worst case order of growth of the runtime for dup1 is N^2.
- The worst case order of growth of the runtime for dup2 is N.
- The best case order of growth of the runtime for dup1 and dup2 is constant.
Simplified Modeling Process
If we only want the simplified order of growth, rather than building the entire step count table, we can instead:
- Choose our cost model (representative operation).
- Figure out the order of growth for the count of the cost model by either:
- Making an exact count and then discarding the unnecessary pieces
- Or, using intuition/inspection to determine orders of growth. (Needs practice!)
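As a sketch of the first route (an exact count, then discarding the unnecessary pieces), suppose we pick == as the cost model for dup1 in the worst case. For each value of i, the inner loop performs N - 1 - i comparisons. The name C(N) for the count is an assumption introduced here.

```latex
\begin{align*}
C(N) &= \sum_{i=0}^{N-1} (N - 1 - i) = (N - 1) + (N - 2) + \cdots + 1 + 0 \\
     &= \frac{N(N - 1)}{2} = \frac{N^2 - N}{2}
\end{align*}
```

Discarding the lower order term and the constant factor of 1/2 leaves N^2, which agrees with the worst-case entry for == in the dup1 step count table.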
In lecture, we’ll redo our analysis of dup1 using this Simplified Modeling Process. We’ll also introduce mathematical notation (Big-Theta, Big-O, and Big-Omega) to formally define this idea of order of growth.