Algorithm Analysis
Runtime analysis as a process: comprehending programs, modeling the number of steps, and formulating an answer.
Kevin Lin, with thanks to many others.
Runtime Analysis Process
Comprehending. Understanding the implementation details of a program.
Modeling. Counting the number of steps in terms of N, the size of the input.
Case Analysis. How certain conditions affect the program execution.
Asymptotic Analysis. Describing what happens for very large N, as N→∞.
Formalizing. Summarizing the final result in precise English or math notation.
Example: boolean dup1(int[] A), which considers every pair.
Best case: the array contains a duplicate at the front. Constant time, Θ(1).
Worst case: the array contains no duplicate items. Quadratic time, Θ(N^2).
Together, the best case and the worst case give the overall asymptotic runtime bound.
The reading described the implementation details for dup1 and dup2 (Comprehension) and introduced the idea of counting steps (Modeling). In this lecture, we will go in-depth on modeling and formalizing.
?: Where did case analysis come up in the reading?
Asymptotic Analysis
What happens for very large N, as N→∞.
Simulating billions of particles.
Social network with billions of users.
Logging billions of transactions.
Encoding billions of bytes of video data.
Linear-time algorithms scale better than quadratic-time algorithms (parabolas).
Algorithms (Robert Sedgewick, Kevin Wayne/Princeton)
From this point forward, we’ll almost always be working in the mode of asymptotic analysis: considering the behavior of programs as N grows very large.
Orders of Growth
Algorithm Design (Jon Kleinberg, Éva Tardos/Pearson Education)
?: How do multiplicative constants, e.g. 100N or N^2 / 2, affect the order of growth of the runtime of different algorithms?
Asymptotic Analysis vs. Case Analysis
Operation         dup1: Quadratic/Parabolic    dup2: Linear
i = 0             1                            1
less-than (<)     2 to (N^2 + 3N + 2) / 2      1 to N
increment (+= 1)  0 to (N^2 + N) / 2           0 to N - 1
equality (==)     1 to (N^2 - N) / 2           1 to N - 1
array accesses    2 to N^2 - N                 2 to 2N - 2
For a very large array with billions of elements (asymptotic analysis), how is it possible for dup1 to execute only 2 less-than (<) operations?
public static boolean dup1(int[] A) {
    for (int i = 0; i < A.length; i += 1) {
        for (int j = i + 1; j < A.length; j += 1) {
            if (A[i] == A[j]) {
                return true;
            }
        }
    }
    return false;
}
?: What does the runtime for dup1 vs. dup2 look like if we only consider the best case asymptotic analysis? How does that result compare to the worst case asymptotic analysis?
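One way to explore the range of less-than (<) counts in the table is to instrument dup1 and run it on a best case input and a worst case input. The sketch below is not lecture code: the class Dup1LessThanCount and its helpers distinct and duplicateAtFront are hypothetical names introduced only for illustration, and the loops are rewritten so that every evaluation of a < test is counted.

```java
// Hypothetical helper class, not part of the lecture code.
class Dup1LessThanCount {
    // Same control flow as dup1, but returns how many < tests were evaluated.
    static long lessThanCount(int[] A) {
        long count = 0;
        for (int i = 0; ; i += 1) {
            count += 1;                    // outer test: i < A.length
            if (!(i < A.length)) {
                break;
            }
            for (int j = i + 1; ; j += 1) {
                count += 1;                // inner test: j < A.length
                if (!(j < A.length)) {
                    break;
                }
                if (A[i] == A[j]) {
                    return count;          // dup1 would return true here
                }
            }
        }
        return count;                      // dup1 would return false here
    }

    // Array 0, 1, ..., n - 1 with no duplicates (worst case for dup1).
    static int[] distinct(int n) {
        int[] A = new int[n];
        for (int k = 0; k < n; k += 1) {
            A[k] = k;
        }
        return A;
    }

    // Same array with A[1] overwritten so that A[0] == A[1] (best case).
    // Assumes n >= 2.
    static int[] duplicateAtFront(int n) {
        int[] A = distinct(n);
        A[1] = A[0];
        return A;
    }

    public static void main(String[] args) {
        int N = 1000;
        System.out.println(lessThanCount(duplicateAtFront(N)));  // best case
        System.out.println(lessThanCount(distinct(N)));          // worst case
        System.out.println(((long) N * N + 3L * N + 2) / 2);     // table formula
    }
}
```

With N = 1000 the best case input needs 2 less-than tests and the worst case input needs 501501, which agrees with (N^2 + 3N + 2) / 2. The best case count stays at 2 no matter how large N becomes.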
Case analysis and asymptotic analysis are two equally important parts of algorithm analysis.
Simplified Modeling Process
Rather than building the entire table of operation counts, we can instead:
Choose our cost model (representative operation).
Figure out the order of growth for the count of the cost model by either:
Making an exact count and then discarding the unnecessary pieces.
Or, using intuition/inspection to determine order of growth. (Needs practice!)
Let’s redo our analysis of dup1 with this Simplified Modeling Process.
This time, we’ll show all our work.
Worst Case Order of Growth: Exact Count of == Operations
int N = A.length; // N == 6
for (int i = 0; i < N; i += 1)
    for (int j = i + 1; j < N; j += 1)
        if (A[i] == A[j])
            return true;
return false;
[Figure: a grid of i (0 to 5) against j (0 to 5) with one == marked for each of the 15 pairs where j > i.]
“The worst case order of growth of the runtime for dup1 is N^2.”
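The exact count can be checked by summing the inner loop's work: when no duplicate exists, row i contributes N - 1 - i comparisons, so the total is (N - 1) + (N - 2) + ... + 1 + 0 = N(N - 1) / 2 = (N^2 - N) / 2. The sketch below counts == operations on an array of distinct values and compares the result with that closed form. The class and method names are hypothetical and chosen only for this illustration.

```java
// Hypothetical helper class, not part of the lecture code.
class Dup1EqualityCount {
    // Runs the dup1 loops on the distinct array 0, 1, ..., N - 1 (worst case)
    // and returns how many == comparisons were made.
    static long equalityCount(int N) {
        int[] A = new int[N];
        for (int k = 0; k < N; k += 1) {
            A[k] = k;
        }
        long count = 0;
        for (int i = 0; i < N; i += 1) {
            for (int j = i + 1; j < N; j += 1) {
                count += 1;                // one == comparison
                if (A[i] == A[j]) {
                    return count;          // never taken for distinct values
                }
            }
        }
        return count;
    }

    // Closed form of the sum (N - 1) + (N - 2) + ... + 1 + 0.
    static long closedForm(int N) {
        return (long) N * (N - 1) / 2;
    }

    public static void main(String[] args) {
        for (int N : new int[] {6, 100, 1000}) {
            System.out.println(N + ": " + equalityCount(N) + " vs " + closedForm(N));
        }
    }
}
```

For N = 6 both values are 15, one for each == in the grid. Discarding the constant factor 1/2 and the lower-order term N leaves N^2 as the order of growth.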
Worst Case Order of Growth: Geometric Argument
The == operations, plotted on a grid of i against j, fill a right triangle of side length N - 1.
The area of that triangle is roughly (N - 1)^2 / 2, so the order of growth of the area is N^2.
“The worst case order of growth of the runtime for dup1 is N^2.”
Now, we want to formalize the order of growth precisely. The math might seem daunting at first but the idea is exactly the same as the order of growth analysis. Using Big-Theta instead of order of growth does not change the way we analyze algorithms at all.
Order of Growth Exercise
What is the order of growth of each function?
(Informally, what is the shape of each function for very large N?)
Function            Order of Growth
N^3 + 3N^4          N^4
(1 / N) + N^3       N^3
(1 / N) + 5         1
Ne^N + N            Ne^N
40 sin(N) + 4N^2    N^2
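A numerical way to build intuition for order of growth is to divide a function by a candidate order of growth and watch the ratio as N grows: if the ratio settles near a positive constant, the candidate has the right shape. The sketch below is only an illustration under that assumption. The class GrowthRatio and its ratio method are hypothetical names, and a finite table of ratios is evidence, not a proof.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper class, not part of the lecture code.
class GrowthRatio {
    // Returns r(n) / f(n) for a function r and a candidate order of growth f.
    static double ratio(DoubleUnaryOperator r, DoubleUnaryOperator f, double n) {
        return r.applyAsDouble(n) / f.applyAsDouble(n);
    }

    public static void main(String[] args) {
        // N^3 + 3N^4 against the candidate N^4.
        DoubleUnaryOperator r1 = n -> n * n * n + 3 * n * n * n * n;
        DoubleUnaryOperator f1 = n -> n * n * n * n;
        // 40 sin(N) + 4N^2 against the candidate N^2.
        DoubleUnaryOperator r2 = n -> 40 * Math.sin(n) + 4 * n * n;
        DoubleUnaryOperator f2 = n -> n * n;

        for (double n = 10; n <= 1e6; n *= 10) {
            System.out.println(n + ": " + ratio(r1, f1, n) + ", " + ratio(r2, f2, n));
        }
    }
}
```

The first ratio approaches 3 and the second approaches 4. Both limits are positive constants, which is why the multiplicative constants are dropped and only N^4 and N^2 remain.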
Big-Theta Notation
Suppose we have a function R(N) with order of growth f(N). In Big-Theta, we write this as:
R(N) ∈ Θ(f(N))
Function            Big-Theta
N^3 + 3N^4          Θ(N^4)
(1 / N) + N^3       Θ(N^3)
(1 / N) + 5         Θ(1)
Ne^N + N            Θ(Ne^N)
40 sin(N) + 4N^2    Θ(N^2)
The ∈ symbol reads in English as “element of” so you could read the statement as “R(N) is an element of the set of functions with order of growth f(N).”
Big-Theta Definition
R(N) ∈ Θ(f(N))
means there exist positive constants k1 and k2 such that
k1 · f(N) ≤ R(N) ≤ k2 · f(N)
for all values of N greater than some N0 (“very large N”).
?: What is a value that we can choose for N0 according to the plot on the right?
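The definition can be tried out numerically. The sketch below assumes R(N) = 40 sin(N) + 4N^2 (borrowed from the exercise table) and f(N) = N^2, and tests the candidate constants k1 = 3, k2 = 5, and N0 = 6. These constants are choices made for this illustration, not values read from the slide's plot, and the class name ThetaBoundCheck is hypothetical. A finite check is not a proof; the bounds hold for all N ≥ 7 because |40 sin(N)| ≤ 40 < N^2 once N ≥ 7.

```java
// Hypothetical helper class, not part of the lecture code.
class ThetaBoundCheck {
    // Assumed example function R(N) = 40 sin(N) + 4N^2.
    static double R(double n) {
        return 40 * Math.sin(n) + 4 * n * n;
    }

    // Candidate order of growth f(N) = N^2.
    static double f(double n) {
        return n * n;
    }

    // True if k1 * f(N) <= R(N) <= k2 * f(N) for every integer N in [from, to].
    static boolean sandwiched(double k1, double k2, int from, int to) {
        for (int n = from; n <= to; n += 1) {
            double lower = k1 * f(n);
            double upper = k2 * f(n);
            if (!(lower <= R(n) && R(n) <= upper)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sandwiched(3, 5, 7, 100000));  // N > N0 = 6
        System.out.println(sandwiched(3, 5, 1, 100000));  // includes small N
    }
}
```

The first check succeeds and the second fails, because for small N the sine term is large relative to N^2. That is the role of N0: the inequality only has to hold for very large N.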
Big-Theta Challenge
Find a simple f(N) and corresponding k1 and k2.
Big-O Notation
Whereas Big-Theta can informally be thought of as something like “equals”, Big-O can be thought of as “less than or equal”.
All of the following are true.
Big-O Definition
R(N) ∈ O(f(N))
means there exists a positive constant k2 such that
R(N) ≤ k2 · f(N)
for all values of N greater than some N0 (“very large N”).
?: Why can we say that 40 sin(N) + 4N^2 is in O(N^4)? Explain in terms of the formal definition of Big-O.
?: Why is it incorrect to say that 40 sin(N) + 4N^2 is in Θ(N^4)? Explain in terms of the formal definition of Big-Theta.
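A small experiment can make the difference between the two questions concrete. The sketch below takes R(N) = 40 sin(N) + 4N^2 and f(N) = N^4 and checks an upper bound with k2 = 1 for N > 2, then shows that a sample lower-bound constant k1 = 0.001 stops working as N grows. The constants and the class name BigOCheck are assumptions made for this illustration, and a finite check only suggests the result rather than proving it.

```java
// Hypothetical helper class, not part of the lecture code.
class BigOCheck {
    // R(N) = 40 sin(N) + 4N^2.
    static double R(double n) {
        return 40 * Math.sin(n) + 4 * n * n;
    }

    // f(N) = N^4.
    static double f(double n) {
        return n * n * n * n;
    }

    // True if R(N) <= k2 * f(N) for every integer N in [from, to].
    static boolean upperHolds(double k2, int from, int to) {
        for (int n = from; n <= to; n += 1) {
            if (R(n) > k2 * f(n)) {
                return false;
            }
        }
        return true;
    }

    // True if k1 * f(N) <= R(N) at the single value N = n.
    static boolean lowerHolds(double k1, int n) {
        return k1 * f(n) <= R(n);
    }

    public static void main(String[] args) {
        System.out.println(upperHolds(1.0, 3, 100000));  // Big-O style upper bound
        System.out.println(lowerHolds(0.001, 10));       // lower bound at small N
        System.out.println(lowerHolds(0.001, 100));      // lower bound at larger N
    }
}
```

The upper bound holds across the whole tested range, while the lower bound holds at N = 10 but fails at N = 100. Shrinking k1 only delays the failure, since R(N) / N^4 tends to 0, so no positive k1 works for all large N.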
Big-Omega Definition
R(N) ∈ Ω(f(N))
means there exists a positive constant k1 such that
R(N) ≥ k1 · f(N)
for all values of N greater than some N0 (“very large N”).
Likewise, we have a Big-Omega definition for the other half of the inequality.
?: Describe 40 sin(N) + 4N^2 ∈ Ω(N) in your own words using the plot on the right.
?: Does Θ(f(N)) imply O(f(N)) and Ω(f(N))? Does O(f(N)) and Ω(f(N)) imply Θ(f(N))?
Overall Asymptotic Runtime Bound for dup1
Give an overall asymptotic runtime bound for R as a combination of Θ, O, and/or Ω notation. Take into account both the best and the worst case runtimes (R_best and R_worst).
Then, give a few other valid runtime bounds for R_best, R_worst, and R using asymptotic notation.
Tight Asymptotic Runtime Bounds for dup1
Best case: Ω(1) and O(1), therefore Θ(1).
Worst case: Ω(N^2) and O(N^2), therefore Θ(N^2).
Overall: Ω(1) and O(N^2).
Because the Ω and O bounds do not agree, a Θ bound does not exist.
Valid but Less-Descriptive Runtime Bounds for dup1
Best case: Ω(1) and O(1), therefore Θ(1).
Worst case: Ω(N^2) and O(N^2), therefore Θ(N^2).
Overall: Ω(1) and O(N^2).
Because the Ω and O bounds do not agree, a Θ bound does not exist.
Mystery
Give a tight asymptotic runtime bound for mystery as a function of N, the length of the array, in the best case, worst case, and overall.
boolean mystery(int[] a, int target) {
    int N = a.length;
    for (int i = 0; i < N; i += 1)
        if (a[i] == target)
            return true;
    return false;
}
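To compare the cases for mystery, it can help to pick == as the cost model and count it directly. The sketch below is an instrumented copy of the loop. The class MysteryCount and the method comparisons are hypothetical names introduced for this illustration and are not part of the lecture code.

```java
// Hypothetical helper class, not part of the lecture code.
class MysteryCount {
    // Same loop as mystery, but returns the number of == comparisons made.
    static int comparisons(int[] a, int target) {
        int N = a.length;
        int count = 0;
        for (int i = 0; i < N; i += 1) {
            count += 1;                // one == comparison
            if (a[i] == target) {
                return count;          // mystery would return true here
            }
        }
        return count;                  // mystery would return false here
    }

    public static void main(String[] args) {
        int N = 1000;
        int[] a = new int[N];
        for (int k = 0; k < N; k += 1) {
            a[k] = k;
        }
        System.out.println(comparisons(a, 0));   // target at the front
        System.out.println(comparisons(a, -1));  // target not in the array
    }
}
```

When the target is at the front, one comparison is made regardless of N. When the target is absent, every one of the N elements is compared.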
Mystery: Big-Ω, Big-O, and Big-Θ
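Best case: Ω(1) and O(1), therefore Θ(1).
Worst case: Ω(N) and O(N), therefore Θ(N).
Overall: Ω(1) and O(N). Because the Ω and O bounds do not agree, a Θ bound does not exist.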