Algorithm Analysis I
Runtime analysis as a process: comprehending programs, modeling the number of steps, and formulating an answer.
Kevin Lin, with thanks to many others.
Ask questions anonymously on Piazza. Look for the pinned Lecture Questions thread.

Feedback from the Reading Quiz

Runtime Analysis Process
Comprehending. Understanding the implementation details of a program.
Modeling. Counting the number of steps in terms of N, the size of the input.
Case Analysis. How certain conditions affect the program execution.
Asymptotic Analysis. Describing what happens for very large N, as N→∞.
Formalizing. Summarizing the final result in precise English or math notation.

Example: boolean dup1(int[] A), which considers every pair.
Best case: array contains a duplicate at front. Constant time, Θ(1).
Worst case: array contains no duplicate items. Quadratic time, Θ(N^2).
Together, the two cases give the Overall Asymptotic Runtime Bound.
The reading described the implementation details for dup1 and dup2 (Comprehension) and introduced the idea of counting steps (Modeling). In this lecture, we will go in-depth on modeling and formalizing.

?: Where did case analysis come up in the reading?

Asymptotic Analysis
What happens for very large N, as N→∞.

Simulating billions of particles.
Social network with billions of users.
Logging billions of transactions.
Encoding billions of bytes of video data.

Linear-time algorithms scale better than quadratic-time algorithms (parabolas).
From this point forward, we’ll almost always be working in the mode of asymptotic analysis: considering the behavior of programs as N grows very large.

?: How can we characterize the range of step counts that we saw in dup1 and dup2?

Orders of Growth
Algorithm Design (Jon Kleinberg, Éva Tardos/Pearson Education)
?: Why might we choose to focus on very large N rather than small N?




?: How do multiplicative constants, e.g. 100N or N^2 / 2, affect the order of growth of the runtime of different algorithms?
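
To make the question about multiplicative constants concrete, here is a small sketch that compares the two step counts named in the prompt, 100N and N^2 / 2. The class and method names are hypothetical, chosen only for illustration.

```java
// Hypothetical step-count functions taken from the question above.
class ConstantFactors {
    // A linear step count with a large constant factor: 100N.
    static long linearSteps(long n) {
        return 100 * n;
    }

    // A quadratic step count with a small constant factor: N^2 / 2.
    static long quadraticSteps(long n) {
        return n * n / 2;
    }

    public static void main(String[] args) {
        // Print both counts as N grows by factors of 10.
        for (long n = 10; n <= 1_000_000; n *= 10) {
            System.out.println("N = " + n
                + "  100N = " + linearSteps(n)
                + "  N^2/2 = " + quadraticSteps(n));
        }
    }
}
```

For N below 200 the quadratic count is the smaller of the two, at N = 200 both equal 20,000, and beyond that point the quadratic count pulls further ahead with every increase in N. The constants move the crossover point but do not change which curve wins for very large N.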

Asymptotic Analysis and Case Analysis
Operation          dup1: Quadratic/Parabolic   dup2: Linear
i = 0              1                           1
less-than (<)      2 to (N^2 + 3N + 2) / 2     0 to N
increment (+= 1)   0 to (N^2 + N) / 2          0 to N - 1
equality (==)      1 to (N^2 - N) / 2          1 to N - 1
array accesses     2 to N^2 - N                2 to 2N - 2

For a very large array with billions of elements (i.e. asymptotic analysis), is it possible for dup1 to execute only 2 less-than (<) operations?
public static boolean dup1(int[] A) {  
  for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
      if (A[i] == A[j]) {
      return true;
      }
    }
  }
  return false;
}





?: What does the runtime for dup1 vs. dup2 look like if we only consider the best case asymptotic analysis? How does that result compare to the worst case asymptotic analysis?


Duplicate Finding
Our goal is to somehow characterize the runtimes of the functions below.
Characterization should be simple and mathematically rigorous.
Characterization should demonstrate superiority of dup2 over dup1.

Operation          dup1: Quadratic/Parabolic   dup2: Linear
i = 0              1                           1
less-than (<)      2 to (N^2 + 3N + 2) / 2     0 to N
increment (+= 1)   0 to (N^2 + N) / 2          0 to N - 1
equality (==)      1 to (N^2 - N) / 2          1 to N - 1
array accesses     2 to N^2 - N                2 to 2N - 2

Worst Case Order of Growth
If we only need a worst case order of growth runtime analysis (typical for most of the practical runtime analysis in this course), then we can significantly simplify the Modeling process. In exchange for removing detail from our model, we get a quick way to characterize a program’s asymptotic behavior.

Simplification 1: Consider Only the Worst Case
Operation          Worst Case: dup1
i = 0              1
j = i + 1          1 to N
less-than (<)      2 to (N^2 + 3N + 2) / 2
increment (+= 1)   0 to (N^2 + N) / 2
equality (==)      1 to (N^2 - N) / 2
array accesses     2 to N^2 - N
Provides a runtime guarantee for any input of size N.
We often only care about worst case, but there are many exceptions.
public static boolean dup1(int[] A) {
  for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
      if (A[i] == A[j]) {
      return true;
      }
    }
  }
  return false;
}
?: What are real-world scenarios where the worst-case runtime is important? Scenarios where it isn’t important?

Identifying Orders of Growth
Consider the algorithm step counts below.

Operation          Count
less-than (<)      100N^2 + 3N
greater-than (>)   2N^3 + 1
and (&&)           5,000

What do you expect will be the order of growth of the runtime for the algorithm? In other words, if we plotted total runtime vs. N, which curve would we expect?

N	[linear]
N^2	[quadratic]
N^3	[cubic]
N^6	[sextic]

Simplification 2: Restrict Attention to One Operation
Pick some representative operation to act as a proxy for the overall runtime.
Good choices:
Less-than (<)
Increment (+= 1)
Equality (==)
Array accesses
Poor choices: assignment of i = 0 or j = i + 1.
We call our choice the cost model.
Operation          Worst Case: dup1
i = 0              1
j = i + 1          N
less-than (<)      (N^2 + 3N + 2) / 2
increment (+= 1)   (N^2 + N) / 2
equality (==)      (N^2 - N) / 2
array accesses     N^2 - N
Note that this assumes that the runtime cost of each operation is about the same, so 100 array accesses take the same amount of time as 100 increment operations. This is a lie: in modern computers, a single array access can take the same amount of time as 100 or more increment operations.
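
The worst case counts in the table can be checked empirically. The sketch below is an instrumented copy of dup1, not the lecture's own code: the for loops are rewritten as while loops so that every less-than check is visible, and the counter fields are hypothetical additions for illustration.

```java
// Instrumented copy of dup1 that tallies the operations from the table.
// The counter fields are hypothetical additions for illustration only.
class Dup1Counter {
    long lessThan, increment, equality, arrayAccess;

    boolean dup1(int[] A) {
        int i = 0;
        while (true) {
            lessThan += 1;                  // i < A.length
            if (!(i < A.length)) break;
            int j = i + 1;
            while (true) {
                lessThan += 1;              // j < A.length
                if (!(j < A.length)) break;
                equality += 1;              // A[i] == A[j]
                arrayAccess += 2;           // reads A[i] and A[j]
                if (A[i] == A[j]) return true;
                increment += 1;             // j += 1
                j += 1;
            }
            increment += 1;                 // i += 1
            i += 1;
        }
        return false;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] distinct = new int[n];
        for (int k = 0; k < n; k += 1) distinct[k] = k;   // no duplicates: worst case

        Dup1Counter c = new Dup1Counter();
        c.dup1(distinct);
        long N = n;
        // Each line prints the measured count next to the table's formula.
        System.out.println("less-than:      " + c.lessThan + " vs " + (N * N + 3 * N + 2) / 2);
        System.out.println("increment:      " + c.increment + " vs " + (N * N + N) / 2);
        System.out.println("equality:       " + c.equality + " vs " + (N * N - N) / 2);
        System.out.println("array accesses: " + c.arrayAccess + " vs " + (N * N - N));
    }
}
```

Every one of the four counts grows like N^2 in the worst case, which is why any of them can stand in for the total. The assignment i = 0 happens once regardless of N, so it says nothing about how the runtime scales.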

?: What part of worst case order of growth analysis makes it acceptable to rely on this lie?




?: How do we pick a representative operation? What are the qualities of good choices vs. poor choices?

Simplification 3: Eliminate Lower-Order Terms
Ignore lower-order terms.
Operation          Worst Case: dup1
increment (+= 1)   (N^2 + N) / 2
public static boolean dup1(int[] A) {
  for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
      if (A[i] == A[j]) {
      return true;
      }
    }
  }
  return false;
}
?: Why can we ignore lower-order terms?
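
One way to explore this question numerically is to compare the exact worst case increment count, (N^2 + N) / 2, with the same expression after the lower-order term is dropped, N^2 / 2. This is a minimal sketch; the class and method names are hypothetical.

```java
// Compares (N^2 + N) / 2 with N^2 / 2 as N grows.
class LowerOrderTerms {
    // Exact worst case increment count for dup1.
    static double exact(double n) {
        return (n * n + n) / 2;
    }

    // The same count with the lower-order term N removed.
    static double simplified(double n) {
        return n * n / 2;
    }

    // Algebraically this ratio equals 1 + 1/N.
    static double ratio(double n) {
        return exact(n) / simplified(n);
    }

    public static void main(String[] args) {
        for (double n = 10; n <= 1e6; n *= 10) {
            System.out.println("N = " + (long) n + "  ratio = " + ratio(n));
        }
    }
}
```

Since the ratio is 1 + 1/N, the relative contribution of the lower-order term shrinks toward zero as N grows, so it has no effect on the shape of the curve for very large N.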

Simplification 4: Eliminate Multiplicative Constants
Ignore multiplicative constants.
We already threw away the meaningful constant when we chose a single proxy operation.
Operation          Worst Case: dup1
increment (+= 1)   N^2 / 2
public static boolean dup1(int[] A) {
  for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
      if (A[i] == A[j]) {
      return true;
      }
    }
  }
  return false;
}

Simplification Summary	
Only consider the worst case.
Pick a representative operation (cost model).
Ignore lower order terms.
Ignore multiplicative constants.
Operation          dup1
i = 0              1
j = i + 1          1 to N
less-than (<)      2 to (N^2 + 3N + 2) / 2
increment (+= 1)   0 to (N^2 + N) / 2
equality (==)      1 to (N^2 - N) / 2
array accesses     2 to N^2 - N

Order of growth:

Operation          Worst Case Growth
increment (+= 1)   N^2

“The worst case order of growth of the runtime for dup1 is N^2.”

Your Turn: Worst Case Order of Growth for dup2
Only consider the worst case.
Pick a representative operation (cost model).
Ignore lower order terms.
Ignore multiplicative constants.
Operation          dup2
i = 0              1
less-than (<)      0 to N
increment (+= 1)   0 to N - 1
equality (==)      1 to N - 1
array accesses     2 to 2N - 2

Order of growth:

Operation          Worst Case Growth
array accesses     N

“The worst case order of growth of the runtime for dup2 is …”
Q1: Determine the worst case order of growth for dup2.
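
These slides do not reproduce the code for dup2. As a point of reference, a sketch that is consistent with the counts in the dup2 table, assuming dup2 receives a sorted array and compares only neighboring items, might look like the following. Treat it as an assumption about the reading's code rather than a copy of it.

```java
// Hypothetical reconstruction of dup2 for illustration.
class Dup2Sketch {
    // Assumes A is sorted, so any duplicates sit next to each other.
    public static boolean dup2(int[] A) {
        for (int i = 0; i < A.length - 1; i += 1) {
            if (A[i] == A[i + 1]) {   // two array accesses per equality check
                return true;
            }
        }
        return false;
    }
}
```

Under that assumption, an array with no duplicates forces N - 1 equality checks and 2N - 2 array accesses, which matches the upper ends of the ranges in the table.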




Q2: Which operations are appropriate cost models? How do you know?

Simplified Modeling Process
Rather than building the entire table, we can instead:
Choose a representative operation to count (cost model).
Figure out the order of growth for the count of the representative operation by either:
Making an exact count and then discarding the unnecessary pieces.
After lots of practice, using inspection to determine order of growth.

Let’s redo our analysis of dup1 with this new process.
This time, we’ll show all our work.
By using our simplifications from the outset, we can avoid building the table at all!

Worst Case Order of Growth: Exact Count of == Operations
int N = A.length; // N == 6
for (int i = 0; i < N; i += 1)
  for (int j = i + 1; j < N; j += 1)
    if (A[i] == A[j])
      return true;
return false;
[Figure: grid of i (rows) against j (columns), both 0 to 5, with one == marked for each of the 15 pairs where j > i.]
“The worst case order of growth of the runtime for dup1 is N^2.”

Worst Case Order of Growth: Geometric Argument
int N = A.length; // N == 6
for (int i = 0; i < N; i += 1)
  for (int j = i + 1; j < N; j += 1)
    if (A[i] == A[j])
      return true;
return false;
[Figure: grid of i (rows) against j (columns), both 0 to 5; the 15 == marks fill a triangle.]
Area of right triangle of side length N - 1.
Order of growth of area is N^2.
“The worst case order of growth of the runtime for dup1 is N^2.”
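
The exact count and the geometric argument can be tied together with one line of algebra. As a sketch, let C(N) denote the worst case number of == operations (the symbol C is introduced here only for illustration). Outer iteration i contributes N - 1 - i comparisons, so:

```latex
C(N) = \sum_{i=0}^{N-1} (N - 1 - i)
     = (N-1) + (N-2) + \cdots + 1 + 0
     = \frac{N(N-1)}{2}
     = \frac{N^2 - N}{2}
```

For N = 6 this gives 15 comparisons. Dropping the lower-order term and the constant factor 1/2 leaves N^2, the same order of growth as the area of the triangle.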

We’ve just figured out how to model the worst case order of growth in several different ways.

Now, we want to formalize the order of growth precisely. The math might seem daunting at first but the idea is exactly the same as the order of growth analysis. Using Big-Theta instead of order of growth does not change the way we analyze algorithms at all.

Order of Growth Exercise
Informally, what is the shape of each function for very large N?
In other words, what is the order of growth of each function?
Function             Order of Growth
N^3 + 3N^4           N^4
(1 / N) + N^3        N^3
(1 / N) + 5          1
N e^N + N            N e^N
40 sin(N) + 4N^2     N^2

Big-Theta Notation
Suppose we have a function R(N) with order of growth f(N). In Big-Theta, we write this as: R(N) ∈ Θ(f(N)).

Function             Big-Theta
N^3 + 3N^4           Θ(N^4)
(1 / N) + N^3        Θ(N^3)
(1 / N) + 5          Θ(1)
N e^N + N            Θ(N e^N)
40 sin(N) + 4N^2     Θ(N^2)
The ∈ symbol reads in English as “element of” so you could read the statement as “R(N) is an element of the set of functions with order of growth f(N).”

Big-Theta Definition


R(N) ∈ Θ(f(N))
means there exist positive constants k1 and k2 such that
k1 · f(N) ≤ R(N) ≤ k2 · f(N)
for all values of N greater than some N0 (“very large N”).
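
To see the definition at work on one of the functions from the earlier table, the sketch below spot-checks R(N) = 40 sin(N) + 4N^2 against f(N) = N^2. The constants k1 = 3 and k2 = 5 and the threshold N0 = 6 are one workable choice picked here for illustration; many other choices also satisfy the definition. A finite loop like this is evidence, not a proof.

```java
// Spot-checks k1 * f(N) <= R(N) <= k2 * f(N) over a range of integer N.
class BigThetaCheck {
    static double R(double n) {
        return 40 * Math.sin(n) + 4 * n * n;
    }

    static double f(double n) {
        return n * n;
    }

    // True if both inequalities hold for every integer N with n0 < N <= limit.
    static boolean holds(double k1, double k2, int n0, int limit) {
        for (int n = n0 + 1; n <= limit; n += 1) {
            double r = R(n);
            if (r < k1 * f(n) || r > k2 * f(n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holds(3, 5, 6, 1_000_000));  // checks N = 7 .. 1,000,000
        System.out.println(holds(3, 5, 0, 1_000_000));  // also includes small N
    }
}
```

The algebraic reason the first check succeeds is that |40 sin(N)| is at most 40, while N^2 is at least 49 once N ≥ 7, so the sine term can never move R(N) outside the band between 3N^2 and 5N^2. Starting from N0 = 0 fails because at N = 1 the sine term dominates.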
?: What is a value that we can choose for N0 according to the plot on the right?

Big-Theta Challenge
Find a simple f(N) and corresponding k1 and k2.
R(N) ∈ Θ(f(N))
means there exist positive constants k1 and k2 such that
k1 · f(N) ≤ R(N) ≤ k2 · f(N)
for all values of N greater than some N0.

Big-O Notation
Whereas Big-Theta can informally be thought of as something like “equals”, Big-O can be thought of as “less than or equal”.
All of the following are true.

Big-O Definition


R(N) ∈ O(f(N))
means there exists a positive constant k2 such that
R(N) ≤ k2 · f(N)
for all values of N greater than some N0 (“very large N”).
?: Why can we say that 40 sin(N) + 4N^2 is in O(N^4)? Explain in terms of the formal definition of Big-O.




?: Why is it incorrect to say that 40 sin(N) + 4N^2 is in Θ(N^4)? Explain in terms of the formal definition of Big-Theta.
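
One way to explore these two questions numerically is to look at the ratio of 40 sin(N) + 4N^2 to N^4. The sketch below is only an aid to intuition, and the class and method names are hypothetical.

```java
// Ratio of R(N) = 40 sin(N) + 4N^2 to the candidate bound N^4.
class BigOVersusTheta {
    static double ratio(double n) {
        return (40 * Math.sin(n) + 4 * n * n) / Math.pow(n, 4);
    }

    public static void main(String[] args) {
        for (double n = 10; n <= 1e5; n *= 10) {
            System.out.println("N = " + (long) n + "  R(N) / N^4 = " + ratio(n));
        }
    }
}
```

The ratio stays below a fixed constant for large N, which is all that Big-O asks for. It also shrinks toward 0, so no positive constant k1 can remain below it for all large N, which is what the lower half of the Big-Theta definition would require.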

Big-Omega Definition


R(N) ∈ Ω(f(N))
means there exists a positive constant k1 such that
R(N) ≥ k1 · f(N)
for all values of N greater than some N0 (“very large N”).
Likewise, we have a Big-Omega definition for the other half of the inequality.

?: Describe 40 sin(N) + 4N^2 ∈ Ω(N) in your own words using the plot on the right.




?: Does Θ(f(N)) imply O(f(N)) and Ω(f(N))? Do O(f(N)) and Ω(f(N)) together imply Θ(f(N))?

Runtime Analysis Process
Comprehending. Understanding the implementation details of a program.
Modeling. Counting the number of steps in terms of N, the size of the input.
Case Analysis. How certain conditions affect the number of executed steps.
Asymptotic Analysis. Describing what happens for very large N, as N→∞.
Formalizing. Summarizing the final result in precise English or math notation.

Example: boolean dup1(int[] A), which considers every pair.
Best case: array contains a duplicate at front. Constant time, Θ(1).
Worst case: array contains no duplicate items. Quadratic time, Θ(N^2).
Overall Asymptotic Runtime Bound: “The overall order of growth of the runtime for dup1 is …”
Previously, we focused on stating formally, “The worst case order of growth of the runtime for dup1 is N^2.”

Now, let’s figure out how we can put it all together and state, “The overall order of growth of the runtime for dup1 is …”

Asymptotic Analysis and Case Analysis
Operation          dup1: Quadratic/Parabolic   dup2: Linear
i = 0              1                           1
less-than (<)      2 to (N^2 + 3N + 2) / 2     0 to N
increment (+= 1)   0 to (N^2 + N) / 2          0 to N - 1
equality (==)      1 to (N^2 - N) / 2          1 to N - 1
array accesses     2 to N^2 - N                2 to 2N - 2

For a very large array with billions of elements (i.e. asymptotic analysis), is it possible for dup1 to execute only 2 less-than (<) operations?
Recall that this is true: in the best case, there is a duplicate at the beginning of the very large array. A very large N (asymptotic analysis) doesn’t tell us anything about the contents of the array (case analysis).
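
This can be confirmed by running dup1 on a large array whose first two items match. The sketch below routes every less-than comparison through a helper so it can be counted; the helper lt and the counter field are hypothetical additions, and the array holds a million items rather than billions only to keep memory use modest.

```java
// Counts the less-than operations dup1 performs on a best case input.
class BestCaseDup1 {
    static long lessThanCount;   // hypothetical counter for illustration

    // Performs a < b and records that a comparison happened.
    static boolean lt(int a, int b) {
        lessThanCount += 1;
        return a < b;
    }

    static boolean dup1(int[] A) {
        for (int i = 0; lt(i, A.length); i += 1) {
            for (int j = i + 1; lt(j, A.length); j += 1) {
                if (A[i] == A[j]) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] big = new int[1_000_000];
        for (int k = 0; k < big.length; k += 1) big[k] = k;
        big[1] = big[0];                       // duplicate at the front: best case

        lessThanCount = 0;
        System.out.println(dup1(big));         // prints true
        System.out.println(lessThanCount);     // prints 2
    }
}
```

The count is 2 no matter how long the array is: one check of i < A.length, one check of j < A.length, and then the first equality test succeeds.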

Overall Asymptotic Runtime Bound for dup1
Give an overall asymptotic runtime bound for R as a combination of Θ, O, and/or Ω notation. Take into account both the best and the worst case runtimes (Rbest and Rworst).
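
As a sketch of how the pieces combine, take the best and worst case results for dup1 stated earlier in the lecture, Θ(1) and Θ(N^2), and apply the Big-Omega and Big-O definitions to the runtime R over all inputs of size N:

```latex
R_{\text{best}}(N) \in \Theta(1), \quad R_{\text{worst}}(N) \in \Theta(N^2)
\;\Longrightarrow\;
R(N) \in \Omega(1) \ \text{and} \ R(N) \in O(N^2)
```

Because the best and worst cases have different orders of growth, no single Θ bound describes R(N) across all inputs.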