We assume that you have seen running time in your prerequisite coursework (such as cse123, cse373, or equivalent). In particular, we expect that you have previously given worst-case running times of programs using big-O notation. What we will be doing this quarter is not substantially different from that. This reading is not intended to re-teach or modify your understanding of those ideas; it is mostly meant to offer a different (and more precise) perspective on what’s going on with those concepts, and why computer science has adopted them.
Your concept check will primarily be a review/refresh of those concepts. This means that you may be able to do the concept check without reading this, but I think this reading will still help you to understand what we’re doing going forward.
Throughout this course our primary (but not exclusive) focus will be on developing intentional processes for designing faster algorithms. Almost always, for the problems we try to solve, it will be easy to write an algorithm if we have no consideration for its speed. Usually we could write a naive algorithm that simply tries out every possible option and then returns the correct one. While such an algorithm would likely be correct, it would probably not be efficient. Being able to tell when an algorithm is efficient and being able to design efficient algorithms both require insights that we will develop throughout the quarter.
To help us see why we might care so much about efficiency in addition to correctness we’ll compare two different sorting algorithms, both of them correct.
The first sorting algorithm we will look at is selection sort, which we discussed in class on Friday 9/26. As a reminder, selection sort sorts an array into descending order by repeatedly selecting the next largest element. More precisely, for each index i of the array, we swap the value at index i with the value at index j, where index j contains the largest value between index i (inclusive) and the end of the array.
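To make that description concrete, here is a minimal Java sketch of selection sort as described above (sorting into descending order). The method name and details are illustrative; this is not necessarily the exact code used for the timings later in this reading.

```java
// Sorts arr into descending order. For each index i, find the index j of
// the largest value in arr[i..arr.length-1] and swap it into position i.
public static void selectionSort(int[] arr) {
    for (int i = 0; i < arr.length; i++) {
        int j = i; // index of the largest value seen so far
        for (int k = i + 1; k < arr.length; k++) {
            if (arr[k] > arr[j]) {
                j = k;
            }
        }
        // swap the largest remaining value into index i
        int temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }
}
```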
The second algorithm we will look at is permutation sort. This algorithm will sort the array by repeatedly producing (and checking) a different permutation of the array until it is in descending order.
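Here is one way permutation sort might look in Java, using recursive backtracking to generate permutations; the helper names and the particular order in which permutations are tried are assumptions made for illustration.

```java
// Rearranges arr by trying permutations until one is in descending order.
public static void permutationSort(int[] arr) {
    tryPermutations(arr, 0);
}

// Generates permutations of arr[i..end] in place, returning true (and
// stopping) as soon as the whole array is in descending order.
private static boolean tryPermutations(int[] arr, int i) {
    if (i == arr.length) {
        return isDescending(arr);
    }
    for (int k = i; k < arr.length; k++) {
        swap(arr, i, k);
        if (tryPermutations(arr, i + 1)) {
            return true; // leave arr in the sorted order we just found
        }
        swap(arr, i, k); // undo and try the next choice for position i
    }
    return false;
}

private static boolean isDescending(int[] arr) {
    for (int i = 0; i + 1 < arr.length; i++) {
        if (arr[i] < arr[i + 1]) {
            return false;
        }
    }
    return true;
}

private static void swap(int[] arr, int i, int j) {
    int temp = arr[i];
    arr[i] = arr[j];
    arr[j] = temp;
}
```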
We proved in class that selection sort is correct, so let’s quickly demonstrate that permutation sort is correct.
(meta lesson: not all proofs of correctness need to be clever!)
Even though both algorithms are correct, that does not mean that they are equally good. I implemented both algorithms in Java so that we could try running each one. I then timed how long each algorithm took to run on arrays of various sizes. The following table shows the results:
| Array length | Selection Sort | Permutation Sort |
|---|---|---|
| 1 | 2.3 ms | 7.1 ms |
| 2 | 2.6 ms | 10.0 ms |
| 3 | 2.9 ms | 14.1 ms |
| 4 | 3.0 ms | 20.5 ms |
| 5 | 3.2 ms | 104.6 ms |
| 6 | 3.7 ms | 581.5 ms |
| 7 | 3.8 ms | 1253.3 ms |
| 8 | 4.1 ms | 5576.2 ms |
| 9 | 4.3 ms | 23513.1 ms |
| 10 | 4.4 ms | 164403.8 ms |
| 11 | 4.8 ms | 854988.4 ms \approx 0.85 s |
| 12 | 5.4 ms | 8442984.5 ms \approx 8.4 s |
| 13 | 6.4 ms | 120.35 s |
| 14 | 6.6 ms | 1595.23 s \approx 26 minutes |
| 15 | 7.1 ms | I didn’t bother trying, but I think it would be over 6 hours |
Nathan (the author of this particular reading) is a pretty patient dude, so clearly there’s a problem if he couldn’t stand to wait for permutation sort to sort an array of length only 15. In practice we’d like to sort lists that are millions of items long in reasonable amounts of time, so selection sort is clearly better!
Perhaps it’s intuitive to you why selection sort is better. After all, selection sort carefully constructs a sorted list piece by piece, whereas permutation sort exhaustively tries every reordering of elements until it happens upon one that works. What we’d like, though, is a way to predict which algorithm will be more efficient without going through all the effort of timing them.
Timing algorithms with a stopwatch to see how long they run is called *benchmarking*. Benchmarking has a lot of value for many applications, such as testing how well a program runs on specific hardware, testing how well a program runs on real-world input, and testing how different implementations compare. In summary, benchmarking is most helpful for testing code that has already been written. It is not at all helpful for deciding which algorithm to turn into a program, or for identifying how to make an algorithm faster.
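For illustration, a benchmark like the one behind the table above might look something like the following hypothetical Java harness; the seed, array sizes, value range, and single-run timing are assumptions rather than the author's actual setup.

```java
import java.util.Random;

// A minimal benchmarking sketch: time one sorting implementation on random
// arrays of increasing length and report the elapsed time in milliseconds.
public class SortBenchmark {
    public static void main(String[] args) {
        Random rand = new Random(42); // fixed seed so runs are repeatable
        for (int n = 1; n <= 15; n++) {
            int[] arr = new int[n];
            for (int i = 0; i < n; i++) {
                arr[i] = rand.nextInt(1000);
            }
            long start = System.nanoTime();
            selectionSort(arr); // swap in permutationSort to time the other algorithm
            long elapsedNanos = System.nanoTime() - start;
            System.out.printf("n = %d: %.3f ms%n", n, elapsedNanos / 1_000_000.0);
        }
    }

    // Descending selection sort, as sketched earlier in this reading.
    private static void selectionSort(int[] arr) {
        for (int i = 0; i < arr.length; i++) {
            int j = i;
            for (int k = i + 1; k < arr.length; k++) {
                if (arr[k] > arr[j]) {
                    j = k;
                }
            }
            int temp = arr[i];
            arr[i] = arr[j];
            arr[j] = temp;
        }
    }
}
```

Note that a single timed run like this is sensitive to JIT warm-up, garbage collection, and other machine-specific noise, which is part of why benchmark numbers reflect the whole setup rather than the algorithm alone.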
When designing our algorithms, we’ll need a way to talk about running time that is predictive rather than measured. Specifically, we will define algorithm running time so that it has the following properties:

- It does not depend on the speed of the computer that runs the algorithm.
- It does not depend on how the algorithm is implemented or deployed.
- It does not require us to pick a specific input in advance.
- It can be determined before we ever write any code.
All of these properties will help us because they make running time a property of the algorithm itself in that it does not depend on how the algorithm is implemented or deployed. This allows us to evaluate the efficiency of algorithms before we ever write code!
To achieve all of these properties, we will define algorithm running time in the following way:
We say that the running time of algorithm A is a function f: \mathbb{N} \rightarrow \mathbb{N} (i.e. a function mapping nonnegative integers to nonnegative integers) such that f(n) is the worst-case number of *operations* the algorithm would perform for an input of size n.
In other words, we express the running time of an algorithm A as a function f. The inputs and outputs of that function f will both be nonnegative integers. The input value to f will be the size of the input given to the algorithm A. The output value from f will be the maximum (worst-case) number of operations that A will perform among all inputs of the given size.
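One way to write this same definition symbolically (using |x| to denote the size of an input x):

$$ f(n) = \max_{\text{inputs } x \text{ with } |x| = n} \big(\text{number of operations } A \text{ performs on } x\big) $$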
This definition already pretty clearly gives us some of the properties that we’re after. For one, this definition does not depend on the speed of the computer that may run our algorithm. Faster computers just do each operation more quickly; they do not do fewer operations. Also, by making running time a function of the input size, it does not require us to pick a specific input in advance. It turns out this definition gives us ALL of the properties we want, but we need to do a bit more work first to see this.
Operations
In this definition of running time, we said that we need to count the number of *operations*. However, we did not really state what an operation is. If you talked about running time in a prior course (e.g. cse123 or cse373) you most likely used the phrase *primitive operation* or similar to describe the thing that you’d count. From there, you typically said that a primitive operation is something like arithmetic, array indexing, variable assignment, etc. This choice is not wrong, and it will pretty much always give you essentially the same result as what we’ll do in this course, but we’re going to select something different for our operations in order to make our lives easier.
Instead of counting all primitive operations for our algorithms, we’ll pick just a small number of steps of our algorithm and count only those. As long as we’re careful about which steps we pick, this should give us the same general answer as counting primitive operations, but with less work.
For example, when discussing sorting algorithms in CSE373 you may have said that selection sort requires roughly n^2 *comparisons*. In this case, the operation we’re counting for running time is comparisons. When doing this, we could ignore all arithmetic, array indexing, etc. when identifying our running time.
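As a quick sketch of where a count like that comes from (assuming the version of selection sort described earlier in this reading): for index i, finding the largest remaining value takes one comparison per element after index i, so an array of length n requires

$$ (n-1) + (n-2) + \cdots + 1 = \frac{n(n-1)}{2} $$

comparisons in total, which is roughly n^2/2, i.e. on the order of n^2.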
There are generally a few ways we could select our operation(s). Here are some things to consider when deciding which operation or operations to count:
In most cases, there will be a clear choice of one or a few operations that is best according to all three considerations above. There may be times, however, when different considerations may lead you to select different operations. We’ll just ignore that circumstance for now, and then bring it up again later when we see our first example. (But in case you can’t wait to see what we’ll do… typically in this situation we would give our audience a running time for each choice of operation so they could choose for themselves based on their application.)
Now that we have a definition of what a running time is, we need a way to compare algorithms by their running times to see which is faster. This means that we need some way of comparing functions to determine which is the larger function. To do this we will use a concept that you should have seen in your prerequisite courses: asymptotic notation. This section (and the follow-on concept check) is intended as a refresher of this concept.
Asymptotic notation is used to compare functions based on their long-term rate of growth. Put overly simplistically, when using asymptotic notation, we will determine that the bigger function is the one that has larger output for really big inputs. This means that when comparing functions, we will completely ignore their behavior on *small* inputs.
In addition to ignoring small inputs, we’ll also ignore constant coefficients of the output. This is because of the peculiar way we selected the running time function for our algorithms. For example, if I counted only the single most frequent operation when expressing my running time, but you counted all of the three most frequent operations, then it would be misleading to say that your algorithm took three times as long. Because it would get tedious to communicate exactly which operations were counted, and because different computers might take different amounts of time to do the same operation, we get better results by simply ignoring constant coefficients, so that running times that are off by a factor of 3 (for example) are considered to be the same running time.
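As a small made-up example that illustrates both points, compare f(n) = 100n with g(n) = n^2. For small inputs f is larger (f(10) = 1000 while g(10) = 100), but

$$ g(n) = n^2 > 100n = f(n) \quad \text{for every } n > 100, $$

so asymptotically g is the bigger function. Similarly, 3n and n differ only by a constant coefficient, so we would consider them the same running time.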
For now, in this reading, we will only give high-level descriptions of our asymptotic notation. In class we’ll work with more precise definitions.