CSE143 Notes for Monday, 5/10/21

In an earlier lecture I discussed a well-behaved sorting technique called merge sort. Merge sort is so well behaved that it is almost dull. Quicksort is a different sorting technique that is very popular and that has somewhat more interesting properties. On average, quicksort runs about twice as fast as merge sort. At the same time, quicksort can fall apart and turn into a very slow O(n^2) sort, so it's not guaranteed to be faster. Quicksort can theoretically be used for any kind of list, but we're going to explore how it works with an array.

Quicksort was invented by a computer scientist named Tony Hoare. One way to think of it is to consider what takes up all the time in merge sort. In merge sort, we split the list into two halves, sort each half, and then merge the two sorted lists together. So I began by asking, "Can we somehow avoid merging the two lists back together after we sort each half?" In other words, is there some property that the two halves might have that would allow us to just sort them and then be done?

Someone correctly pointed out that if we could guarantee that the values in the first part were all less than the values in the second part, then we wouldn't need to merge the results together. For example, when I sort a large pile of exams, I begin by splitting them into two piles: A-L and M-Z. That way, all that I have to do is to sort the A-L pile, sort the M-Z pile, and put the two piles together.

The key idea here is that there is some kind of threshold where the values in the first part are less than the threshold and the values in the second part are greater than the threshold ("M" in my example of alphabetizing). In quicksort this value is referred to as the pivot. We usually think of the pivot as being in the list, so a better way to describe this is that we want to split the list into two parts based on the pivot value:

        +-----------------+----------------+
        | values <= pivot | values > pivot |
        +-----------------+----------------+
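To make the "no merge needed" idea concrete, here is a minimal sketch. The class name NoMergeDemo and the method sortParts are made up for illustration, and it leans on the library method Arrays.sort for the two halves rather than a sort of our own. It assumes the array has already been split so that every value in the first part is less than or equal to every value in the second part:

```java
import java.util.Arrays;

class NoMergeDemo {
    // Sorts the two parts of an already-partitioned array independently.
    // 'split' is the index of the first element of the second part
    // (a hypothetical parameter, not something from the lecture code).
    static void sortParts(int[] list, int split) {
        Arrays.sort(list, 0, split);            // sort the first part in place
        Arrays.sort(list, split, list.length);  // sort the second part in place
        // No merge step: the two sorted parts already line up.
    }

    public static void main(String[] args) {
        // Indexes 0..3 hold values <= 5; indexes 4..7 hold values > 5.
        int[] data = {4, 1, 5, 2, 9, 7, 8, 6};
        sortParts(data, 4);
        System.out.println(Arrays.toString(data));  // prints [1, 2, 4, 5, 6, 7, 8, 9]
    }
}
```
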
Before we began working on the main code, I reviewed the idea of swapping two values. I introduced the following method that we can call whenever we want to swap two values in the array:

        private static void swap(int[] list, int index1, int index2) {
            int temp = list[index1];
            list[index1] = list[index2];
            list[index2] = temp;
        }
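As a quick usage sketch, the swap method can be tried on a small array. The class name SwapDemo is invented for this example, and the method is declared without "private" only so that the demo can call it. The second call shows that swapping an index with itself is harmless, which can matter when two index variables happen to coincide:

```java
import java.util.Arrays;

class SwapDemo {
    // Same body as the swap method in these notes.
    static void swap(int[] list, int index1, int index2) {
        int temp = list[index1];
        list[index1] = list[index2];
        list[index2] = temp;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30, 40};
        swap(data, 0, 3);  // exchange the first and last values
        System.out.println(Arrays.toString(data));  // prints [40, 20, 30, 10]
        swap(data, 1, 1);  // swapping an index with itself changes nothing
        System.out.println(Arrays.toString(data));  // prints [40, 20, 30, 10]
    }
}
```
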
Then I turned to the task of partitioning the list using a pivot. The quicksort algorithm involves sorting not just the overall list, but also lots of sublists that are shorter. As a result, we want to write it with some parameters that indicate the part of the array that we want to work with. Let's call those values low and high. The idea is that we're asked to work with values within that range of indexes:

             low                      high
              |                        |
              V                        V
        ----+----+----+----+----+----+----+----
        ... |    |    |    |    |    |    | ...
        ----+----+----+----+----+----+----+----
The idea is to pick a pivot and then split the range of values into two sections based on the pivot. I said that we should write a method to do this, so we began with this header:

        private static int partition(int[] list, int low, int high) {
            ...
        }
The first thing we have to do is to pick a pivot. It would be easy to choose the first value as the pivot (list[low]), but that might not work well. Remember that the data might be in sorted order, in which case the first value is the worst possible choice for a pivot. Similarly picking the last value is often a bad choice. Picking the middle value is a common approach, but even that could turn out badly, especially if someone knew how our code worked and wanted to be malicious. They could construct an array where our algorithm would keep making bad choices of pivot values.
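
To see why the first value is such a poor pivot for sorted data, here is a rough experiment. It is only a sketch: the class name, the iterations counter, and countForSorted are all invented for illustration, and the partition loop is the one these notes develop further down, except that it always uses list[low] as the pivot. On already-sorted input, each partition peels off just one value, so the total number of loop iterations should come out to n(n-1)/2:

```java
class FirstPivotWorstCase {
    static long iterations = 0;  // hypothetical counter of partition-loop iterations

    static void swap(int[] list, int index1, int index2) {
        int temp = list[index1];
        list[index1] = list[index2];
        list[index2] = temp;
    }

    // Partition that ALWAYS uses the first value as the pivot.
    static int partition(int[] list, int low, int high) {
        int pivot = list[low];
        int index1 = low + 1;
        int index2 = high;
        while (index1 <= index2) {
            iterations++;
            if (list[index1] <= pivot)
                index1++;
            else if (list[index2] > pivot)
                index2--;
            else {
                swap(list, index1, index2);
                index1++;
                index2--;
            }
        }
        swap(list, low, index2);
        return index2;
    }

    static void sort(int[] list, int low, int high) {
        if (low < high) {
            int pivotIndex = partition(list, low, high);
            sort(list, low, pivotIndex - 1);
            sort(list, pivotIndex + 1, high);
        }
    }

    // Sorts the already-sorted array 0, 1, ..., n-1 and reports the loop iterations used.
    static long countForSorted(int n) {
        int[] data = new int[n];
        for (int i = 0; i < n; i++)
            data[i] = i;
        iterations = 0;
        sort(data, 0, n - 1);
        return iterations;
    }

    public static void main(String[] args) {
        // Doubling n should roughly quadruple the count, the signature of O(n^2).
        for (int n = 100; n <= 800; n *= 2)
            System.out.println("n = " + n + ", iterations = " + countForSorted(n));
    }
}
```
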

There is a fairly easy way to avoid all of these problems. We can pick a random value to serve as the pivot. It is still possible for us to end up with a series of bad choices, but it is highly unlikely. And by using randomness, we make sure that a malicious person could not construct a case on purpose that would make our code run slowly.

So we began by finding a random index between low and high inclusive:

        int spot = low + (int) ((high - low + 1) * Math.random());
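It is worth convincing yourself that this expression really stays between low and high inclusive. Math.random() returns a value that is at least 0.0 and strictly less than 1.0, so the product is less than (high - low + 1) and the cast truncates it to something between 0 and high - low. The following sketch wraps the expression in a helper and tallies the results; the class name and the helper randomIndex are made up for this example:

```java
class RandomIndexDemo {
    // Hypothetical helper wrapping the expression from the notes.
    static int randomIndex(int low, int high) {
        return low + (int) ((high - low + 1) * Math.random());
    }

    public static void main(String[] args) {
        int low = 3;
        int high = 7;
        int[] counts = new int[high - low + 1];  // one tally per possible index
        for (int i = 0; i < 10000; i++) {
            int spot = randomIndex(low, high);
            if (spot < low || spot > high)
                throw new IllegalStateException("out of range: " + spot);
            counts[spot - low]++;
        }
        // Each index from 3 to 7 should show up roughly 2000 times.
        for (int i = 0; i < counts.length; i++)
            System.out.println("index " + (low + i) + ": " + counts[i]);
    }
}
```
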
I said that it's a good idea to move the pivot before you start partitioning to put it in a place where you know it's safe. No matter how the list is partitioned, you know that the pivot can be stored at the front of the list. So I wrote these two lines of code to swap the randomly chosen value to the front and to store it in a variable called "pivot":

        swap(list, low, spot);
        int pivot = list[low];
Then we have to complete the partitioning task. I mentioned that I was going to use a technique that is known as a loop invariant. This is a useful way to think about writing loops. I expressed the invariant as a picture. Initially we don't know anything about the sequence of values we are trying to partition (something I expressed in the picture with a double question mark):

        +--------+
        |   ??   |
        +--------+
and we're trying to get to a situation where we know that the list has been split into two parts: values less than or equal to the pivot and values greater than the pivot:

        +----------+---------+
        | <= pivot | > pivot |
        +----------+---------+
For the invariant, we want to merge these two pictures to capture the idea of an in-between state where we can start with the one and move towards the other. I did this by indicating that the list will have three partitions:

        +----------+------------+---------+
        | <= pivot |     ??     | > pivot |
        +----------+------------+---------+
I introduced two variables called index1 and index2 that will keep track of the first and last values included in the "??" region:

        +----------+------------+---------+
        | <= pivot |     ??     | > pivot |
        +----------+------------+---------+
                    ^          ^
                    |          |
                  index1     index2
One of the benefits of a diagram like this is that it can help us to write the code. For example, I began by asking what values we should use to initialize index1 and index2. Initially almost everything is in the "??" region, so we could initialize them to low and high. I said that we could do slightly better because we know something about one particular value. The pivot is at position low and that is part of the first partition, so we can initialize index1 to be one higher than low:

        int index1 = low + 1;
        int index2 = high;
Then I asked for the loop test we should use. We want the "??" region to disappear, so what would be a good relationship to have between index1 and index2? Someone said that it would be good if they were equal. If they were equal, that would mean that the "??" region has exactly one value in it. Even better would be if they cross so that index1 becomes larger than index2. In that case, the "??" region would be empty. We can use this as our test (while they haven't crossed):

        while (index1 <= index2) {
            ...
        }
Then I asked people to think about how to get closer to having the "??" region shrink. We went back to the picture and I asked people to think about the value that index1 is referring to:

        +----------+-+----------+---------+
        | <= pivot |?|   ??     | > pivot |
        +----------+-+----------+---------+
                    ^          ^
                    |          |
                  index1     index2
It's in the "??" region, which is why I've drawn it as a question mark. What would be a convenient value for it to have? Someone said it would be convenient if it were less than or equal to the pivot. That would mean it belongs in the first partition and all we have to do is increment index1:

        if (list[index1] <= pivot)
            index1++;
But that might not be true. If it's not true, then think about the value that index2 is referring to:

        +----------+----------+-+---------+
        | <= pivot |     ??   |?| > pivot |
        +----------+----------+-+---------+
                    ^          ^
                    |          |
                  index1     index2
What would be a convenient value to find there? It would be convenient if it were greater than the pivot, in which case it belongs in the second partition and we can decrement index2:

        if (list[index1] <= pivot)
            index1++;
        else if (list[index2] > pivot)
            index2--;
What if neither of these things is true? In that case, we'd have a value at index1 that belongs in the second partition and a value at index2 that belongs in the first partition. In that case, we can simply swap the two values and both increment index1 and decrement index2:

        if (list[index1] <= pivot)
            index1++;
        else if (list[index2] > pivot)
            index2--;
        else {
            swap(list, index1, index2);
            index1++;
            index2--;
        }
One of these three cases will match each time through the loop and each one shrinks the "??" region. Therefore, we know the loop will terminate: the region has an integer length, so it can't shrink forever. Eventually it will become empty.

I tried to emphasize that the invariant picture helped us to write the loop initialization, the loop test and the loop body.
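
One way to take the invariant picture seriously is to have the code check it on every pass. The sketch below does that. It is not the lecture's code: checkInvariant and partitionChecked are hypothetical names, and it uses list[low] as the pivot instead of a random one so that the run is repeatable. It returns index2, the last index of the "<= pivot" region, once the "??" region is empty:

```java
import java.util.Arrays;

class InvariantCheck {
    static void swap(int[] list, int index1, int index2) {
        int temp = list[index1];
        list[index1] = list[index2];
        list[index2] = temp;
    }

    // Throws if the picture "<= pivot | ?? | > pivot" does not hold.
    static void checkInvariant(int[] list, int low, int high, int pivot,
                               int index1, int index2) {
        for (int i = low; i < index1; i++)
            if (list[i] > pivot)
                throw new IllegalStateException("left region broken at " + i);
        for (int i = index2 + 1; i <= high; i++)
            if (list[i] <= pivot)
                throw new IllegalStateException("right region broken at " + i);
    }

    static int partitionChecked(int[] list, int low, int high) {
        int pivot = list[low];  // assumption: fixed pivot, for a repeatable demo
        int index1 = low + 1;
        int index2 = high;
        while (index1 <= index2) {
            checkInvariant(list, low, high, pivot, index1, index2);
            if (list[index1] <= pivot)
                index1++;
            else if (list[index2] > pivot)
                index2--;
            else {
                swap(list, index1, index2);
                index1++;
                index2--;
            }
        }
        // The picture still holds after the loop, now with an empty "??" region.
        checkInvariant(list, low, high, pivot, index1, index2);
        return index2;
    }

    public static void main(String[] args) {
        int[] data = {5, 9, 2, 7, 5, 1, 8};
        int last = partitionChecked(data, 0, data.length - 1);
        System.out.println(Arrays.toString(data) + ", first partition ends at " + last);
    }
}
```
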

But we still had a few details left. First, I asked whether we could just end the method at this point. If someone is trying to do a quicksort and they ask us to partition, is it enough that we pick a pivot and split the list into the two partitions? The answer is no. Remember that the list won't, in general, split evenly into two equal halves. So we have to let someone know where the first partition ends (where the dividing line ends up). Because index1 and index2 cross, index2 ends up being the index of the last value in the first partition, so we can end our method by returning index2:

        return index2;
This is why the method header declares an int return type rather than void.

And I included one last detail that turns out to be important when we get to the recursion. Remember that we put the pivot at the front of the first partition when we started the whole process. Once we've split the list into two partitions, we are in a position to say exactly where that value belongs in the list. It belongs at the end of the first partition. So just before returning from the method, we move the pivot into its correct position:

        swap(list, low, index2);
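Putting the pieces from this section together gives something like the following. Treat it as a sketch assembled from the fragments above rather than the official course code: the class name PartitionSketch and the checking helper isPartitioned are made up for this example, and the methods are declared without "private" only so that the demo can call them. Note that the final swap comes before the return statement:

```java
import java.util.Arrays;

class PartitionSketch {
    static void swap(int[] list, int index1, int index2) {
        int temp = list[index1];
        list[index1] = list[index2];
        list[index2] = temp;
    }

    // Partitions list[low..high] around a randomly chosen pivot and
    // returns the index where the pivot ends up.
    static int partition(int[] list, int low, int high) {
        int spot = low + (int) ((high - low + 1) * Math.random());
        swap(list, low, spot);  // park the pivot at the front
        int pivot = list[low];
        int index1 = low + 1;
        int index2 = high;
        while (index1 <= index2) {
            if (list[index1] <= pivot)
                index1++;
            else if (list[index2] > pivot)
                index2--;
            else {
                swap(list, index1, index2);
                index1++;
                index2--;
            }
        }
        swap(list, low, index2);  // move the pivot to the end of the first partition
        return index2;
    }

    // Hypothetical helper: true if everything before p is <= list[p]
    // and everything after p is > list[p].
    static boolean isPartitioned(int[] list, int low, int high, int p) {
        for (int i = low; i < p; i++)
            if (list[i] > list[p])
                return false;
        for (int i = p + 1; i <= high; i++)
            if (list[i] <= list[p])
                return false;
        return true;
    }

    public static void main(String[] args) {
        int[] data = {7, 3, 9, 3, 1, 8, 5, 7};
        int p = partition(data, 0, data.length - 1);
        // The arrangement varies from run to run because the pivot is random.
        System.out.println(Arrays.toString(data) + ", pivot index " + p);
        System.out.println("partitioned? " + isPartitioned(data, 0, data.length - 1, p));
    }
}
```
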
This completed the partition method. But we still needed to write the sorting code. Fortunately, once you've done the partitioning part, the sorting is fairly easy. We want to write it in such a way that we could sort not just the overall array but any sublist within the array, so we wrote a header that included a low and high parameter:

        public static void sort(int[] list, int low, int high) {
            ...
        }
I asked the classic question of, "What would be an easy list to sort?" As we saw with merge sort, you don't need to do anything if the list has fewer than two elements, so we can write the body of the method as follows:

        if (low < high) {
            ...
        }
We begin by partitioning the list using the method we just wrote. Remember that it returns the index of where the pivot is placed:

        if (low < high) {
            int pivotIndex = partition(list, low, high);
            ...
        }
The pivot will be in its correct spot at pivotIndex, but we still need to sort the values that come before and the values that come after. Because neither of these lists includes the pivot, they are both guaranteed to be shorter than the original list. That means we can use recursive calls to sort them:

        if (low < high) {
            int pivotIndex = partition(list, low, high);
            sort(list, low, pivotIndex - 1);
            sort(list, pivotIndex + 1, high);
        }
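For anyone who wants to run the whole thing, here is one way the pieces might be collected into a single class. This is a sketch under a few assumptions: the class name QuickSortSketch is invented, and the one-argument sort overload is a convenience I added for the demo so that a caller does not have to pass 0 and length - 1:

```java
import java.util.Arrays;

class QuickSortSketch {
    // Hypothetical convenience overload: sorts the entire array.
    public static void sort(int[] list) {
        sort(list, 0, list.length - 1);
    }

    public static void sort(int[] list, int low, int high) {
        if (low < high) {
            int pivotIndex = partition(list, low, high);
            sort(list, low, pivotIndex - 1);   // values before the pivot
            sort(list, pivotIndex + 1, high);  // values after the pivot
        }
    }

    private static int partition(int[] list, int low, int high) {
        int spot = low + (int) ((high - low + 1) * Math.random());
        swap(list, low, spot);
        int pivot = list[low];
        int index1 = low + 1;
        int index2 = high;
        while (index1 <= index2) {
            if (list[index1] <= pivot)
                index1++;
            else if (list[index2] > pivot)
                index2--;
            else {
                swap(list, index1, index2);
                index1++;
                index2--;
            }
        }
        swap(list, low, index2);
        return index2;
    }

    private static void swap(int[] list, int index1, int index2) {
        int temp = list[index1];
        list[index1] = list[index2];
        list[index2] = temp;
    }

    public static void main(String[] args) {
        int[] data = {12, 4, 9, 4, 1, 30, -2};
        sort(data);
        System.out.println(Arrays.toString(data));  // prints [-2, 1, 4, 4, 9, 12, 30]
    }
}
```

Because the pivot is chosen at random, the sequence of partitions differs from run to run, but the final sorted result is always the same.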
And that completes the method. The tough code to write is the partitioning code. Once we have it done, this sorting code is fairly simple.


Stuart Reges
Last modified: Mon May 10 12:25:31 PDT 2021