CSE341 Notes for Wednesday, 4/7/09

We left off with this version of the mergesort code:

        fun msort([]) = []
        |   msort(L) =
                let val (M, N) = split(L)
                    val list1 = msort(M)
                    val list2 = msort(N)
                in merge(list1, list2)
                end

I think it's better to express this in a more functional way using a single expression:

        fun msort([]) = []
        |   msort(L) =
                let val (M, N) = split(L)
                in merge(msort(M), msort(N))
                end

We tried running this version of the code and found that it didn't work. It went into infinite recursion. The problem is that eventually you get down to a 1-element list, for which the split function returns a tuple containing the same 1-element list along with an empty list. So we end up recursively sorting the same 1-element list over and over. For this function, we need a special case for the 1-element list:

        fun msort([]) = []
        |   msort([x]) = [x]
        |   msort(L) =
                let val (M, N) = split(L)
                in merge(msort(M), msort(N))
                end

This version worked fine.
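
For reference, here is a self-contained sketch that can be loaded on its own. The split and merge definitions are assumptions written for this example (the versions from lecture are not reproduced in these notes). This split deals elements alternately into two lists and, as described above, returns a 1-element list unchanged along with an empty list.

```sml
(* Assumed helper: deals elements alternately into two lists.
   For a 1-element list it returns that list plus an empty list. *)
fun split([]) = ([], [])
|   split([x]) = ([x], [])
|   split(x::y::rest) =
        let val (M, N) = split(rest)
        in (x::M, y::N)
        end

(* Assumed helper: merges two sorted int lists into one sorted list. *)
fun merge([], ys) = ys
|   merge(xs, []) = xs
|   merge(x::xs, y::ys) =
        if x <= y then x::merge(xs, y::ys)
        else y::merge(x::xs, ys)

(* The 1-element case stops the recursion before split can hand
   the same list back to msort. *)
fun msort([]) = []
|   msort([x]) = [x]
|   msort(L) =
        let val (M, N) = split(L)
        in merge(msort(M), msort(N))
        end

val sorted = msort([5, 2, 9, 1, 5])    (* [1, 2, 5, 5, 9] *)
```

Removing the msort([x]) case from this sketch reproduces the infinite recursion, because split([x]) returns ([x], []).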

Then we wrote a function called qsort that uses the quicksort algorithm to sort a list. Initially we limited ourselves to sorting lists of ints. We began with our usual question about an easy list to work with and someone said we should have a case for empty list, so we began with:

        fun qsort([]) = []
        |   qsort(x::xs) = ?

Then I asked if anyone remembered how quicksort works. Someone mentioned that the main steps are to:
  • Pick a value from the list that we refer to as the pivot.

  • Split the list into 2 parts: values less than the pivot and values greater than the pivot. I said that this step is often referred to as partitioning the list and the two parts are often referred to as partitions. I mentioned that we have to include the possibility of values equal to the pivot, although it doesn't matter which partition we put them into.

  • Quicksort the two partitions.

  • Put the pieces together.

    One nice thing about quicksort is that putting the pieces together isn't difficult. We know that the values in the first partition all come before the values in the second partition, so it's just a matter of gluing the pieces together.

    I said that we'd keep things simple by using the first value in the list as our pivot. This isn't an ideal choice, especially if the list is already sorted, but it will work well for the randomized lists we want to work with. So I changed the variable names in our code to reflect this choice:

            fun qsort([]) = []
            |   qsort(pivot::rest) = ?
    
    The first step in solving this is to partition the list, so I asked people how to implement this in ML and people seemed to be stumped. That's not surprising because we're just starting to learn ML and this is a nontrivial computation. I said that you don't have to abandon your programming instincts from procedural programming. So how would you solve it in a language like Java if you were asked to work with a linked list?

    By thinking through that, we developed the following pseudocode for partitioning the list:

            partition 1 = empty list
            partition 2 = empty list
            for (each value in list) {
                if (value <= pivot) {
                    move it to partition 1
                } else {
                    move it to partition 2
                }
            }
            finish up
    
    I said that you can convert this iterative solution to a recursive solution without much trouble. With a loop you use a set of local variables to keep track of the current state of your computation. Here there are three such local variables: the list that we are partitioning, the first partition, and the second partition. Local variables like these become parameters to a helper function. Our helper function is supposed to partition the list, so we decided to call it "partition". Since the loop involves three variables, two of which are initialized to be empty lists, we write partition as a helper function of three parameters and we include empty lists for two of the parameters in the initial call:

            fun qsort([]) = []
            |   qsort(pivot::rest) =
                    let fun partition(list, part1, part2) = ?
                    in partition(rest, [], [])
                    end;
    
    Notice how the call after the word "in" exactly parallels the situation before our loop begins executing. Our three state variables are the list of values to partition and two variables for storing the two partitions which are initially empty.

    In our pseudocode we continue until the overall list becomes empty, at which time we "finish up" the computation. We can include this as one of the cases for our helper function:

            fun qsort([]) = []
            |   qsort(pivot::rest) =
                    let fun partition([], part1, part2) = finish up
                        |   partition(list, part1, part2) = ?
                    in partition(rest, [], [])
                    end;
    
    Compare this initial case for partition versus the initial call for partition. We go from having these three values for our computation:

            (original list, [], [])
    
    to having this set of values for our computation:

            ([], partition 1, partition 2)
    
    In other words, we go from having all of the values stored in the original list and having two empty partitions to having an empty list of values and two partitions that have been filled in. Now we just have to describe how we go from one to the other. In our pseudocode, each time through the loop we handled one element of the original list, either moving it to partition 1 or moving it to partition 2. We can do the same thing with our helper function. So first we need to indicate that in the second case for partition, we want to process one value from the original list. We do so by replacing the "list" above with "x::xs":

            fun qsort([]) = []
            |   qsort(pivot::rest) =
                    let fun partition([], part1, part2) = finish up
                        |   partition(x::xs, part1, part2) = ?
                    in partition(rest, [], [])
                    end;
    
    We had an if/else in the loop and we can use an if/else here:

            fun qsort([]) = []
            |   qsort(pivot::rest) =
                    let fun partition([], part1, part2) = finish up
                        |   partition(x::xs, part1, part2) =
                                if x <= pivot then (x goes in 1st partition)
                                else (x goes in 2nd partition)
                    in partition(rest, [], [])
                    end;
    
    If x belongs in the first partition, then we want to go from having this set of values:

            (x::xs, part1, part2)
    
    to having this set of values:

            (xs, x::part1, part2)
    
    In other words, we move x from the list of values to be processed into the first partition. In the loop, we'd then come around the loop for the next iteration. In a recursive solution, we simply make a recursive call with those new values:

            fun qsort([]) = []
            |   qsort(pivot::rest) =
                    let fun partition([], part1, part2) = finish up
                        |   partition(x::xs, part1, part2) =
                                if x <= pivot then partition(xs, x::part1, part2)
                                else (x goes in 2nd partition)
                    in partition(rest, [], [])
                    end;
    
    In the second case we do something similar, moving x into the second partition and using a recursive call to continue the computation:

            fun qsort([]) = []
            |   qsort(pivot::rest) =
                    let fun partition([], part1, part2) = finish up
                        |   partition(x::xs, part1, part2) =
                                if x <= pivot then partition(xs, x::part1, part2)
                                else partition(xs, part1, x::part2)
                    in partition(rest, [], [])
                    end;
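
One way to check the partitioning logic by itself, before filling in "finish up", is a standalone variant. This is only a sketch: unlike the nested helper above, this hypothetical version takes the pivot as an explicit parameter and simply returns the two partitions as a tuple when the list runs out.

```sml
(* Hypothetical standalone helper: the pivot is passed explicitly and
   the base case returns both partitions instead of finishing the sort. *)
fun partition(pivot, [], part1, part2) = (part1, part2)
|   partition(pivot, x::xs, part1, part2) =
        if x <= pivot then partition(pivot, xs, x::part1, part2)
        else partition(pivot, xs, part1, x::part2)

val (small, large) = partition(5, [8, 3, 5, 1, 9], [], [])
(* small is [1, 5, 3] and large is [9, 8] *)
```

Each partition comes out in the reverse of the order in which its values were encountered, because every value is added to the front with "::". That doesn't matter here, since the partitions get sorted afterwards.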
    
    The only thing we had left to fill in was the "finish up" part. Remember that reaching that point in the code is like finishing the loop in our pseudocode. We started out by saying that we need to quicksort the two partitions. That needs to be part of what we do in the "finish up" part:

            fun qsort([]) = []
            |   qsort(pivot::rest) =
                    let fun partition([], part1, part2) = need to: qsort(part1), qsort(part2)
                        |   partition(x::xs, part1, part2) =
                                if x <= pivot then partition(xs, x::part1, part2)
                                else partition(xs, part1, x::part2)
                    in partition(rest, [], [])
                    end;
    
    The qsort function returns a list, so what do we do with the two sorted lists that come back from these recursive calls? We glue them together with append:

            fun qsort([]) = []
            |   qsort(pivot::rest) =
                    let fun partition([], part1, part2) = qsort(part1) @ qsort(part2)
                        |   partition(x::xs, part1, part2) =
                                if x <= pivot then partition(xs, x::part1, part2)
                                else partition(xs, part1, x::part2)
                    in partition(rest, [], [])
                    end;
    
    This is not quite right, though, because we haven't accounted for the pivot. We didn't add it to either of our partitions. That's, in general, a good thing, because it guarantees that each of the partitions we recursively pass to qsort will be smaller than the original list. But it means we have to manually place the pivot into the result. It belongs right in the middle of the two partitions. We could use a cons operator ("::") to indicate this, but I think it's a little clearer in this case to show that we are gluing three pieces together, one of which is the pivot itself:

            fun qsort([]) = []
            |   qsort(pivot::rest) =
                    let fun partition([], part1, part2) =
                                qsort(part1) @ [pivot] @ qsort(part2)
                        |   partition(x::xs, part1, part2) =
                                if x <= pivot then partition(xs, x::part1, part2)
                                else partition(xs, part1, x::part2)
                    in partition(rest, [], [])
                    end;
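
A quick check of this version might look like the following sketch. The sample list is made up for illustration and includes a duplicate and a negative number.

```sml
fun qsort([]) = []
|   qsort(pivot::rest) =
        let fun partition([], part1, part2) =
                    qsort(part1) @ [pivot] @ qsort(part2)
            |   partition(x::xs, part1, part2) =
                    if x <= pivot then partition(xs, x::part1, part2)
                    else partition(xs, part1, x::part2)
        in partition(rest, [], [])
        end

(* Hypothetical sample data *)
val result = qsort([12, 3, 45, 3, 0, ~7])    (* [~7, 0, 3, 3, 12, 45] *)
```

Because the comparison is "<=", values equal to the pivot land in the first partition, so duplicates are kept.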
    
    At that point we were done. This is a working version of quicksort. I loaded it in the ML interpreter and we managed to sort some short lists.

    I had only a few minutes left at that point, so I turned to the new idea I wanted to introduce: higher-order functions. In ML, functions are first-class data values just like ints, strings, reals, and lists. That means that you can pass functions as arguments to other functions. A function that takes another function as an argument is called a higher-order function.
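
As a small illustration of passing a function as an argument, here is a sketch with made-up names (applyTwice, addThree, and shout are not from lecture). The function applyTwice takes a function f and a value x and applies f two times.

```sml
(* Hypothetical higher-order function: its first argument is a function. *)
fun applyTwice(f, x) = f(f(x))

(* Two ordinary functions to pass in *)
fun addThree(n) = n + 3
fun shout(s) = s ^ "!"

val nine = applyTwice(addThree, 3)     (* 9 *)
val loud = applyTwice(shout, "hi")     (* "hi!!" *)
```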

    So I turned back to our qsort function. I modified the function so that it takes a comparison function as an argument and I replaced the "<=" comparison with a call on the function passed as an argument. This required several changes because each recursive call and each pattern had to be updated:

            fun qsort(f, []) = []
            |   qsort(f, pivot::rest) =
                    let fun partition([], part1, part2) =
                                qsort(f, part1) @ [pivot] @ qsort(f, part2)
                        |   partition(x::xs, part1, part2) =
                                if f(x, pivot) then partition(xs, x::part1, part2)
                                else partition(xs, part1, x::part2)
                    in partition(rest, [], [])
                    end
    
    We were still able to do the original sorting by saying:

            qsort(op <=, test);
    
    But now we had the flexibility to sort it backwards:

            qsort(op >=, test);
    
    And to define our own comparison function that sorts on magnitude:

            fun lessMagnitude(x, y) = abs(x) < abs(y);
            qsort(lessMagnitude, test);
    
    So we can easily change the definition of ordering to have this sort in a different way.
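
Putting these pieces together, a complete session might look like the sketch below. The contents of the list test are an assumption, since the list used in lecture is not shown in these notes.

```sml
fun qsort(f, []) = []
|   qsort(f, pivot::rest) =
        let fun partition([], part1, part2) =
                    qsort(f, part1) @ [pivot] @ qsort(f, part2)
            |   partition(x::xs, part1, part2) =
                    if f(x, pivot) then partition(xs, x::part1, part2)
                    else partition(xs, part1, x::part2)
        in partition(rest, [], [])
        end

(* Hypothetical sample data *)
val test = [3, ~8, 5, ~1, 0]

fun lessMagnitude(x, y) = abs(x) < abs(y)

val ascending = qsort(op <=, test)              (* [~8, ~1, 0, 3, 5] *)
val descending = qsort(op >=, test)             (* [5, 3, 0, ~1, ~8] *)
val byMagnitude = qsort(lessMagnitude, test)    (* [0, ~1, 3, 5, ~8] *)
```

The sorting code itself never changes. Only the comparison function passed as the first argument decides what "comes before" means.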

    I then discussed a subtlety of polymorphism in ML. For example, we can write a function to swap the order of values in a tuple by saying:

            fun switch(a, b) = (b, a)
    
    When I typed that into the interpreter, it responded with this:

            val switch = fn : 'a * 'b -> 'b * 'a
    
    The 'a and 'b are generic type parameters, similar to the type parameters we use in Java for defining structures like ArrayList. Just as in Java the "E" is filled in with a specific type, in ML the 'a and 'b are filled in with specific types.
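
For example, the same switch function can be applied to pairs of different types, with 'a and 'b filled in differently at each call. The sample values below are made up for illustration.

```sml
fun switch(a, b) = (b, a)

(* 'a = int, 'b = string *)
val p1 = switch(1, "one")        (* ("one", 1) : string * int *)

(* 'a = real, 'b = int list *)
val p2 = switch(3.5, [1, 2])     (* ([1, 2], 3.5) : int list * real *)
```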

    You don't have to define functions polymorphically. We can, for example, say:

            fun switch(a:string, b:int) = (b, a)
    
    In this case ML responds with:

            val switch = fn : string * int -> int * string
    
    In general, though, we prefer to declare functions with polymorphism so that they can be applied to a wider range of values.

    I then asked people to consider this definition of a function called last that is supposed to return the last value of a list:

            fun last(lst) =
                if lst = [hd(lst)] then hd(lst)
                else last(tl(lst))
    
    When I loaded this in the ML interpreter, I got a warning and a slightly different type notation:

            wed.sml:2.12 Warning: calling polyEqual
            val last = fn : ''a list -> ''a
    
    The warning is generated by line 2 (in fact, character 12 of line 2 is what the "2.12" means). That happens because we have written this function in such a way that it depends on recognizing the equality of two different expressions. Many types in ML can be compared for equality, but not all. For example, we got an error when we went into the interpreter and asked:

            - 3.8 = 3.8;
            stdIn:1.1-1.10 Error: operator and operand don't agree [equality type required]
              operator domain: ''Z * ''Z
              operand:         real * real
              in expression:
                3.8 = 3.8
    
    ML does not allow you to compare values of type real for equality. The reasoning is that floating point numbers are stored as approximations, not as exact representations, so you shouldn't use a strict equality operation.
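
When two reals do need to be compared, one common approach is to test whether they are within some tolerance of each other. The sketch below uses a hypothetical helper called closeEnough with a tolerance chosen arbitrarily. The Basis Library also provides Real.== for cases where an exact comparison really is intended.

```sml
(* Hypothetical helper: treats two reals as equal when they differ
   by less than the given tolerance. *)
fun closeEnough(x: real, y: real, tolerance: real) = abs(x - y) < tolerance

val same = closeEnough(0.1 + 0.2, 0.3, 1.0E~9)     (* true *)
val different = closeEnough(3.8, 3.9, 1.0E~9)      (* false *)

(* Exact comparison, using the Basis Library function instead of "=" *)
val exact = Real.==(3.8, 3.8)                      (* true *)
```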

    So the warning is letting us know that we have written the function in such a way that we can apply it only to lists of equality types. We would not be able to use it on a list of real values. ML indicates that with the double apostrophe on the generic type. Instead of 'a, ML describes it in terms of ''a.
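
To see the restriction in practice, the definition of last shown earlier can be applied to lists of ints or strings, which are equality types. The sample lists in this sketch are made up for illustration.

```sml
fun last(lst) =
    if lst = [hd(lst)] then hd(lst)
    else last(tl(lst))

val lastInt = last([1, 2, 3])              (* 3 *)
val lastString = last(["a", "b", "c"])     (* "c" *)

(* A call such as last([1.5, 2.5]) would be rejected by the type checker,
   because real is not an equality type. *)
```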

    In general, you want to write your functions so that they don't have this limitation. There is no reason that you can't write the last function in such a way that it will be general. But sometimes you'll be writing a more specific kind of function where this limitation isn't a problem. In fact, in some cases you won't be able to avoid it because part of the work of the function is to compare values for equality.
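
One possible general version of last, sketched here with pattern matching instead of an equality test, has the type 'a list -> 'a and so also works on lists of reals. Raising the standard Empty exception for an empty list is an assumption of this sketch, not something specified in lecture.

```sml
(* No "=" appears in this definition, so no equality type is required. *)
fun last([]) = raise Empty
|   last([x]) = x
|   last(_::xs) = last(xs)

val lastReal = last([1.5, 2.5, 3.8])     (* 3.8 *)
val lastInt = last([1, 2, 3])            (* 3 *)
```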


    Stuart Reges
    Last modified: Thu Oct 8 11:25:12 PDT 2009