CSE341 Notes for Friday, 4/3/09

I began by mentioning that from a historical point of view, ML was the first programming language to have what is known as parametric polymorphism. When we ask ML about functions like hd or tl, we get strange notations that involve "'a":

        - hd;
        val it = fn : 'a list -> 'a
        - tl;
        val it = fn : 'a list -> 'a list

The more modern term for this is generics. It is similar to the way that we define an ArrayList<E> in Java. The "E" is a type parameter and indicates that we can make many different kinds of ArrayList objects (ArrayList<String>, ArrayList<Integer>, etc). In a similar manner, ML is letting you know that the hd and tl functions can act on many different kinds of lists (string list, int list, etc).
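To make the comparison concrete, here is a minimal Java sketch of generic methods that play the role of hd and tl. The class name GenericListOps and the method names head and tail are made up for this illustration; they are not part of the Java libraries.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the names are invented for this illustration.
class GenericListOps {
    // Analogue of ML's hd : 'a list -> 'a. The type parameter E plays the role of 'a.
    static <E> E head(List<E> list) {
        return list.get(0);
    }

    // Analogue of ML's tl : 'a list -> 'a list.
    static <E> List<E> tail(List<E> list) {
        return list.subList(1, list.size());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c");
        List<Integer> nums = Arrays.asList(1, 2, 3);
        String first = head(words);       // E is inferred as String
        List<Integer> rest = tail(nums);  // E is inferred as Integer
        System.out.println(first + " " + rest);
    }
}
```

The same two method definitions work for a list of strings and a list of integers, which is the property that the 'a in ML's reported types expresses.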

I then spent 20 minutes discussing the concept of mutable state. We try to avoid mutable state in functional programming. This will seem odd at first, because it is such a central technique in procedural programming that it's difficult to imagine how you can program without it. For example, in a language like Java we often have an integer variable n that we will increment by saying something like n++. This is a very typical example of mutable state. We have a memory location for the variable n that we change (or mutate) over time.

At first glance, ML variables appear to have the same ability. After all, we can say:

        val x = 3;
        val x = x + 1;

But this has a very different effect in ML. Ullman gives a careful discussion in the book and I encouraged people to pay attention to those sections of the book because they will help you to understand these distinctions. You can think of an ML program as a sequence of bindings that are stored in an environment. The code above introduces two different bindings for the variable x. The second binding makes use of the value from the first binding, but this is very different from sharing a single memory location that different code segments can all refer to. I said that this distinction is difficult to understand at first, but it turns out to be very important.

I also gave an example from Java. Joshua Bloch, the architect of the Java class libraries, has written a book called Effective Java in which he gives a series of "tips" for programming well in Java. Item 13 is to "Favor immutability." For example, String objects are immutable in Java. I asked people what that means and whether it's a good thing.

What it means is that once a String object is constructed, it can never be changed. At first, this seems like a dangerous and inefficient decision. For example, this loop generates 1001 different String objects just to put together a String of 1000 stars:

        String s = "";
        for (int i = 0; i < 1000; i++)
            s += "*";

Someone pointed out that Java has alternatives called StringBuffer and StringBuilder that don't have this inefficiency. But why do this with String?

Someone mentioned that it can be problematic to have two different variables pointing to the same object if they have the ability to change that object. The two different variables can interfere with each other. For example, we're all used to writing Java constructors that take String objects as parameters:

        public Employee(String name, ...) {
            this.name = name;
            ....
        }

This is a potentially dangerous operation. Someone can give you a String that you remember as the name of the employee and they can turn around and modify the String. For example, I might try to trick a program into thinking that I'm Bill Gates and later change the String to my name instead (e.g., having it send a big paycheck to me instead of Bill). If Strings were mutable, we'd find ourselves wanting to make a defensive copy in a constructor like this. That's what we tend to do in languages like C++.

This doesn't even have to be malicious. For example, someone might be using a single mutable String to create a series of different Employee objects. So they change the value of the name string and pass the new name. If you just store a reference to that String, then you end up with a series of Employee objects that all have the same name.
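Java Strings can't actually be changed this way, but the scenario can be sketched by letting StringBuilder stand in for a "mutable String". Everything in this example (the class names, the method namesAfterReuse, and the sample names) is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; StringBuilder stands in for a "mutable String".
class SharedNameDemo {
    static class Employee {
        private final StringBuilder name;

        Employee(StringBuilder name) {
            this.name = name; // stores a reference; no copy is made
        }

        String getName() {
            return name.toString();
        }
    }

    // Builds three employees by reusing one mutable buffer, then reports their names.
    static List<String> namesAfterReuse() {
        StringBuilder buffer = new StringBuilder();
        List<Employee> staff = new ArrayList<>();
        for (String n : new String[] {"Ann", "Bob", "Cy"}) {
            buffer.setLength(0); // clear the old name
            buffer.append(n);    // write the new name into the same object
            staff.add(new Employee(buffer));
        }
        List<String> names = new ArrayList<>();
        for (Employee e : staff) {
            names.add(e.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Every employee reports the last name written into the shared buffer.
        System.out.println(namesAfterReuse());
    }
}
```

All three Employee objects share the one buffer, so each of them ends up reporting "Cy".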

But there are even further problems. We're also used to writing code like this in Java:

        public String getName() {
            return name;
        }

This is another place where you'd want to make a copy of the String if it were mutable, because otherwise you're giving the person who calls your method access to your mutable String. They might maliciously or accidentally damage the String.

Joshua Bloch's item 24 is to "Make defensive copies when needed" to solve exactly these kinds of problems. We don't have to worry about that for type String because it is immutable. So this is at least one example of where immutability is helpful. In general, immutability eliminates many potential programming problems.
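For a type that really is mutable, defensive copying might look like the following sketch. It uses java.util.Date, which is a real mutable class in the Java libraries; the HireRecord class and its members are hypothetical names chosen for this example.

```java
import java.util.Date;

// Hypothetical class for illustration; java.util.Date is a real mutable class.
class HireRecord {
    private final Date start;

    HireRecord(Date start) {
        this.start = new Date(start.getTime()); // defensive copy on the way in
    }

    Date getStart() {
        return new Date(start.getTime());       // defensive copy on the way out
    }

    public static void main(String[] args) {
        Date d = new Date(0L);
        HireRecord r = new HireRecord(d);
        d.setTime(5000L);             // the caller mutates its own Date
        r.getStart().setTime(9000L);  // the caller mutates a returned copy
        System.out.println(r.getStart().getTime()); // the stored time is unchanged
    }
}
```

Neither change made by the caller reaches the Date stored inside the record. With an immutable type like String, both copies would be unnecessary.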

As another example, I asked people to consider a function f that returns an int and I asked under what circumstances we can replace this expression:

        f(x) + f(x)

with this expression:

        2 * f(x)

Someone said you can do the replacement if "f doesn't do anything else." That's a good way to look at it. Another way to say this is that f has no side effects.

What kind of side effects might it have? It might change the value of a global variable that is used in the computation. For example, it might do something like this:

        globalCount++;
        return globalCount * x;

In that case, the second call on the function will return a different value than the first call because the computation depends on a variable whose value has changed. This is an example of the problems introduced by mutable state. If you use the simple mechanisms in ML, you won't get this kind of interference. There is no way to use simple variable binding, for example, to affect a global variable like this. ML does have some language elements that allow you to do this (references and arrays), but those are considered the "bad" mutable part of ML (the bad part of town).

Another case where this would make a difference is if the function produces output. For example, in Java if it called System.out.println, then you'll get different behavior by calling it twice versus calling it once. This is another kind of side effect and we'd call it another example of mutable state (changing the state of the output stream). ML has functions for reading and writing and they, also, are considered aspects of ML that detract from the purely functional approach.

There is a technical term for the ability to replace the sum of the two function calls with 2 times a single call. It's called referential transparency.
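The globalCount example above can be turned into a small runnable Java sketch that contrasts a function with a side effect against one without. The class and method names (SideEffectDemo, impure, pure) are invented for this illustration, and pure is an arbitrary side-effect-free function chosen only for contrast.

```java
// Hypothetical demo class; the names are invented for this illustration.
class SideEffectDemo {
    static int globalCount = 0;

    // Has a side effect: each call changes globalCount, so repeated calls differ.
    static int impure(int x) {
        globalCount++;
        return globalCount * x;
    }

    // No side effects: the result depends only on x.
    static int pure(int x) {
        return 3 * x;
    }

    public static void main(String[] args) {
        globalCount = 0;
        int sum = impure(5) + impure(5);  // 5 + 10
        globalCount = 0;
        int doubled = 2 * impure(5);      // 2 * 5
        System.out.println(sum + " vs " + doubled);
        System.out.println(pure(5) + pure(5) == 2 * pure(5));
    }
}
```

Starting from the same state, impure(5) + impure(5) gives 15 while 2 * impure(5) gives 10, so the replacement is not safe. For pure, the two expressions always agree.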

I then introduced two new ML concepts. First we looked at how a let construct can be used to create a local set of bindings that aren't introduced into the overall environment. It's similar to the idea of local variables in Java that appear inside of a block (i.e., inside a set of curly braces {}). For example, we might say:

        let val x = 2.3488942323 in x * x * x * x end;

This is a convenient way to give the name x to the numeric value we want to use in this expression while also keeping that name local to just this expression. It simplifies the expression without introducing a binding for x in the global environment. The general form of the let construct is:

        let <binding1> <binding2> ... <binding-n> in <expression> end

We could read it as, "Let the following bindings hold in evaluating this expression." You can have more than one binding without using any semicolons in the middle, as in:

        let val x = 2.2423 val y = 78.238 in x * y - 2.4 * x + 3.7 * y end;

You can include function bindings as well as variable bindings in a let construct. Often we use a let to define a helper function for some other function.

I then asked people how we could write a function that would return a list that counts from 1 up to some n. For example, if we call countTo(10), we want to get back the list [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. If we didn't mind having it count backwards, we could write it as:

        fun countTo(n) =
            if n = 1 then [1]
            else n::countTo(n - 1)

Someone suggested that we could use the append operator to fix the order:

        fun countTo(n) =
            if n = 1 then [1]
            else countTo(n - 1) @ [n]

This worked, but I pointed out that it is inefficient. It would end up making lots of extra copies of the list. I'll explain in a later lecture why that is so.

I asked if anyone could do it using the :: operator. Someone said we could do it with a helper function that takes two parameters. This is an excellent idea. We often solve problems in a functional language by introducing a helper function that has extra parameters. So we came up with this solution:

        fun count(low, high) =
            if low = high then [low]
            else low::count(low + 1, high)
        
        fun countTo(n) = count(1, n)

We can make this better by including the helper function as a local function inside of countTo using a let. The let construct takes any sequence of bindings, including both val and fun bindings. So we can rewrite this as:

        fun countTo(n) =
            let fun count(low, high) =
                    if low = high then [low]
                    else low::count(low + 1, high)
            in count(1, n)
            end

But we can do even better. Think about the parameter called "high" in the helper function. We pass it the value n and we keep passing it that value n. This is a rather silly thing to do. We can instead take advantage of the fact that the parameter n will be bound before we evaluate the let construct. As a result, we can refer to n inside the helper function instead of having the second parameter:

        fun countTo(n) =
            let fun count(low) =
                    if low = n then [low]
                    else low::count(low + 1)
            in count(1)
            end

The ability to refer to parameters in the outer function allows us to simplify the parameter passing of the inner helper function. This is a great advantage of the nested function.

I then talked about ML's pattern matching facility. ML has the ability to match certain forms of expressions. For example, previously we bound a single variable to a tuple, as in

        val x = (3.4, "hello");

but ML allows you to define something that looks like a tuple on the left side with the actual tuple on the right:

        val (x, y) = (3.4, "hello");

This binds x to 3.4 and y to "hello". ML can even do this with lists. For example, if we say:

        val [x] = [3];

ML will bind x to 3. In this case we get a warning about the matches not being exhaustive. The warning is more useful when we're writing functions. It's just letting us know that we aren't using every possible kind of list here. We can also match any list that has at least one element:

        val x::xs = [1, 3, 5];

which binds x=1 and xs=[3, 5]. Or we can match a list that has at least two elements:

        val x::y::zs = [1, 3, 5];

which binds x=1, y=3, zs=[5].

Using pattern matching, we looked at how to write functions that specify their result through a series of cases. For example, we know that the Fibonacci sequence begins with the values 1, 1 and that each subsequent value is the sum of the previous two. We could write this with an if/else:

        fun fib(n) =
            if n = 0 orelse n = 1 then 1
            else fib(n - 1) + fib(n - 2);

but we can also write this as a series of three cases, each with a different pattern:

        fun fib(0) = 1 | fib(1) = 1 | fib(n) = fib(n - 1) + fib(n - 2);

Notice the two pipe or vertical bar characters ("|") that separate the three different cases. We usually write this with each case on a separate line and we line up the pipe characters with the keyword "fun", as in:

        fun fib(0) = 1
        |   fib(1) = 1
        |   fib(n) = fib(n - 1) + fib(n - 2);

We used this same approach to write the list length function with cases for an empty list versus a nonempty list:

        fun len([]) = 0
        |   len(x::xs) = 1 + len(xs);

Stuart Reges

Last modified: Sun Apr 5 14:18:36 PDT 2009