CSE413 Notes for Wednesday, 4/23/25

I returned to our discussion of depth_of. Here is the code we ended up with at the end of the previous lecture:

        let depth_of(n, tree) =
            let rec helper(t, depth) =
                match t with
                | Empty                   -> -1
                | Node(root, left, right) ->
                    if root = n then depth
                    else if n < root then helper(left, depth + 1)
                    else helper(right, depth + 1)
            in helper(tree, 0)
This version returns a -1 when it doesn't find a value in the tree. Another option would be to raise an exception. So we could imagine that we have a precondition on the function that they shouldn't call it if the value isn't in the tree. That doesn't seem like a very friendly thing to do, though. Someone said they could call contains before calling depth_of, but that means they have to search the tree twice: once to see if it's there, and a second time to find its depth.
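
To make the exception alternative concrete, here is a minimal sketch of what it might look like. The tree type is an assumption inferred from the Node(root, left, right) patterns above, and the names depth_of_exn and sample are hypothetical, not code from lecture. Notice that the caller has to wrap the call in a try to stay safe:

```ocaml
(* Assumed tree type, inferred from the patterns used in these notes. *)
type 'a tree = Empty | Node of 'a * 'a tree * 'a tree

(* Hypothetical variant that raises Not_found instead of returning -1. *)
let depth_of_exn(n, tree) =
    let rec helper(t, depth) =
        match t with
        | Empty                   -> raise Not_found
        | Node(root, left, right) ->
            if root = n then depth
            else if n < root then helper(left, depth + 1)
            else helper(right, depth + 1)
    in helper(tree, 0)

(* A small made-up search tree for trying it out. *)
let sample = Node(40, Node(20, Empty, Empty), Node(60, Empty, Empty))

(* 60 is one level below the root, so this is 1. *)
let found = depth_of_exn(60, sample)

(* 99 is not in the tree, so the caller must handle the exception. *)
let missing =
    try depth_of_exn(99, sample)
    with Not_found -> -1
```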

OCaml offers a good alternative. This is a good place to use what is known as an option type. An option is appropriate when the answer is "0 or 1 of" something. In other words, sometimes there is an answer and sometimes not. The technical term for this is a "one of" type (versus a tuple where you have to include all values, which is called an "each of" type).

This situation comes up often in programming and there are many different approaches for addressing it. For the Scanner class in Java, if you try to read a value when you have reached the end of a file, it throws an exception. There are other reading operations in the Java class libraries that return -1 when there is no data to read. When you call get on a key that isn't in a map, it returns the value null.

What is the maximum value in an empty list? There isn't one. This is a similar case. What is the depth of a value n in the tree? Sometimes there is an answer and sometimes the answer is, "There isn't one."

The option type is defined as follows:

        type 'a option = None | Some of 'a 
Notice the use of 'a, which means that it is polymorphic (you can have an int option, string option, float option, etc). Using these two constructors, I tried to rewrite our definition:

        let depth_of(n, tree) =
            let rec helper(t, depth) =
                match t with
                | Empty                   -> None
                | Node(root, left, right) ->
                    if root = n then depth
                    else if n < root then helper(left, depth + 1)
                    else helper(right, depth + 1)
            in helper(tree, 0)
This produced an error. It's the same problem we had with the attempt to return -1. We have to be consistent. What is the type of the parameter depth? We are returning None for the empty tree, which means that the return type is an int option. But then it thinks that depth must be of type int option and you can't add 1 to an option. The fix is simple. We return Some(depth) instead of depth:

        let depth_of(n, tree) =
            let rec helper(t, depth) =
                match t with
                | Empty                   -> None
                | Node(root, left, right) ->
                    if root = n then Some(depth)
                    else if n < root then helper(left, depth + 1)
                    else helper(right, depth + 1)
            in helper(tree, 0)
Obviously we will sometimes want to turn an option result into an actual value. You can do so using the function Option.get. For example:

        # let result = depth_of(40, t);;
        val result : int option = Some 8
        # let d = Option.get(result);;
        val d : int = 8
There are also functions Option.is_some and Option.is_none that can be used to test which kind of option you have. More often we find ourselves using pattern matching with the None and Some constructors.
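
Here is a small sketch of what that pattern matching might look like for the result of depth_of. The tree type is an assumption based on the patterns used above, and the function describe and the tree sample are hypothetical names introduced just for this illustration:

```ocaml
(* Assumed tree type, inferred from the patterns used in these notes. *)
type 'a tree = Empty | Node of 'a * 'a tree * 'a tree

(* depth_of as defined above, returning an int option. *)
let depth_of(n, tree) =
    let rec helper(t, depth) =
        match t with
        | Empty                   -> None
        | Node(root, left, right) ->
            if root = n then Some(depth)
            else if n < root then helper(left, depth + 1)
            else helper(right, depth + 1)
    in helper(tree, 0)

(* Hypothetical caller that handles both kinds of option. *)
let describe(n, tree) =
    match depth_of(n, tree) with
    | None       -> string_of_int(n) ^ " is not in the tree"
    | Some depth -> string_of_int(n) ^ " is at depth " ^ string_of_int(depth)

(* A small made-up search tree for trying it out. *)
let sample = Node(40, Node(20, Empty, Empty), Node(60, Empty, Empty))
```

With this approach the None case can't be forgotten silently, because OCaml warns about a match that doesn't cover every constructor.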

As one final example I said that I wanted to write a function list_max that would return the maximum value in a list. Using the option constructors, we can define base cases for an empty list and a one-element list:

        let rec list_max(lst) =
            match lst with
            | []    -> None
            | [x]   -> Some(x)
            ...
It's tempting to return x in the one-element list case, but we got an error message when we did so because then OCaml would assume that x is an int option instead of an int. For the final case, we want to compute the max of the rest of the list only once, so we introduced a let expression:

        let rec list_max(lst) =
            match lst with
            | []    -> None
            | [x]   -> Some(x)
            | x::xs -> 
              let max = list_max(xs)
              in if x > Option.get(max) then Some(x)
                 else max
    
It might seem odd that we're calling Option.get in the final case without testing whether there is a value to get, but keep in mind that we know that xs has at least one value in that branch, so we know that we will get a value returned.
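
As an alternative sketch (not the version we wrote in lecture), we could avoid Option.get altogether by pattern matching on the result of the recursive call. The name list_max_alt is hypothetical. A side effect of this approach is that the one-element case no longer needs its own branch:

```ocaml
(* Hypothetical variant of list_max that matches on the recursive result
   instead of calling Option.get. *)
let rec list_max_alt(lst) =
    match lst with
    | []    -> None
    | x::xs ->
        match list_max_alt(xs) with
        | None   -> Some(x)                      (* xs was empty *)
        | Some m -> if x > m then Some(x) else Some(m)
```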

Then I asked people about a certain behavior that you see in databases and spreadsheets. Suppose that you have a column of midterm scores and you leave two cells empty because the students didn't take the exam. If you then compute the average of that column of numbers, are the two empty cells included? Everyone seemed to know that they are not included. So if you change those cells to have a 0, you get a different average.

I mentioned that I had heard an interesting talk by Anders Hejlsberg, the designer of the C# programming language, in which he described Microsoft's decision to include nullable types in C# 2.0. He said they did this to make it easy for C# programs to interoperate with SQL data (databases).

I said that in C# 2.0, you can declare a variant of int called int? that is a nullable type of int. I asked what that might look like and someone said it sounds like an int that can be set to null. That's exactly right. In version 2.0 of C# you can say things like:

        int? x;
        x = null;
Why would you want to do that? Anders said that Microsoft was doing this to allow C# code to more easily interoperate with SQL applications. Anyone who uses Excel should be familiar with this concept. There is a difference between leaving a data cell blank versus filling in a value like 0. I often have to yell at the TAs not to enter a 0 unless a student earned an actual score of 0. Otherwise our computations for values like minimum, median, and average will give misleading results.

So then I asked people if this reminded them of anything. Does Microsoft's "int?" correspond to anything in OCaml? Someone said it's an int option, which is right. OCaml's option type allows us to create nullable types. Of course, then you have to either use pattern matching or Option.get to pull a value back out of an option type, which is a bit of a pain. Microsoft has decided to have the compiler do a lot of the work for you to implicitly call the equivalent of Option.get and the equivalent of the Some constructor to "wrap" a value into an option.
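
To tie this back to the spreadsheet example, here is a rough sketch of how a column with blank cells might be modeled as a list of int options. The function average and the two sample lists are hypothetical names made up for this illustration. Blank cells (None) are skipped, while cells containing 0 are counted:

```ocaml
(* Average of the scores that are present; None if no scores are present. *)
let average(scores) =
    (* Keep only the values wrapped in Some, dropping every None. *)
    let present = List.filter_map (fun s -> s) scores in
    match present with
    | [] -> None
    | _  ->
        let total = List.fold_left (+) 0 present in
        Some(float_of_int(total) /. float_of_int(List.length(present)))

(* Two students did not take the exam. *)
let with_blanks = [Some 80; None; Some 90; None]

(* The same column with 0 entered in place of the blanks. *)
let with_zeros = [Some 80; Some 0; Some 90; Some 0]

let avg_blanks = average(with_blanks)   (* Some 85. *)
let avg_zeros  = average(with_zeros)    (* Some 42.5 *)
```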

One of the things I like about OCaml is that it is usually possible to write clean, simple code without having to redesign the language or to build such support into the compiler. The option type is easy to define in terms of standard OCaml:

        type 'a option = None | Some of 'a
The get function is also easy to write using standard OCaml:

        let get(opt) = 
            match opt with
             | None       -> invalid_arg("option is None")
             | Some value -> value
Then I asked people to consider this code:

        let x = 3;;
        let f(n) = x * n;;
        f(8);;         (* 24 *)
        let x = 5;;
        f(8);;         (* still 24 *)
The definition of function f refers to n, which is defined in the function itself (the parameter passed to it), but f also refers to a variable that is not defined in the function itself, the variable x. Variables that are not defined within a function's scope are referred to as free variables.

We found that the function uses the binding of x to 3 that exists when the function is defined. Changing the binding for x does not change the behavior of the function. My question is, how does that work? How does OCaml manage to figure that out?

The answer involves understanding two important concepts:

So in OCaml we really should think of function definitions as being a pair of things: some code to be evaluated when the function is called and an environment to use in executing that code. This pair has a name. We refer to this as the closure of a function.

Remember that functions can have free variables in them, as in our function f that refers to a variable x that is not defined inside the function. The idea of a closure is that we attach a context to the code in the body of the function to "close" all of these stray references.
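
A small sketch may help make the idea of a closure concrete. The names make_scaler and triple are hypothetical. Each call on make_scaler produces a function whose code is the same (x * n) but whose environment holds a different binding for x:

```ocaml
(* Returns a function whose closure remembers the value of x. *)
let make_scaler(x) = fun n -> x * n

(* The closure for triple pairs the code "x * n" with the binding x = 3. *)
let triple = make_scaler(3)

(* A later top-level binding of x has no effect on triple. *)
let x = 5

let result = triple(8)   (* 24, not 40 *)
```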

We explored some examples to understand the difference between a let definition that fully evaluates the code included in the definition versus a function definition that delays evaluating the code used in the definition. For example, I included some expressions that included calls on printing functions to show that let definitions are fully evaluated.

        # let x = 3;;
        val x : int = 3
        # let y = print_endline("hello"); 2 * x;;
        hello
        val y : int = 6
        # let f1(n) = print_endline("hello"); 2 * x + n;;
        val f1 : int -> int = <fun>
        # f1(3);;
        hello
        - : int = 9
        # f1(10);;
        hello
        - : int = 16
For a let definition, the print is performed when you type in the definition (indicating that OCaml is evaluating the code at that time). For the function definition, the print happens only when the function is called, indicating that OCaml delayed evaluating the expression until individual calls are made.

I gave one more example of a function definition to really understand the implications of what it means to have a closure.

        # let a = [|17; 45; 9|];;
        val a : int array = [|17; 45; 9|]
        # let b = 100;;
        val b : int = 100
        # let f(c) = print_endline("hello"); a.(0) + b + c;;
        val f : int -> int = <fun>
        # f(8);;
        hello
        - : int = 125
We begin by defining an array called a whose 0 element has the value 17. We then define a variable b with the value 100. And then we define a function f that prints a line of output with "hello" and then returns the sum of the zero-element of a, b, and c. In this case, the free variables are a and b. The parameter c is defined within the function's scope. When we call it passing 8 to c, we get the result 125 (17 + 100 + 8).

We know that changing b will not change the behavior of the function because it keeps track of the environment that existed at the point that we defined it. But what about the array a? Because the array is mutable, we can change that value and that changes the behavior of the function:

        # b = 0;;
        - : bool = false
        # let b = 0;;
        val b : int = 0
        # a.(0) <- 10;;
        - : unit = ()
        # f(8);;
        hello
        - : int = 118
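Notice that the expression b = 0 is an equality test, not an assignment, which is why it evaluates to false. To give b a new value we have to introduce a new binding with let, which shadows the old one. As expected, rebinding b to 0 changed nothing, because f still uses the binding of b to 100 from its closure. But resetting the zero-index element of a to 10 changed the computation. Now the function prints the line of output and returns 118 (10 + 100 + 8). This points out an important property of mutable state, that it can lead functions to change their behavior.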
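
The same effect shows up with any mutable value that a function refers to. Here is a minimal sketch using a ref cell, which is an assumption on my part since we did not use refs in this lecture. The names counter and next_id are hypothetical. The function is called twice in exactly the same way but returns different results:

```ocaml
(* A mutable cell that the function below refers to as a free variable. *)
let counter = ref 0

(* Increments the shared cell and returns its new contents. *)
let next_id() =
    counter := !counter + 1;
    !counter

let first  = next_id()   (* 1 *)
let second = next_id()   (* 2 *)
```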


Stuart Reges
Last modified: Wed Apr 23 20:50:21 PDT 2025