CSE341 Notes for Friday, 4/12/24

We turned to a new topic: defining your own types. You can introduce a synonym for an existing type, as in:

        # type int2 = int * int;;
        type int2 = int * int

This just gives you an option to refer to the type using a simpler identifier, as in:

        # let sum(x : int2) = fst(x) + snd(x);;
        val sum : int2 -> int = <fun>

Then we defined a type that corresponds to an enumerated type in Java, C, and C++. Suppose that you want to keep track of various colors and you'd like to have meaningful names for them. This is easy to do in OCaml:

        type color = Red | Blue | Green

This definition introduces a new type called "color". We use the vertical bar or pipe character ("|") to separate different possibilities for the type. This type has three possible forms. OCaml refers to the three identifiers as constructors, even though in this case they are very simple and don't require any data. OCaml requires that type names start with a lowercase letter and constructors start with an uppercase letter. You can ask about the constructors in the interpreter:

        # Red;;
        - : color = Red

You can also write functions that use these identifiers, as in:

        let rgb(c) =
            match c with
            | Red   -> (255, 0, 0)
            | Green -> (0, 255, 0)
            | Blue  -> (0, 0, 255)

This function returns a tuple of integers that correspond to standard RGB sequences for a given color (three integers in the range of 0 to 255 that represent the red, blue, and green components of each).

I then turned to a more complex example. I said that I wanted to explore the definition of a binary search tree in OCaml. I asked people what binary trees look like and someone said that they can be empty or they have a node with left and right subtrees. This becomes the basis of our type definition:

        type int_tree = Empty | Node of int * int_tree * int_tree

The name of the type is int_tree. It has two different forms. The first form uses the constructor Empty and has no associated data. The second form uses the constructor Node and takes a triple composed of the data for this node (an int), the left subtree and the right subtree. Notice how the keyword "of" is used to separate the constructor from the data type description.

Given this definition, we could make an empty tree or a tree of one node simply by saying:

        # Empty;;
        - : int_tree = Empty
        # Node(38, Empty, Empty);;
        - : int_tree = Node (38, Empty, Empty)

Notice that we use parentheses to enclose the arguments to the Node constructor. The Node constructor is similar to a function but has a slightly different status, as we'll see. In particular, we can use constructors in patterns, which makes our function definitions much clearer.

For example, we wrote the following function to insert a value into a binary search tree of ints.

        let rec insert(value, tree) =
            match tree with
            | Empty                   -> Node(value, Empty, Empty)
            | Node(root, left, right) ->
                if (value <= root) then Node(root, insert(value, left), right)
                else Node(root, left, insert(value, right))

If we are asked to insert a value into an empty tree, we simply create a leaf node with the value. Otherwise, we compare the value against the root and either insert it into the left or right subtrees. In a language like Java, we would think of the tree as being changed (mutated). In OCaml, we instead think of returning a new tree that includes the new value.

I had defined a variable with values to use for testing:

        let test = [12; 38; 97; 5]

I asked people if they wanted to see these values inserted from left-to-right or right-to-left. Someone said to insert left-to-right.

        # let t1 = Empty;;
        val t1 : int_tree = Empty
        # let t2 = insert(12, t1);;
        val t2 : int_tree = Node (12, Empty, Empty)
        # let t3 = insert(38, t2);;
        val t3 : int_tree = Node (12, Empty, Node (38, Empty, Empty))
        # let t4 = insert(97, t3);;
        val t4 : int_tree =
          Node (12, Empty, Node (38, Empty, Node (97, Empty, Empty)))
        # let t5 = insert(5, t4);;
        val t5 : int_tree =
          Node (12, Node (5, Empty, Empty),
           Node (38, Empty, Node (97, Empty, Empty)))

The tree being constructed looks like this:

              +----+
              | 12 |
              +----+
             /      \
        +---+       +----+
        | 5 |       | 38 |
        +---+       +----+
                          \
                          +----+
                          | 97 |
                          +----+

To insert a sequence of values, you can use list recursion calling the insert function repeatedly:

        let rec insert_all(lst) =
            match lst with
            | []    -> Empty
            | x::xs -> insert(x, insert_all(xs))

Then we wrote a function for finding the height of a tree. I mentioned that I'm using a slightly different definition for the height of a tree. In the usual definition, the empty tree has a height of -1. I prefer to define the height of the empty tree as 0, so this is returning a count of the number of levels in the tree:

        let rec height(t) =
            match t with
            | Empty                   -> 0
            | Node(root, left, right) -> max (height left) (height right) + 1

In writing this, we had to use parentheses slightly differently because the built-in max function is a curried function. Notice how we follow max by two parenthesized calls on height.

I pointed out that we are not using the value of "root" (the data stored at the root). This is a good place to use an anonymous variable, which you indicate with an underscore:

        let rec height(t) =
            match t with
            | Empty                -> 0
            | Node(_, left, right) -> max (height left) (height right) + 1

In the interpreter, I constructed a tree with ten thousand random values and asked for its height by saying:

        let t = insert_all(random_numbers(10000))
        height(t)

We found that the height was around 33 even though we haven't done anything special to balance the tree.

Then I spent some time discussing the code for insert_all.

        let rec insert_all(lst) =
            match lst with
            | []    -> Empty
            | x::xs -> insert(x, insert_all(xs))

I asked whether this will insert from left-to-right or right-to-left given a list like our testing list of [12; 38; 97; 5]. Someone said it will insert from right-to-left. In other words, it will compute:

        insert(12, insert(38, insert(97, insert(5 Empty))))

This is a good example of where we can use a folding operation. The reduce function we've looked at isn't powerful enough to capture this proces, but List.fold_right is able to handle this. I asked for its syntax in the interpreter:

    # List.fold_right;;
    - : ('a -> 'acc -> 'acc) -> 'a list -> 'acc -> 'acc = <fun>

The first argument is a function. We have a function called insert, but it has the wrong syntax because it is not curried. But we can use the function called curry that I have included in our utility file to convert it to curried form:

        # insert;;
        - : int * int_tree -> int_tree = <fun>
        # (curry insert);;
        - : int -> int_tree -> int_tree = <fun>

The fold_right function also takes a list and an initial value to use for the accumulator, so we can define a variation of insert_all by saying:

        let insert_all2(lst) = List.fold_right (curry insert) lst Empty

We saw that this function produced the same result as our original insert_all.

What if we wanted to fold from left-to-right? I asked the interpreter for the general form of List.fold_left:

        # List.fold_left;;
        - : ('a -> 'b -> 'a) -> 'a -> 'b list -> 'a = <fun>

There is a problem here because when we fold from the left, it expects to call a function that begins with our initial value. Using our test list of [12; 38; 97; 5], the first call it will make is insert(Empty, 12). But we wrote insert to take the parameters in the other order (value first, tree second). Can we fix it so that it takes the parameters in another order?

This issue can come up in other contexts. For example, suppose you want to write a function to divide an int value by 2, as in:

        let halve(n) = n / 2

What if we wanted to instead partially instantiate the division operator? The problem is we want to provide the value 2 and have the other parameter be unspecified, but they're in the wrong order. We can fix that by making a function that switches the order of the parameters for a curried function like this:

        let switch f a b = f b a

Given this function, we can define halve more simply:

        let halve = switch (/) 2

We made a call on map to verify that this is working:

        # map(halve, 1--10);;
        - : int list = [0; 1; 1; 2; 2; 3; 3; 4; 4; 5]

We can do something similar with the curried version of insert to switch the order of its parameters:

        let insert_all3 = List.fold_left (switch (curry insert)) Empty

Notice that this is a partially instantiated function because List.fold_left ends with the list to process. This version produced the tree we had seen before that is obtained by processing the values from left to right:

        # insert_all3(test);;
        - : int_tree =
        Node (12, Node (5, Empty, Empty), Node (38, Empty, Node (97, Empty, Empty)))

Stuart Reges

Last modified: Fri Apr 12 15:11:53 PDT 2024