CSE413 Notes for Monday, 4/21/25

This is a good example of where we can use a folding operation. The reduce function we've looked at isn't powerful enough to capture this process, but List.fold_right can handle it. I asked for its type in the interpreter:

    # List.fold_right;;
    - : ('a -> 'acc -> 'acc) -> 'a list -> 'acc -> 'acc = <fun>

The first argument is a function. We have a function called insert, but it has the wrong form because it is not curried: it takes a single (value, tree) tuple rather than two separate arguments. We can use the function called curry that I have included in our utility file to convert it to curried form:

        # insert;;
        - : int * int_tree -> int_tree = <fun>
        # (curry insert);;
        - : int -> int_tree -> int_tree = <fun>

The fold_right function also takes a list and an initial value to use for the accumulator, so we can define a variation of insert_all by saying:

        let insert_all2(lst) = List.fold_right (curry insert) lst Empty

We saw that this function produced the same result as our original insert_all.
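Here is a minimal, self-contained sketch of the same idea. The int_tree type definition and the one-line curry are assumptions (the type is inferred from the constructors the interpreter prints, and curry is a stand-in for the version in the utility file); insert is the definition used in these notes.

```ocaml
(* Assumed type definition, inferred from the interpreter output *)
type int_tree = Empty | Node of int * int_tree * int_tree

(* insert as written in these notes: takes a (value, tree) tuple *)
let rec insert(value, tree) =
    match tree with
    | Empty                   -> Node(value, Empty, Empty)
    | Node(root, left, right) ->
        if value <= root then Node(root, insert(value, left), right)
        else Node(root, left, insert(value, right))

(* Hypothetical stand-in for the utility file's curry *)
let curry f a b = f (a, b)

let insert_all2(lst) = List.fold_right (curry insert) lst Empty

(* fold_right starts at the right end of the list, so 5 is inserted
   first and ends up as the root of the tree *)
let result = insert_all2([12; 38; 97; 5])
```

Because the fold works from the right, the last value in the list becomes the root, which is why this version builds a different tree than a left-to-right fold would.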

What if we wanted to fold from left-to-right? I asked the interpreter for the general form of List.fold_left:

        # List.fold_left;;
        - : ('a -> 'b -> 'a) -> 'a -> 'b list -> 'a = <fun>

There is a problem here because when we fold from the left, it expects a function whose first parameter is the accumulator, which starts out as our initial value. Using our test list of [12; 38; 97; 5], the first call it will make is insert(Empty, 12). But we wrote insert to take the parameters in the other order (value first, tree second). Can we fix it so that it takes the parameters in the other order?

This issue can come up in other contexts. For example, suppose you want to write a function to divide an int value by 2, as in:

        let halve(n) = n / 2

What if we wanted to instead partially instantiate the division operator? The problem is we want to provide the value 2 and have the other parameter be unspecified, but they're in the wrong order. We can fix that by making a function that switches the order of the parameters for a curried function like this:

        let switch f a b = f b a

Given this function, we can define halve more simply:

        let halve = switch (/) 2

We made a call on map to verify that this is working:

        # map(halve, 1--10);;
        - : int list = [0; 1; 1; 2; 2; 3; 3; 4; 4; 5]
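The sketch below shows what switch buys us compared with plain partial application. It uses the standard List.map rather than the course's map and -- utilities so that it runs on its own; decrement and two_over are hypothetical names added for illustration.

```ocaml
(* switch swaps the two arguments of a curried function *)
let switch f a b = f b a

let halve = switch (/) 2        (* halve n = n / 2 *)
let decrement = switch (-) 1    (* decrement n = n - 1 *)

(* Without switch, partial application fixes the FIRST operand *)
let two_over = (/) 2            (* two_over n = 2 / n *)

let halves = List.map halve [1; 2; 3; 4; 5; 6; 7; 8; 9; 10]
let twos   = List.map two_over [1; 2; 3]
```

Here halves is [0; 1; 1; 2; 2; 3; 3; 4; 4; 5], matching the interpreter output above, while twos is [2; 1; 0] because the 2 ends up as the numerator.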

We can do something similar with the curried version of insert to switch the order of its parameters:

        let insert_all3 = List.fold_left (switch (curry insert)) Empty

Notice that this is a partially instantiated function: List.fold_left takes the list to process as its last argument, and we have left that argument unspecified. This version produced the tree we had seen before that is obtained by processing the values from left to right:

        # insert_all3(test);;
        - : int_tree =
        Node (12, Node (5, Empty, Empty), Node (38, Empty, Node (97, Empty, Empty)))
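For comparison, the same left fold can be written with the argument swap spelled out in an anonymous function. This is only a sketch: the int_tree type, curry, and the name insert_all4 are assumptions introduced here for illustration.

```ocaml
(* Assumed type definition, inferred from the interpreter output *)
type int_tree = Empty | Node of int * int_tree * int_tree

let rec insert(value, tree) =
    match tree with
    | Empty                   -> Node(value, Empty, Empty)
    | Node(root, left, right) ->
        if value <= root then Node(root, insert(value, left), right)
        else Node(root, left, insert(value, right))

(* Hypothetical stand-in for the utility file's curry *)
let curry f a b = f (a, b)
let switch f a b = f b a

(* Version from the notes: point-free, built from curry and switch *)
let insert_all3 = List.fold_left (switch (curry insert)) Empty

(* Same fold with the swap written out; insert_all4 is a made-up name *)
let insert_all4(lst) =
    List.fold_left (fun tree value -> insert(value, tree)) Empty lst

let test = [12; 38; 97; 5]
let same = insert_all3(test) = insert_all4(test)
```

Both forms build the tree shown above with 12 at the root. The anonymous function is longer but makes the order of the accumulator and the list element explicit.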

Then I asked for the big-O complexity of our insert_all function. It simply calls the insert function for each of the n values in a list. So how much work does insert perform? We are still limiting ourselves to the immutable constructs in OCaml. That means that our binary trees are constructed in a very different way than what we saw in the 123 and 143 intro classes. There we would descend the tree until we encountered a null reference and we would replace that with a reference to a node. That relies on the ability to mutate a node to replace an existing reference with a new reference.

Consider the code we have for insert:

        let rec insert(value, tree) =
            match tree with
            | Empty                   -> Node(value, Empty, Empty)
            | Node(root, left, right) ->
                if (value <= root) then Node(root, insert(value, left), right)
                else Node(root, left, insert(value, right))

This code also involves descending the tree, but at each level we end up creating a new node that has either a different left subtree or a different right subtree. OCaml will share the part that isn't changed, but it has to make a new node to keep track of what has changed. So a single insertion is going to be O(log n) if the tree is balanced and it will involve creating log n new nodes for each insertion. Our trees aren't guaranteed to be balanced, but we've seen that with randomized data it tends to be close. So this insert takes the same amount of time as a Java insert, but it creates more nodes because it can't mutate an existing node. Overall we'd say that inserting a list of n values will be O(n log n) when we have data that leads to a tree that is close to being balanced.
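The sharing described above can be observed with OCaml's physical equality operator (==), which checks whether two values are the same object in memory. This is a sketch under assumptions: the int_tree type is inferred from the interpreter output, left_of is a hypothetical helper, and the results of == on immutable values reflect how the standard compiler behaves in practice rather than a language guarantee.

```ocaml
(* Assumed type definition, inferred from the interpreter output *)
type int_tree = Empty | Node of int * int_tree * int_tree

let rec insert(value, tree) =
    match tree with
    | Empty                   -> Node(value, Empty, Empty)
    | Node(root, left, right) ->
        if value <= root then Node(root, insert(value, left), right)
        else Node(root, left, insert(value, right))

(* Hypothetical helper that extracts the left subtree *)
let left_of(tree) =
    match tree with
    | Empty            -> Empty
    | Node(_, left, _) -> left

(* Tree with root 12, left subtree 5 -> 3, right child 38 *)
let before =
    List.fold_left (fun tree value -> insert(value, tree)) Empty [12; 5; 38; 3]

(* 97 is larger than 12, so the insertion descends to the right *)
let after = insert(97, before)

(* The untouched left subtree is reused, not copied *)
let left_is_shared = left_of(after) == left_of(before)

(* The root had to be rebuilt to point at the new right subtree *)
let root_is_new = after != before
```

Only the nodes along the path from the root to the insertion point are rebuilt, which is where the log n new nodes per insertion come from in a balanced tree.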

Then I asked people how we'd write a function called contains that takes a tree and a value n and that returns true if the value is in the tree and false otherwise. Someone quickly pointed out the base case: the empty tree doesn't contain anything:

        let rec contains(tree, n) =
            match tree with
            | Empty -> false
            ...

Remember that you typically want a different case for each of your different type constructors. The case above handles the empty tree, so we'll also need a case for a nonempty tree:

        let rec contains(tree, n) =
            match tree with
            | Empty                   -> false
            | Node(root, left, right) -> 
                ...

Someone said that if the root data is equal to n, then the answer is true. If not, someone said we could see if it is in either subtree:

        let rec contains(tree, n) =
            match tree with
            | Empty                   -> false
            | Node(root, left, right) -> 
                root = n || contains(left, n) || contains(right, n)

This code works, but it's not very efficient. It would potentially search the entire tree. Remember that we are working with a binary search tree. So the better thing to do is to check either the left subtree or the right subtree, but not both:

        let rec contains2(tree, n) =
            match tree with
            | Empty                   -> false
            | Node(root, left, right) ->
                if root = n then true
                else if n < root then contains2(left, n)
                else contains2(right, n)

This version turns out to be quite efficient, which I demonstrated in the OCaml interpreter by typing:

        let t = insert_all(random_numbers(1000000))
        filter((fun x -> contains2(t, x)), 1--100000)

The request for 100 thousand calls on contains2 executed almost without pause.
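As a small sanity check, the sketch below runs both versions on a tiny tree and confirms that they agree. It uses List.filter and a hand-written candidate list in place of the course's filter and -- utilities; the int_tree type is assumed from the interpreter output.

```ocaml
(* Assumed type definition, inferred from the interpreter output *)
type int_tree = Empty | Node of int * int_tree * int_tree

let rec insert(value, tree) =
    match tree with
    | Empty                   -> Node(value, Empty, Empty)
    | Node(root, left, right) ->
        if value <= root then Node(root, insert(value, left), right)
        else Node(root, left, insert(value, right))

(* Searches both subtrees, so it may visit every node *)
let rec contains(tree, n) =
    match tree with
    | Empty                   -> false
    | Node(root, left, right) ->
        root = n || contains(left, n) || contains(right, n)

(* Uses the search tree ordering to follow a single path *)
let rec contains2(tree, n) =
    match tree with
    | Empty                   -> false
    | Node(root, left, right) ->
        if root = n then true
        else if n < root then contains2(left, n)
        else contains2(right, n)

let t = List.fold_right (fun value tree -> insert(value, tree)) [12; 38; 97; 5] Empty
let candidates = [5; 12; 13; 38; 97; 100]
let slow = List.filter (fun x -> contains(t, x)) candidates
let fast = List.filter (fun x -> contains2(t, x)) candidates
```

Both filters keep [5; 12; 38; 97]. The two functions only agree because t really is a binary search tree; on an arbitrary binary tree contains2 could miss values.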

As another example, I asked people how we could convert the binary search tree into a sorted list of ints. This took people a few minutes to figure out, but eventually we came to the realization that an inorder traversal of the tree will produce the values in sorted order, so we simply need to collapse it using an inorder traversal.

The easy case is to collapse an empty tree, which gives you an empty list:

        let rec collapse(tree) =
            match tree with
            | Empty -> []
            ...

As usual, we'll need a case for the nonempty tree:

        let rec collapse(tree) =
            match tree with
            | Empty                   -> []
            | Node(root, left, right) -> ...

In this case we want to recursively collapse the left and right subtrees and glue the two pieces together with the root data in the middle. Someone suggested doing it this way:

        let rec collapse(tree) =
            match tree with
            | Empty                   -> []
            | Node(root, left, right) ->
                collapse(left) @ root::collapse(right)

This code works well, but I have a slight preference in this case for expressing it as the appending of three different lists:

        let rec collapse(tree) =
            match tree with
            | Empty                   -> []
            | Node(root, left, right) ->
                collapse(left) @ [root] @ collapse(right)

The first version is better in the sense that it demonstrates an understanding of the difference between the cons operator (::) and the append operator (@), but I prefer the second because I conceive of the problem as putting together three different things. Both are perfectly fine ways to write the code.
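The sketch below checks that collapse really does produce sorted output. It also shows one possible alternative that was not part of the lecture: collapse_acc (a hypothetical name) threads an accumulator through the traversal so that it never calls @, which avoids repeatedly copying the left-hand lists. The int_tree type is assumed from the interpreter output.

```ocaml
(* Assumed type definition, inferred from the interpreter output *)
type int_tree = Empty | Node of int * int_tree * int_tree

let rec insert(value, tree) =
    match tree with
    | Empty                   -> Node(value, Empty, Empty)
    | Node(root, left, right) ->
        if value <= root then Node(root, insert(value, left), right)
        else Node(root, left, insert(value, right))

(* Version from the notes: inorder traversal glued together with @ *)
let rec collapse(tree) =
    match tree with
    | Empty                   -> []
    | Node(root, left, right) ->
        collapse(left) @ [root] @ collapse(right)

(* Alternative sketch: rest holds everything that comes after this
   subtree in sorted order, so each value is consed on exactly once *)
let collapse_acc(tree) =
    let rec helper(t, rest) =
        match t with
        | Empty                   -> rest
        | Node(root, left, right) ->
            helper(left, root :: helper(right, rest))
    in helper(tree, [])

let values = [40; 72; 15; 0; -8; 95; 103; 72; 272; 143; 413; 341]
let t = List.fold_right (fun value tree -> insert(value, tree)) values Empty
let sorted = collapse(t)
```

With these values, sorted is [-8; 0; 15; 40; 72; 72; 95; 103; 143; 272; 341; 413], and collapse_acc(t) gives the same list.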

We previously wrote a function to determine the height of a tree. A related notion is the depth of a given value stored in the tree. Consider, for example, this tree:

What is the depth of the node with 9 in it? We compute it by finding the length of the path from the root to the node, but there are two things we could count: nodes or edges. For computing height, I'm a node counter and would say that this tree has a height of 3. That's how we wrote the function previously. At least I have Don Knuth on my side for that. I'm a node counter for height because I want to have a good answer to the question, "What is the height of the empty tree?" For me, the answer is 0. For edge counters, it's either -1 or undefined or "I'm not sure."

But for the depth of a specific value, I prefer the standard definition of counting edges because we don't have any odd situations for the empty tree (it has no nodes, so none of them have a depth). So I'd say that in this tree 18 has a depth of 0, 27 has a depth of 1, and 9 has a depth of 2.
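For reference, a node-counting height function might look like the sketch below. This is an assumption about the general shape of the function from the earlier lecture, not a copy of it, and the small example tree is the one built earlier in these notes from [12; 38; 97; 5], not the tree pictured above.

```ocaml
(* Assumed type definition, inferred from the interpreter output *)
type int_tree = Empty | Node of int * int_tree * int_tree

(* Node-counting height: the empty tree has height 0 *)
let rec height(tree) =
    match tree with
    | Empty                -> 0
    | Node(_, left, right) -> 1 + max (height(left)) (height(right))

(* Example tree: 12 at depth 0, 5 and 38 at depth 1, 97 at depth 2 *)
let example =
    Node (12, Node (5, Empty, Empty), Node (38, Empty, Node (97, Empty, Empty)))

let h = height(example)
```

Here h is 3 while the deepest value, 97, has a depth of 2, which shows the off-by-one relationship between counting nodes for height and counting edges for depth.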

Then I asked how we'd write a function to find the depth of a given value in a search tree. We usually start with an empty tree as the base case, but it's not clear what to return. What does it mean if you have gotten to an empty tree? It means the value wasn't found in the tree. Someone suggested we could return -1 in that case, the way we do for calls on indexOf in a language like Java when it doesn't find a value in a list:

        let rec depth_of(n, tree) =
            match tree with
            | Empty -> -1
            ...

If the value stored at the root is n, then this value has a depth of 0 (it appears in level 1 of the tree):

        let rec depth_of(n, tree) =
            match tree with
            | Empty                   -> -1
            | Node(root, left, right) ->
                if root = n then 0
            ...

What if it's not at the root? It's a binary search tree, so to be efficient, we should either search the left subtree or the right subtree, but not both:

        let rec depth_of(n, tree) =
            match tree with
            | Empty                   -> -1
            | Node(root, left, right) ->
                if root = n then 0
                else if n < root then 1 + depth_of(n, left)
                else 1 + depth_of(n, right)

I had a list of ints that I used to define a variable "test" that we could use for testing:

        let test = [40; 72; 15; 0; -8; 95; 103; 72; 272; 143; 413; 341]

Then I defined a variable "t" by calling the insert_all function we wrote in the previous lecture to produce a binary search tree.

        # let t = insert_all(test);;
        val t : int_tree =
          Node (341,
           Node (143,
            Node (72,
             Node (-8, Empty,
              Node (0, Empty,
               Node (15, Empty, Node (72, Node (40, Empty, Empty), Empty)))),
             Node (103, Node (95, Empty, Empty), Empty)),
            Node (272, Empty, Empty)),
           Node (413, Empty, Empty))

I loaded our definition for depth_of into the interpreter. Using this variable t, we could see that it worked fairly well:

        # map((fun x -> depth_of(x, t)), test);;
        - : int list =
        [7; 2; 5; 4; 3; 4; 3; 2; 2; 1; 1; 0]

But we ran into problems when we asked about a value not in the tree:

        # depth_of(42, t);;
        - : int = 7

We were expecting a value of -1. How did we end up with 7? The answer is that our solution to depth_of descends the tree looking for the given value and then adds 1 to the result as it comes back out. This value of 42 would become the right child of the leaf node that has 40 in it. Remember that 40 had a depth of 7, so its empty right child sits one level below that, at depth 8. When we reach that empty tree, we return -1 and then add one to the result 8 different times to get an overall result of 7.
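The sketch below reproduces the problem in a self-contained form. The tree is rebuilt with List.fold_right, which these notes say gives the same result as insert_all; the int_tree type is assumed from the interpreter output, and the variable names for the results are made up for illustration.

```ocaml
(* Assumed type definition, inferred from the interpreter output *)
type int_tree = Empty | Node of int * int_tree * int_tree

let rec insert(value, tree) =
    match tree with
    | Empty                   -> Node(value, Empty, Empty)
    | Node(root, left, right) ->
        if value <= root then Node(root, insert(value, left), right)
        else Node(root, left, insert(value, right))

(* Buggy version: the -1 gets 1 added to it on the way back out *)
let rec depth_of(n, tree) =
    match tree with
    | Empty                   -> -1
    | Node(root, left, right) ->
        if root = n then 0
        else if n < root then 1 + depth_of(n, left)
        else 1 + depth_of(n, right)

let test = [40; 72; 15; 0; -8; 95; 103; 72; 272; 143; 413; 341]
let t = List.fold_right (fun value tree -> insert(value, tree)) test Empty

let depth_40  = depth_of(40, t)    (* 40 is in the tree at depth 7 *)
let depth_42  = depth_of(42, t)    (* 42 is missing: -1 + 8 = 7 *)
let depth_500 = depth_of(500, t)   (* 500 is missing: -1 + 2 = 1 *)
```

A missing value returns (depth of the empty tree where the search ends) - 1, so the result is indistinguishable from the depth of a real node: 42 looks like 40, and 500 looks like 413.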

I asked how to fix it and someone suggested turning it into a tail recursive function with an accumulator that keeps track of the depth. That way when we return -1, there won't be any other computations performed after that. We introduced a helper function to accomplish this:

        let depth_of(n, tree) =
            let rec helper(t, depth) =
                match t with
                | Empty                   -> -1
                | Node(root, left, right) ->
                    if root = n then depth
                    else if n < root then helper(left, depth + 1)
                    else helper(right, depth + 1)
            in helper(tree, 0)

This version worked properly.
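Here is a self-contained check of the helper version, followed by one possible alternative that was not covered in lecture: returning an int option instead of the sentinel -1, so that "not found" cannot be mistaken for a depth. The name depth_of_opt and the int_tree type definition are assumptions made for this sketch.

```ocaml
(* Assumed type definition, inferred from the interpreter output *)
type int_tree = Empty | Node of int * int_tree * int_tree

let rec insert(value, tree) =
    match tree with
    | Empty                   -> Node(value, Empty, Empty)
    | Node(root, left, right) ->
        if value <= root then Node(root, insert(value, left), right)
        else Node(root, left, insert(value, right))

(* Version from the notes: the depth travels down as an accumulator *)
let depth_of(n, tree) =
    let rec helper(t, depth) =
        match t with
        | Empty                   -> -1
        | Node(root, left, right) ->
            if root = n then depth
            else if n < root then helper(left, depth + 1)
            else helper(right, depth + 1)
    in helper(tree, 0)

(* Alternative sketch: None means the value is not in the tree *)
let depth_of_opt(n, tree) =
    let rec helper(t, depth) =
        match t with
        | Empty                   -> None
        | Node(root, left, right) ->
            if root = n then Some depth
            else if n < root then helper(left, depth + 1)
            else helper(right, depth + 1)
    in helper(tree, 0)

let test = [40; 72; 15; 0; -8; 95; 103; 72; 272; 143; 413; 341]
let t = List.fold_right (fun value tree -> insert(value, tree)) test Empty

let missing = depth_of(42, t)
let found   = depth_of(40, t)
```

With this tree, missing is -1 and found is 7, and depth_of_opt returns None and Some 7 for the same two calls. The option version pushes the caller to handle the missing case explicitly, at the cost of a slightly heavier return type.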

Stuart Reges

Last modified: Mon Apr 21 15:00:31 PDT 2025