Consider the code we have for insert:
let rec insert(value, tree) =
match tree with
| Empty -> Node(value, Empty, Empty)
| Node(root, left, right) ->
if (value <= root) then Node(root, insert(value, left), right)
else Node(root, left, insert(value, right))
This code also involves descending the tree, but at each level we end up
creating a new node that has either a different left subtree or a different
right subtree. OCaml will share the part that isn't changed, but it has to
make a new node to keep track of what has changed. So a single insertion is
going to be O(log n) if the tree is balanced and it will involve creating log n
new nodes for each insertion. Our trees aren't guaranteed to be balanced, but
we've seen that with randomized data it tends to be close. So this insert
takes the same amount of time as a Java insert, but it creates more nodes
because it can't mutate an existing node. Overall we'd say that inserting a
list of n values will be O(n log n) when we have data that leads to a tree that
is close to being balanced.Then I asked people how we'd write a method called contains that takes a value n and a tree and that returns true if the value is in the tree and false otherwise. Someone quickly pointed out the base case for the empty tree that it doesn't contain anything:
let rec contains(tree, n) =
match tree with
| Empty -> false
...
Remember that you typically want a different case for each of your different
type constructors. The case above handles the empty tree, so we'll also need a
case for a nonempty tree:
let rec contains(tree, n) =
match tree with
| Empty -> false
| Node(root, left, right) ->
...
Someone said that if the root data is equal to n, then the answer is true. If
not, someone said we could see if it is in either subtree:
let rec contains(tree, n) =
match tree with
| Empty -> false
| Node(root, left, right) ->
root = n || contains(left, n) || contains(right, n)
This code works, but it's not very efficient. It would potentially search the
entire tree. Remember that we are working with a binary search tree. So the
better thing to do is to check either the left subtree or the right subtree,
but not both:
let rec contains2(tree, n) =
match tree with
| Empty -> false
| Node(root, left, right) ->
if root = n then true
else if n < root then contains2(left, n)
else contains2(right, n)
This version turns out to be quite efficient, which I demonstrated in the OCaml
interpreter by typing:
let t = insert_all(random_numbers(1000000))
filter((fun x -> contains2(t, x)), 1--100000)
The request for 100 thousand calls on contains2 executed almost without
pause.As another example, I asked people how we could convert the binary search tree into a sorted list of ints. This took people a few minutes to figure out, but eventually we came to the realization that an inorder traversal of the tree will produce the values in sorted order, so we simply need to collapse it using an inorder traversal.
The easy case is to collapse an empty tree, which gives you an empty list:
let rec collapse(tree) =
match tree with
| Empty -> []
...
As usual, we'll need a case for the nonempty tree:
let rec collapse(tree) =
match tree with
| Empty -> []
| Node(root, left, right) -> ...
In this case we want to recursively collapse the left and right subtrees and
glue the two pieces together with the root data in the middle. Someone
suggested doing it this way:
let rec collapse(tree) =
match tree with
| Empty -> []
| Node(root, left, right) ->
collapse(left) @ root::collapse(right)
This code works well, but I have a slight preference in this case for
expressing it as the appending of three different lists:
let rec collapse(tree) =
match tree with
| Empty -> []
| Node(root, left, right) ->
collapse(left) @ [root] @ collapse(right)
The first version is better in the sense that it demonstrates an understanding
of the difference between the cons operator(::) and the append operator(@),
but I prefer the second because I conceive of the problem as putting together
three different things. Both are perfectly fine ways to write the code.We previously wrote a function to determine the height of a tree. A related notion is the depth of a given value stored in the tree. Consider, for example, this tree:
18
\
27
/
9
What is the depth of the node with 9 in it? We compute it by finding the
length of the path from the root to the node, but there are two things we could
count: nodes or edges. For computing height, I'm a node counter and would say
that this tree has a height of 3. That's how we wrote the function previously.
At least I have Don Knuth on my side for that. I'm a node counter for height
because I want to have a good answer to the question, "What is the height of
the empty tree?" For me, the answer is 0. For edge counters, it's either -1
or undefined or "I'm not sure."But for the depth of a specific value, I prefer the standard definition of counting edges because we don't have any odd situations for the empty tree (it has no nodes, so none of them have a depth). So I'd say that in this tree 18 has a depth of 0, 27 has a depth of 1, and 9 has a depth of 2.
Then I asked how we'd write a function to find the depth of a given value in a search tree. We usually start with an empty tree as the base case, but it's not clear what to return. What does it mean if you have gotten to an empty tree? It means the value wasn't found in the tree. Someone suggested we could return -1 in that case, the way we do for calls on indexOf in a language like Java when it doesn't find a value in a list:
let rec depth_of(n, tree) =
match tree with
| Empty -> -1
...
If the value stored at the root is n, then this value has a depth of 0 (it
appears in level 1 of the tree):
let rec depth_of(n, tree) =
match tree with
| Empty -> -1
| Node(root, left, right) ->
if root = n then 0
...
What if it's not at the root? It's a binary search tree, so to be efficient,
we should either search the left subtree or the right subtree, but not both:
let rec depth_of(n, tree) =
match tree with
| Empty -> -1
| Node(root, left, right) ->
if root = n then 0
else if n < root then 1 + depth_of(n, left)
else 1 + depth_of(n, right)
I had a list of ints that I used to define a variable "test" that we
could use for testing:
let test = [40; 72; 15; 0; -8; 95; 103; 72; 272; 143; 413; 341]
Then I defined a variable "t" by calling the insert_all function we
wrote in the previous lecture to produce a binary search tree.
# let t = insert_all(test);;
val t : int_tree =
Node (341,
Node (143,
Node (72,
Node (-8, Empty,
Node (0, Empty,
Node (15, Empty, Node (72, Node (40, Empty, Empty), Empty)))),
Node (103, Node (95, Empty, Empty), Empty)),
Node (272, Empty, Empty)),
Node (413, Empty, Empty))
I tried loading our definition for depth_of into the interpreter and
using this variable t we could see that it worked fairly well:
# map((fun x -> depth_of(x, t)), test);;
- : int list =
[7; 2; 5; 4; 3; 4; 3; 2; 2; 1; 1; 0]
But we ran into problems when we asked about a value not in the tree:
# depth_of(42, t);;
- : int = 7
We were expecting a value of -1. How did we end up with 7? The
answer is that our solution to depth_of descends the tree looking for
the given value and then adds 1 to the result as it comes back out.
This value of 42 would become the right child of the leaf node that
has 40 in it. Remember that 40 had a depth of 8. So when we find
that its right child is empty, we return a -1 and then add one to the
result 8 different times to get an overall result of 7.We could fix this by converting this into a tail recursive function with an accumulator so that when we reach an empty tree, there is nothing left to do but to return that special value -1. Another option would be to raise an exception. So we could imagine that we have a precondition on the function that they shouldn't call it if the value isn't in the tree. That doesn't seem like a very friendly thing to do, though. Someone said they could call contains before calling depth_of, but that means they have to search the tree twice: once to see if it's there, and a second time to find its depth.
OCaml offers a good alternative. This is a good place to use what is known as an option type. An option is appropriate when the answer is "0 or 1 of" something. In other words, sometimes there is an answer and sometimes not. The technical term for this is a "one of" type (versus a tuple where you have to include all values, which is called an "each of" type).
This situation comes up often in programming and there are many different approaches for addressing it. For the Scanner class in Java if you try to read a value when you have reached the end of a file, it throws an exception. There are other reading operations in the Java class libraries that return -1 when there is no data to read. When you call get on a key that isn't in a map, it returns the value null.
What is the maximum value in an empty list? There isn't one. This is a similar case. What is the depth of a value n in the tree? Sometimes there is an answer and sometimes the answer is, "There isn't one."
The option type is defined as follows:
type 'a option = None | Some of 'a
Notice the use of 'a, which means that it is polymorphic (you can have
an int option, string option, float option, etc). Using these two
constructors, I tried to rewrite our definition:
let rec depth_of(n, tree) =
match tree with
| Empty -> None
| Node(root, left, right) ->
if root = n then Some 1
else if n < root then 1 + depth_of(n, left)
else 1 + depth_of(n, right)
This produced an error. It's the same problem we had with the attempt
to return -1. We have to be consistent. The recursive calls on left
and right are adding 1 to the result being returned. You can't add
one to an option and we can't return an option in one case and not the
other. And we can't extract the value from the option because we
don't know that the recursive call will succeed. So we have to
convert this into a tail recursive version by introducing a helper
function with an accumulator that keeps track of the current depth:
let depth_of(n, tree) =
let rec helper(t, depth) =
match t with
| Empty -> None
| Node(root, left, right) ->
if root = n then Some depth
else if n < root then helper(left, depth + 1)
else helper(right, depth+ 1)
in helper(tree, 0)
Obviously we will sometimes want to turn an option result into an
actual value. You can do so using the function Option.get. For
example:
# let result = depth_of(t, 40);;
val result : int option = Some 8
# let d = Option.get(result);;
val d : int = 8
There are also functions Option.is_some and Option.is_none that can be used to
test which kind of option you have. More often we find ourselves using pattern
matching with the None and Some constructors.As one final example I said that I wanted to write a function list_max that would return the maximum value in a list. Using the option constructors, we can define base cases for an empty list and a one-element list:
let rec list_max(lst) =
match lst with
| [] -> None
| [x] -> Some(x)
...
It's tempting to return x in the one-element list case, but we got an error
message when we did so because then OCaml would assume that x is an int option
instead of an int. For the final case, we want to compute the max of the rest
of the list only once, so we introduced a let expression:
let rec list_max(lst) =
match lst with
| [] -> None
| [x] -> Some(x)
| x::xs ->
let max = list_max(xs)
in if x > Option.get(max) then Some(x)
else max
It might seem odd that we're calling Option.get in the final case without
testing whether there is a value to get, but keep in mind that we know that xs
has at least one value in that branch, so we know that we will get a value
returned.