CSE143 Notes for Wednesday, 2/22/06

I started a new topic called binary trees. I mentioned that they turn out to be very helpful in solving many different problems. There is an elegance and power that you get with binary trees that makes them extremely useful.

I reminded people that when I first introduced linked lists, I described working with them as a little like working with Lego building blocks, where each block looks like this:

        +------+------+
        | data | next |
        |  18  |  +---+--->
        +------+------+
Binary trees are similar, but the building block has two arrows instead of one:

        +-------------+
        |    data     |
        |     18      |
        +------+------+
        | left |right |
        |  /   |   \  |
        +-/----+----\-+
         /           \
        /             \
       V               V
I started with some basic terminology. I drew a crude picture of a tree and pointed out that we refer to the base of the tree as its root and we talk about branches of the tree with leaves on the branches. Computer scientists view the world upside down, so we draw a tree with the root at the top and the leaves at the bottom. For example, we might draw this tree:

              12
             /  \
            /    \
          18      7
          /\       \
         /  \       \
        9    4      13
Just as with linked lists, we refer to each different value as a node. At the top of this tree we have the root node storing the value 12. We would refer to the nodes with 12, 18 and 7 as branch nodes of the tree because they have values stored under them. The nodes with values 9, 4 and 13 are leaf nodes of the tree because each one has nothing under it.

Another set of terms we use is parent and child. The root node is the ultimate ancestor of every other node. It is the parent of the nodes 18 and 7. Conversely, we'd say that 18 and 7 are children of the root node 12. We also sometimes refer to 18 and 7 as siblings.

I then gave a recursive definition of a tree. I said that a tree is either:

        * an empty tree, or
        * a root node with left and right subtrees

The key phrase in this definition is "subtree". The tree is composed of smaller trees. So our recursive definition involves thinking of a tree as either being empty or being of this form:

               +-----------+
               | root node |
               +-----------+
                   /   \
                 /       \
               /           \
             /\             /\
            /  \           /  \
           /    \         /    \
          / left \       / right\
         / subtree\     / subtree\
        +----------+   +----------+
That will be a useful way to think about trees as we write binary tree code. Using our recursive definition, we discussed how you could form various kinds of trees. The simplest kind of tree is an empty tree, which can't really be drawn because it's empty.

Once you have an empty tree, you can use the second part of the definition to make a new kind of tree that is composed of a root node with left and right subtrees that are both empty. In other words, that would be a single leaf node, which we could represent with a dot:

        .
Now that we have this as an option, we can use our recursive rule to say that a tree could be a root node with a left tree that is empty and a right tree that is a leaf:

        .
         \
          .
Or we can have an empty right and a leaf to the left:

          .
         /
        .
Or we could have a leaf on each side:

          .
         / \
        .   .
These now become possibilities to use for our recursive definition, allowing us to construct even more kinds of trees, as in:

             .                  .            .            .
            / \                / \          / \          / \
          .     .            .     .      .     .      .     .
         / \   / \          / \            \   /        \     \    
        .   . .   .        .   .            . .          .     . 
Then I showed people the node class we'll be using for a simple binary tree of ints:

        public class TreeNode {
            public int data;
            public TreeNode left;
            public TreeNode right;
                
            public TreeNode(int data) {
                this(data, null, null);
            }
                        
            public TreeNode(int data, TreeNode left, TreeNode right) {
                this.data = data;
                this.left = left;
                this.right = right;
            }
        }
As with our linked list node, this node is very simple in that it has just some public data fields and a few constructors. The node has a data field of type int and two links of type TreeNode for the left and right subtrees. The first constructor constructs a leaf node (using null for left and right). The second constructor would be appropriate for a branch node where you want to specify the left and right subtrees.
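
As a small illustration, the two constructors are enough to build the six-node sample tree drawn earlier (12 at the root, with 9, 4 and 13 as leaves). The sketch below repeats the node class so that it stands alone. The class name TreeNodeDemo and the helper isLeaf are not part of the lecture code; they are assumed here only for illustration.

```java
// Sketch only: TreeNodeDemo, buildSample and isLeaf are hypothetical names.
class TreeNodeDemo {
    // Builds the sample tree: 12 at the root, 18 and 7 below it,
    // leaves 9 and 4 under 18, and leaf 13 to the right of 7.
    static TreeNode buildSample() {
        TreeNode n18 = new TreeNode(18, new TreeNode(9), new TreeNode(4));
        TreeNode n7 = new TreeNode(7, null, new TreeNode(13));
        return new TreeNode(12, n18, n7);
    }

    // A leaf is a node whose left and right subtrees are both empty.
    static boolean isLeaf(TreeNode node) {
        return node != null && node.left == null && node.right == null;
    }

    public static void main(String[] args) {
        TreeNode root = buildSample();
        System.out.println(isLeaf(root));            // false: 12 is a branch node
        System.out.println(isLeaf(root.left.left));  // true: 9 is a leaf
        System.out.println(root.right.right.data);   // 13
    }
}

// The node class from the notes, repeated so the sketch compiles on its own.
class TreeNode {
    public int data;
    public TreeNode left;
    public TreeNode right;

    public TreeNode(int data) {
        this(data, null, null);
    }

    public TreeNode(int data, TreeNode left, TreeNode right) {
        this.data = data;
        this.left = left;
        this.right = right;
    }
}
```

The one-argument constructor is used for the leaves and the three-argument constructor for the branch nodes, with null standing in for the missing left subtree of 7.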

This node class is "messy" in the sense that it is not well encapsulated. We did something similar with the linked list nodes and I made the analogy that they are like the cans of paint that a contractor might use in painting your house. We want a cleaner interface for dealing with a client, so I mentioned that we'll have a second object for storing a tree. Any external client will deal with the Tree object and won't ever see these tree node objects.

We need only one data field in the tree class: a reference to the root of the tree:

        public class Tree {
            private TreeNode overallRoot;

            ...
        }
I pointed out that I'm purposely using the name "overallRoot" to distinguish this root from all of the other roots. There is only one overall root. But each node is itself the root of a subtree, and in our recursive methods we'll often use the parameter name "root" as a way to indicate that it can be any of the roots.
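
The naming convention can be seen in a small sketch. The size method here is not from the lecture. It is an assumed example, chosen because counting nodes follows the recursive definition directly: an empty tree has zero nodes, and any other tree has one root node plus the nodes of its two subtrees. The Tree(TreeNode) constructor is also an assumption, added only so the sketch has a way to obtain a tree.

```java
// Sketch only: TreeSizeDemo, size() and the Tree(TreeNode) constructor are hypothetical.
class TreeSizeDemo {
    public static void main(String[] args) {
        TreeNode nodes = new TreeNode(12, new TreeNode(18), new TreeNode(7));
        System.out.println(new Tree(nodes).size()); // 3
        System.out.println(new Tree(null).size());  // 0, the empty tree
    }
}

class Tree {
    private TreeNode overallRoot; // the one root of the whole tree; null if empty

    // Assumed constructor that wraps an already-built structure of nodes.
    public Tree(TreeNode overallRoot) {
        this.overallRoot = overallRoot;
    }

    // What the client calls: starts the recursion at the overall root.
    public int size() {
        return size(overallRoot);
    }

    // Here "root" can be the root of any subtree, including an empty one.
    private int size(TreeNode root) {
        if (root == null) {
            return 0;
        }
        return 1 + size(root.left) + size(root.right);
    }
}

// The node class from the notes, repeated so the sketch compiles on its own.
class TreeNode {
    public int data;
    public TreeNode left;
    public TreeNode right;

    public TreeNode(int data) {
        this(data, null, null);
    }

    public TreeNode(int data, TreeNode left, TreeNode right) {
        this.data = data;
        this.left = left;
        this.right = right;
    }
}
```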

I then spent time discussing the idea of tree traversals. The idea is to "traverse" the tree in such a way that you visit each node exactly once. There are many different ways to do this. We generally prefer recursive approaches, so we want to traverse the entire left subtree without dealing with anything from the right and then, in a separate operation, traverse the entire right subtree without dealing with anything from the left. That leads to the classic binary tree traversals. We have a Western bias toward traversing the left subtree before the right subtree. The question becomes, where do you deal with the root of the tree?

There are three possible answers you might give. You can process the root before you traverse either subtree, in between traversing the two subtrees, or after you traverse both subtrees. These three approaches are known as preorder, inorder and postorder traversals, respectively.

                            +------------------+
  +-----------<-------------+ process the root +------------->-----------+
  |                         +--------+---------+                         |
  |                                  |                                   |
  V     +-----------------------+    V    +------------------------+     V
 pre    | traverse left subtree |    in   | traverse right subtree |   post
        +-----------------------+         +------------------------+
For example, given the following tree:

                                 +---+
                                 | 2 |
                                 +---+
                               /       \
                             /           \
                       +---+               +---+
                       | 0 |               | 3 |
                       +---+               +---+
                      /                   /     \
                     /                   /       \
                  +---+               +---+     +---+
                  | 7 |               | 1 |     | 9 |
                  +---+               +---+     +---+
                 /     \              /              \
                /       \            /                \
             +---+     +---+      +---+              +---+
             | 6 |     | 5 |      | 8 |              | 4 |
             +---+     +---+      +---+              +---+

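The traversals would be as follows:

        * preorder: 2, 0, 7, 6, 5, 3, 1, 8, 9, 4
        * inorder: 6, 7, 5, 0, 2, 8, 1, 3, 9, 4
        * postorder: 6, 5, 7, 0, 8, 1, 4, 9, 3, 2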

I pointed out that the final exam will include a question like this, where you have to show the different traversals of a specific tree.
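
One way to check traversals of the pictured tree by machine is the sketch below. It was not part of the lecture. It builds the ten-node tree from the picture and collects each traversal into a list. The names TraversalDemo, buildPictured, preorder, inorder and postorder are assumed for illustration, and the node class is repeated so the sketch stands alone.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: all names in TraversalDemo are hypothetical.
class TraversalDemo {
    // Builds the pictured tree: 2 at the root, 0 and 3 below it, and so on.
    static TreeNode buildPictured() {
        TreeNode n7 = new TreeNode(7, new TreeNode(6), new TreeNode(5));
        TreeNode n0 = new TreeNode(0, n7, null);
        TreeNode n1 = new TreeNode(1, new TreeNode(8), null);
        TreeNode n9 = new TreeNode(9, null, new TreeNode(4));
        return new TreeNode(2, n0, new TreeNode(3, n1, n9));
    }

    // Root first, then the left subtree, then the right subtree.
    static void preorder(TreeNode root, List<Integer> out) {
        if (root != null) {
            out.add(root.data);
            preorder(root.left, out);
            preorder(root.right, out);
        }
    }

    // Left subtree, then the root, then the right subtree.
    static void inorder(TreeNode root, List<Integer> out) {
        if (root != null) {
            inorder(root.left, out);
            out.add(root.data);
            inorder(root.right, out);
        }
    }

    // Left subtree, then the right subtree, then the root.
    static void postorder(TreeNode root, List<Integer> out) {
        if (root != null) {
            postorder(root.left, out);
            postorder(root.right, out);
            out.add(root.data);
        }
    }

    public static void main(String[] args) {
        TreeNode root = buildPictured();
        List<Integer> pre = new ArrayList<>();
        List<Integer> in = new ArrayList<>();
        List<Integer> post = new ArrayList<>();
        preorder(root, pre);
        inorder(root, in);
        postorder(root, post);
        System.out.println("preorder:  " + pre);  // [2, 0, 7, 6, 5, 3, 1, 8, 9, 4]
        System.out.println("inorder:   " + in);   // [6, 7, 5, 0, 2, 8, 1, 3, 9, 4]
        System.out.println("postorder: " + post); // [6, 5, 7, 0, 8, 1, 4, 9, 3, 2]
    }
}

// The node class from the notes, repeated so the sketch compiles on its own.
class TreeNode {
    public int data;
    public TreeNode left;
    public TreeNode right;

    public TreeNode(int data) {
        this(data, null, null);
    }

    public TreeNode(int data, TreeNode left, TreeNode right) {
        this.data = data;
        this.left = left;
        this.right = right;
    }
}
```

The three methods differ only in where the line that handles the root sits relative to the two recursive calls.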

Then I turned back to our sample TreeNode and Tree classes and talked about how we could write some basic methods for the class. In particular, I wanted to write a public method for the Tree class that would print the tree values using a preorder traversal. So our public method will look like this:

        public void printPreorder() {
            ...
        }
I mentioned that it will almost always be the case when you go to write such a method that you actually have to write a pair of methods to solve the problem. The issue is that from the client's point of view, they want to print the entire tree. But to solve this problem recursively, we need a method that works on every subtree, not just the overall tree. We need a method that takes a TreeNode as a parameter, so we'll create a private method:

        private void printPreorder(TreeNode root) {
            ...
        }
We'll start the recursive process by passing it the overall root, which means that our public method requires just a single line of code that calls the private method passing it the overall root:

        public void printPreorder() {
            printPreorder(overallRoot);
        }
But how do we write the private method? I told people that it's good to go back to the basic definition of a binary tree. Remember that it is either an empty tree or it is a root node with left and right subtrees. If it's an empty tree, then there isn't anything to print. That means we could begin our private method this way:

        private void printPreorder(TreeNode root) {
            if (root == null)
               // do nothing
            ...
        }
But since we have nothing to do in this case, it's better to test the negation of this:

        private void printPreorder(TreeNode root) {
            if (root != null)
            ...
        }
So what do we do in the case where root is not null? That would mean we have a root node that has some data in it and we have left and right subtrees that need to be printed. The way a preorder traversal works is that we handle the root node first, which means we'd print out the data for the root:

        private void printPreorder(TreeNode root) {
            if (root != null) {
                System.out.println(root.data);
                ...
            }
        }
What do we do after we print the data for this node? We want to print the left subtree in a preorder manner and then print the right subtree in a preorder manner. If you're thinking recursively, you'll think, "If only I had a method to print a subtree in a preorder manner...but I do have such a method...the one I'm writing." So this becomes:

        private void printPreorder(TreeNode root) {
            if (root != null) {
                System.out.println(root.data);
                printPreorder(root.left);
                printPreorder(root.right);
            }
        }
That completes the method. I then asked people how to modify the code to print the tree in an inorder manner and people said to put the println in between the two recursive calls. What about a postorder traversal? You put the println after the two recursive calls.

Someone pointed out that in my sample program, I use indentation to indicate the level of each node. How do we do that? Someone mentioned introducing a counter of some kind. That's the right idea, but that is more of an iterative way to think of it. For a loop, we might say:

        int count = 0;
        while (...) {
            count++;
            ...
        }
We want the recursive equivalent, which you can get by introducing an extra parameter. So the header for the private method becomes:

        private void printPreorder(TreeNode root, int level) {
We initialize this "counter" in the public method:
        public void printPreorder() {
            printPreorder(overallRoot, 0);
        }
And we have to modify each of the recursive calls to do the recursive equivalent of incrementing the variable:

        
        printPreorder(root.left, level + 1);
        printPreorder(root.right, level + 1);
In handout #24 you'll see that this value is then used with a for loop to print appropriate indentation for each node.
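
Handout #24 is not reproduced in these notes, so the following is only a sketch of how the level parameter might drive the indentation. It assumes four spaces per level and builds a String instead of printing directly so that the result is easy to inspect. The names IndentDemo and preorderIndented are hypothetical.

```java
// Sketch only: IndentDemo and preorderIndented are hypothetical names,
// and four spaces per level is an assumed choice.
class IndentDemo {
    // Entry point: initializes the "counter" by starting at level 0.
    static String preorderIndented(TreeNode overallRoot) {
        StringBuilder sb = new StringBuilder();
        preorderIndented(overallRoot, 0, sb);
        return sb.toString();
    }

    private static void preorderIndented(TreeNode root, int level, StringBuilder sb) {
        if (root != null) {
            // One unit of indentation for each level below the overall root.
            for (int i = 0; i < level; i++) {
                sb.append("    ");
            }
            sb.append(root.data).append('\n');
            // Passing level + 1 is the recursive equivalent of count++.
            preorderIndented(root.left, level + 1, sb);
            preorderIndented(root.right, level + 1, sb);
        }
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode(12,
                new TreeNode(18, new TreeNode(9), null),
                new TreeNode(7));
        System.out.print(preorderIndented(root));
    }
}

// The node class from the notes, repeated so the sketch compiles on its own.
class TreeNode {
    public int data;
    public TreeNode left;
    public TreeNode right;

    public TreeNode(int data) {
        this(data, null, null);
    }

    public TreeNode(int data, TreeNode left, TreeNode right) {
        this.data = data;
        this.left = left;
        this.right = right;
    }
}
```

For the small tree in main, 12 appears with no indentation, 18 and 7 are indented one level, and 9 is indented two levels. Note that level itself is never modified. Each call simply receives a value one larger than its caller's.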

I had only a few minutes to describe the technique I used to construct the tree. I passed the constructor an integer indicating how many levels the tree should have (a quantity known as the "height" of the tree). I then constructed a tree that has every possible node (something known as a "perfect" tree) using random values in the range of 0 to 99. I had time to sketch out the basic idea. I asked people what height would be easiest to work with. Someone said 0 and I said that's right. The 0-height tree is the empty tree, so it's represented by the value null (the same way we represented the empty list with linked lists). If it's not a 0-height tree, we can construct a root node with left and right subtrees of height one less than the requested height. The complete code appears in handout #24.
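
Since the handout is not included here, the idea can be sketched as follows. This is not the handout's code. The names PerfectTreeDemo, buildPerfect and countNodes are assumed for illustration, and passing a java.util.Random object as a parameter is a design choice made only for this sketch.

```java
import java.util.Random;

// Sketch only: PerfectTreeDemo, buildPerfect and countNodes are hypothetical names.
class PerfectTreeDemo {
    // Returns a perfect tree with the given number of levels, filled with
    // random values from 0 to 99. A height of 0 (or less) gives the empty tree.
    static TreeNode buildPerfect(int height, Random rand) {
        if (height <= 0) {
            return null;
        }
        // A root node with two subtrees, each one level shorter.
        return new TreeNode(rand.nextInt(100),
                buildPerfect(height - 1, rand),
                buildPerfect(height - 1, rand));
    }

    // Counts nodes recursively, used here only to check the shape.
    static int countNodes(TreeNode root) {
        if (root == null) {
            return 0;
        }
        return 1 + countNodes(root.left) + countNodes(root.right);
    }

    public static void main(String[] args) {
        TreeNode root = buildPerfect(3, new Random());
        System.out.println(countNodes(root)); // 7: one node, then two, then four
    }
}

// The node class from the notes, repeated so the sketch compiles on its own.
class TreeNode {
    public int data;
    public TreeNode left;
    public TreeNode right;

    public TreeNode(int data) {
        this(data, null, null);
    }

    public TreeNode(int data, TreeNode left, TreeNode right) {
        this.data = data;
        this.left = left;
        this.right = right;
    }
}
```

A perfect tree with h levels has 2^h - 1 nodes, which gives a quick check on the result even though the values themselves are random.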


Stuart Reges
Last modified: Mon Feb 27 13:09:04 PST 2006