handout #22

CSE311—Foundations of Computing I
Notes on Trees (Rosen 10.1)

The basic definition of a tree is as follows:

A tree is a connected, acyclic graph.
There are two unstated assumptions included with this: that the graph is undirected and that it is nonempty. Obviously you can explore variations like what is called a DAG (Directed Acyclic Graph), but for our purposes, we are using this fairly simple definition.

Trees are fragile in the sense that they occupy a Goldilocks point on the spectrum of possible edge connections that might exist in a graph. Remove an edge from a tree and it becomes disconnected. Add an edge to a tree and it introduces a cycle. In fact, an equivalent definition of a tree is that between every pair of distinct vertices there is exactly one path. So if you remove an edge, that breaks one of these paths. If you add an edge, that edge combined with the existing path forms a cycle.

I mentioned that in computer science we usually deal with what are called rooted trees. A rooted tree is one in which a vertex has been designated as the root. This introduces a corresponding concept of children that should be familiar from CSE143. The neighbors of the root are its children. The neighbors of those vertices (not including their parent) are considered their children, and so on.

An m-ary tree is a rooted tree in which each vertex has at most m children. In CSE143 we saw 2-ary (or binary) trees. We also tend in computer science to deal with what are called ordered trees, where the order of the children matters. For example, in a standard binary tree, if a vertex has exactly one child, it matters whether it's a left child or a right child. So really the trees we were dealing with in CSE143 are ordered, 2-ary trees.

We are familiar with the notion of a leaf. A leaf is a vertex of a rooted tree that has no children. We proved that every tree has a leaf. I said that this is easy to prove by contradiction. Suppose you have a rooted tree of n vertices that has no leaf. Start at the root. It's not a leaf, so you can go to one of its children. That also can't be a leaf, so you can go to one of its children. That also can't be a leaf, so you can go to one of its children. Repeat this process n times and you will have visited n + 1 vertices (the root plus the n children/grandchildren/etc you end up visiting). But there are only n vertices total, so visiting those n + 1 vertices guarantees that we visit at least one vertex twice (by the pigeonhole principle). Therefore the tree has a cycle, which is a contradiction. Therefore it must be true that every rooted tree has a leaf.

I then mentioned that for every rooted tree, the number of vertices is one more than the number of edges. I asked how we could prove this. Someone said, "by induction." That's true, but we have to be careful to say what kind of induction we're doing. We've seen induction on integers and we've seen induction on recursive definitions. Our definition of a tree is not recursive (it involves a set of properties instead), so we can't use the kind of structural definition we used before. Instead we have to pick some property of the tree that can be expressed as an integer and do induction on that. There are many such properties and it is common to pick different properties for different proofs. We can do induction on the number of vertices in the tree, the number of edges in the tree, the height of the tree, and so. We could even do induction on something like the number of leaves in the tree, although that doesn't tend to make for an easy proof.

To prove that the number of vertices in a rooted tree is one more than the number of edges, we can do induction on the number of vertices. Trees aren't empty, so the simplest tree to consider is a single vertex. It has no edges and 1 is 1 more than 0, so the property holds in the base case. Assume that the property holds of all trees of k vertices for some k in Z+. We will prove that the property holds of any tree of k + 1 vertices. Let T be a tree of k + 1 vertices. Our goal is to prove that it has k edges. It must have a leaf L. Let T' be the tree obtained by removing L and the edge that connects it to T. T' has all but one of the vertices of T, so it has k vertices. Applying the inductive hypothesis, it must have k - 1 edges. But T has just one more edge than T' (the edge that connects L to T), so it must have k edges. Thus, the property holds of T as well. This completes the proof that the number of vertices in a rooted tree is one more than the number of edges.

This approach of removing a leaf is very common for tree induction proofs, but it doesn't always work out. In a second induction example, I revisited the idea of a full binary tree. Recall that a full binary tree is one in which every vertex has 0 or 2 children (this was true of the Huffman tree and the 20 questions tree in CSE143). Removing a leaf from a full binary tree is not nearly as helpful because it leads to something that is not a full binary tree. You could try to figure out a way to remove two leaves, but that gets fairly tricky. So I mentioned another common technique in the second sample induction proof.

Prove that in a full binary tree, the number of leaves is always one more than the number of nonleaves. The nonleaves are the vertices that have children and are called internal vertices (what we call branch nodes in programming classes). We will prove this by strong induction on the height of the tree. We are assuming the standard definition of height where the tree of one vertex is considered to have height 0. There is only one tree of height 0 and it is composed of one leaf and no internal vertices, so it has one more leaf than nonleaves (1 = 0 + 1). Now assume that the property holds of all full binary trees of height 0 through k for some k in Z+. We show that it holds for all full binary trees of height k + 1. Let T be a full binary tree of height k + 1 with overall root R. T has more than one vertex, so R must have a child, and T is a full binary tree, so R must have two children. Let T1 and T2 be the children of R. Suppose that T1 has i1 internal vertices and T2 has i2 internal vertices. Both T1 and T2 have between height between 0 and k and are full binary trees, so by the inductive hypothesis, T1 has i1 + 1 leaves and T2 has i2 + 1 leaves. T has just one vertex that is not included in T1 and T2 (the root R) and it is an internal vertex. Therefore, the leaves of T are simply the union of the leaves of T1 and T2, which means it has (e1 + 1) + (e2 + 1) = e1 + e2 + 2 leaves. The internal vertices of T include the internal vertices of T1 and T2 along with R. Thus, T has e1 + e2 + 1 internal vertices. Notice that e1 + e2 + 2 = (e1 + e2 + 1) + 1, which means the property holds of T. This completes the proof that the number of leaves in a full binary tree is one more than the number of internal vertices.


Stuart Reges
Last modified: Wed Dec 8 13:42:51 PST 2010