
Tree and Graph Traversals Reading

Table of contents

  1. Data relationships
  2. Special case: trees
    1. Level-order traversal
    2. Depth-first traversal
  3. Simple graphs
    1. Depth-first search
    2. Breadth-first search

Graphs are a class of abstract data types that model the relationships between data. While all of the data structures we’ve learned take advantage of the relationships between data (search trees, heaps, k-d trees) and the properties of data (hashing) to optimize search, graphs explicitly model these relationships and properties. In a graph, the relationships between data are often as important as the data themselves.

Data relationships

One of the implications of modeling the relationships between data is that graph algorithms are less focused on optimizing the runtime of individual operations. In data structures, these relationships are typically hidden from the client, allowing data structure implementers to use a variety of algorithm ideas to improve runtime. Search trees, for example, use the underlying total order (sorted order) of the dataset to improve runtime. In contrast, the relationships between data are precisely what's important to the graph client. Instead of querying the data to check whether a key is stored in the set, the graph client is more interested in answering questions about how the data are related.

Formally, a graph G = (V, E) consists of a set of vertices V and a set of edges E. Graphs can model real-world artifacts like maps by representing intersections or places as vertices and the roads connecting places as edges, where a graph problem might be to return the shortest path between two places. They can also model abstract concepts like social networks by representing people as vertices and friend or follower relationships as edges, where a graph problem might be to identify cliques or clusters of close friends. This framing motivates our two lines of inquiry about graphs.

  1. Formulating complicated problems in terms of a graph. The choice of what to represent as vertices and edges often makes the difference between problems that can be solved efficiently and problems that can’t be solved at all.
  2. Applying familiar graph algorithms to solve a graph problem. Inventing new graph algorithms is difficult, so it's often helpful to reformulate the problem and then solve it using an existing algorithm.

A common graph problem is graph traversal (also known as graph search), where we visit (process) each vertex in a graph.

Special case: trees

In graph theory, trees are a special case of graphs with one important constraint: there is exactly one path between any two vertices. As a consequence, a tree (graph) G satisfies two properties.

Connected
Every vertex can reach every other vertex.
Acyclic
No cycles; no sequence of unique edges starting at a vertex and returning to the same vertex.

Alternative Tree Definition

Note
In the data structure view of trees, edges are implicit: the parent-child links are an implementation strategy for improving runtime. When viewing trees as graphs, edges are instead defined by the graph client.

There are multiple ways to traverse a tree.

Level-order traversal

Visit every vertex on a level (left-to-right) before moving to the next level.
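Level-order traversal is commonly implemented with a FIFO queue: remove a node, visit it, then add its children to the back of the queue. The Java sketch below is an illustration, not the course's reference implementation. It assumes a minimal BSTNode class with char keys, and exampleTree() is a hypothetical helper that rebuilds the lettered tree used in the depth-first examples (reconstructed from its preorder and inorder sequences).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Minimal binary tree node with single-character keys (field names follow the BSTNode pseudocode).
class BSTNode {
    char key;
    BSTNode left, right;

    BSTNode(char key, BSTNode left, BSTNode right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

class LevelOrder {
    // Returns the keys level by level, left-to-right within each level.
    static List<Character> levelOrder(BSTNode root) {
        List<Character> visited = new ArrayList<>();
        Queue<BSTNode> queue = new ArrayDeque<>();
        if (root != null) queue.add(root);
        while (!queue.isEmpty()) {
            BSTNode x = queue.remove();              // oldest node: shallowest level still pending
            visited.add(x.key);
            if (x.left != null) queue.add(x.left);   // children join the back of the queue,
            if (x.right != null) queue.add(x.right); // behind the rest of the current level
        }
        return visited;
    }

    static BSTNode leaf(char key) { return new BSTNode(key, null, null); }

    // Hypothetical helper: the tree whose preorder is F-B-A-D-C-E-G-I-H and inorder is A-...-I.
    static BSTNode exampleTree() {
        return new BSTNode('F',
            new BSTNode('B', leaf('A'), new BSTNode('D', leaf('C'), leaf('E'))),
            new BSTNode('G', null, new BSTNode('I', leaf('H'), null)));
    }

    public static void main(String[] args) {
        System.out.println(levelOrder(exampleTree())); // [F, B, G, A, D, I, C, E, H]
    }
}
```

Each level finishes before the next begins because every child is enqueued behind all nodes already waiting from the current level.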

Depth-first traversal

Recursively visit each child. There are three vertex orderings, depending on when we visit a vertex relative to its children. Example code is given for a binary search tree, but preorder and postorder traversals also apply to trees with more than 2 children.

Binary search tree traversals

Preorder
Visit the current vertex and then recursively visit each child. (Red, F-B-A-D-C-E-G-I-H.)
void preorder(BSTNode x) {
    if (x == null) return;      // empty subtree: nothing to visit
    System.out.println(x.key);  // visit the current vertex first
    preorder(x.left);
    preorder(x.right);
}
Inorder
Recursively visit the left subtree, then the current vertex, and then the right subtree. Only applies to binary trees; in a binary search tree, it visits the keys in sorted order. (Yellow, A-B-C-D-E-F-G-H-I.)
void inorder(BSTNode x) {
    if (x == null) return;
    inorder(x.left);
    System.out.println(x.key);  // visit between the left and right subtrees
    inorder(x.right);
}
Postorder
Recursively visit each child and then visit the current vertex. (Green, A-C-E-D-B-H-I-G-F.)
void postorder(BSTNode x) {
    if (x == null) return;
    postorder(x.left);
    postorder(x.right);
    System.out.println(x.key);  // visit only after both subtrees are done
}
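To check the three orderings end to end, here is a runnable Java sketch that collects keys into a StringBuilder instead of printing them. BSTNode, leaf, and exampleTree are hypothetical helpers; the tree is reconstructed from the preorder and inorder sequences listed above, so the outputs should match the red, yellow, and green orderings.

```java
// Minimal node type; keys are single characters to match the lettered example tree.
class BSTNode {
    char key;
    BSTNode left, right;

    BSTNode(char key, BSTNode left, BSTNode right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

class DepthFirstTraversals {
    // Current vertex, then left subtree, then right subtree.
    static void preorder(BSTNode x, StringBuilder out) {
        if (x == null) return;
        out.append(x.key);
        preorder(x.left, out);
        preorder(x.right, out);
    }

    // Left subtree, then current vertex, then right subtree.
    static void inorder(BSTNode x, StringBuilder out) {
        if (x == null) return;
        inorder(x.left, out);
        out.append(x.key);
        inorder(x.right, out);
    }

    // Both subtrees, then the current vertex.
    static void postorder(BSTNode x, StringBuilder out) {
        if (x == null) return;
        postorder(x.left, out);
        postorder(x.right, out);
        out.append(x.key);
    }

    static BSTNode leaf(char key) { return new BSTNode(key, null, null); }

    static BSTNode exampleTree() {
        return new BSTNode('F',
            new BSTNode('B', leaf('A'), new BSTNode('D', leaf('C'), leaf('E'))),
            new BSTNode('G', null, new BSTNode('I', leaf('H'), null)));
    }

    public static void main(String[] args) {
        BSTNode root = exampleTree();
        StringBuilder pre = new StringBuilder(), in = new StringBuilder(), post = new StringBuilder();
        preorder(root, pre);
        inorder(root, in);
        postorder(root, post);
        System.out.println(pre);   // FBADCEGIH
        System.out.println(in);    // ABCDEFGHI
        System.out.println(post);  // ACEDBHIGF
    }
}
```

The three methods differ only in where the `append` call sits relative to the two recursive calls.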

Simple graphs

A graph consists of a set of vertices and a set of zero or more edges. Instead of parent and child, each vertex is connected to its adjacent vertices (neighbors) with an edge.

In this class and in many real-world contexts, we take “graph” to mean “simple graph”. Simple graphs contain no self-loops or parallel edges: there can be at most one edge between any pair of vertices.

Simple graph definition
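The simple-graph condition can be checked directly on an undirected edge list. A minimal Java sketch, assuming edges are given as int pairs and treating u-v and v-u as the same undirected edge; isSimple is a hypothetical helper name, not part of any library.

```java
import java.util.HashSet;
import java.util.Set;

class SimpleGraphCheck {
    // Returns true if the undirected edge list has no self-loops and no parallel edges.
    static boolean isSimple(int[][] edges) {
        Set<String> seen = new HashSet<>();
        for (int[] e : edges) {
            int u = e[0], v = e[1];
            if (u == v) return false;                            // self-loop
            String key = Math.min(u, v) + "-" + Math.max(u, v);  // normalize so u-v equals v-u
            if (!seen.add(key)) return false;                    // add() is false if already present: parallel edge
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSimple(new int[][] {{0, 1}, {1, 2}, {2, 0}})); // true
        System.out.println(isSimple(new int[][] {{0, 1}, {1, 0}}));         // false
    }
}
```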

Goal
Systematically traverse a graph, visiting each vertex.
Suppose we apply the recursive, depth-first tree traversal algorithm on a graph. What's problematic?

While trees have exactly one path between any two vertices, graphs can have multiple paths between vertices. The traversal can get stuck in an infinite loop (infinite recursion) if the graph contains cycles.

Depth-first search (DFS) is a recursive graph traversal algorithm that maintains two data structures: a set of marked (visited) vertices and an edgeTo map from each vertex to the previous vertex on its path from the start.

dfs(v)
  1. Mark v.
  2. For each unmarked neighbor w, set edgeTo[w] = v and call dfs(w).
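A minimal Java sketch of dfs(v), assuming an undirected graph stored as adjacency lists in a Map<Integer, List<Integer>> (the reading does not fix a representation). The undirected helper and the example graph (a square 0-1-2-3-0 with an extra vertex 4 attached to 2) are hypothetical, not the graph from the figure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class DepthFirstSearch {
    private final Map<Integer, List<Integer>> adj;  // vertex -> neighbors (assumed representation)
    final Set<Integer> marked = new HashSet<>();
    final Map<Integer, Integer> edgeTo = new HashMap<>();

    DepthFirstSearch(Map<Integer, List<Integer>> adj, int s) {
        this.adj = adj;
        dfs(s);
    }

    private void dfs(int v) {
        marked.add(v);                                   // 1. Mark v.
        for (int w : adj.getOrDefault(v, List.of())) {
            if (!marked.contains(w)) {                   // 2. For each unmarked neighbor w:
                edgeTo.put(w, v);                        //    record that we reached w from v,
                dfs(w);                                  //    then recurse into w.
            }
        }
    }

    // Hypothetical helper: builds undirected adjacency lists from an edge list.
    static Map<Integer, List<Integer>> undirected(int[][] edges) {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        return adj;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> g = undirected(new int[][] {{0, 1}, {1, 2}, {2, 3}, {3, 0}, {2, 4}});
        DepthFirstSearch search = new DepthFirstSearch(g, 0);
        System.out.println(new TreeMap<>(search.edgeTo)); // {1=0, 2=1, 3=2, 4=2}
    }
}
```

Because the recursion goes 0, 1, 2 before backtracking, DFS reaches 3 through 2 (edgeTo[3] == 2) even though 3 is also adjacent to 0. The marked set is what stops the recursion from bouncing back along edges it has already crossed.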

Consider running DFS on the graph below starting from the vertex labeled 0.

We can recover the s–t paths from 0 to every other reachable vertex in the graph by following the edgeTo map backward to retrace our steps. For example, the DFS path from 0 to 7 can be found by performing the following edgeTo lookups.

  1. edgeTo[7] == 6, so 6-7.
  2. edgeTo[6] == 5, so 5-6-7.
  3. edgeTo[5] == 2, so 2-5-6-7.
  4. edgeTo[2] == 1, so 1-2-5-6-7.
  5. edgeTo[1] == 0, so 0-1-2-5-6-7.
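The lookups above can be automated. This Java sketch walks edgeTo from t back to s, pushing each vertex onto a stack so the path comes out in s-to-t order. It assumes t is reachable from s, and exampleEdgeTo is a hypothetical helper holding only the five edgeTo entries listed above (the rest of the figure's map is not shown).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PathTo {
    // Assumes t == s or t has an edgeTo chain leading back to s; otherwise get() returns null.
    static List<Integer> pathTo(Map<Integer, Integer> edgeTo, int s, int t) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int x = t; x != s; x = edgeTo.get(x)) {
            stack.push(x);                 // push t first, so it ends up at the bottom
        }
        stack.push(s);
        return new ArrayList<>(stack);     // ArrayDeque iterates from the most recent push
    }

    // The edgeTo entries from the 0-to-7 example.
    static Map<Integer, Integer> exampleEdgeTo() {
        Map<Integer, Integer> edgeTo = new HashMap<>();
        edgeTo.put(7, 6);
        edgeTo.put(6, 5);
        edgeTo.put(5, 2);
        edgeTo.put(2, 1);
        edgeTo.put(1, 0);
        return edgeTo;
    }

    public static void main(String[] args) {
        System.out.println(pathTo(exampleEdgeTo(), 0, 7)); // [0, 1, 2, 5, 6, 7]
    }
}
```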

Breadth-first search (BFS) corresponds to level-order traversal in trees. Like DFS, breadth-first search also maintains a set of marked vertices and an edgeTo map. Unlike DFS, breadth-first search is an iterative algorithm that works by gradually expanding a circular frontier centered on the start vertex. A fringe data structure stores the frontier vertices that have been marked but not yet processed, while a distTo map stores each vertex's distance (number of edges) from the start vertex.

Breadth-first search begins by initializing the fringe as a queue containing only the start vertex s, marking s as visited, and setting distTo[s] = 0. Then, while the fringe is not empty:

  1. Remove the next frontier vertex v from the fringe.
  2. For each unmarked neighbor w:
    1. Mark w.
    2. Set edgeTo[w] = v.
    3. Set distTo[w] = distTo[v] + 1.
    4. Add w to the fringe.
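The steps above can be sketched in Java as follows. This assumes the same hypothetical adjacency-list representation (Map<Integer, List<Integer>>) and example graph as a plain illustration; java.util.ArrayDeque serves as the FIFO fringe.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.TreeMap;

class BreadthFirstSearch {
    final Set<Integer> marked = new HashSet<>();
    final Map<Integer, Integer> edgeTo = new HashMap<>();
    final Map<Integer, Integer> distTo = new HashMap<>();

    BreadthFirstSearch(Map<Integer, List<Integer>> adj, int s) {
        Queue<Integer> fringe = new ArrayDeque<>();   // FIFO queue of marked, unprocessed vertices
        fringe.add(s);
        marked.add(s);
        distTo.put(s, 0);
        while (!fringe.isEmpty()) {
            int v = fringe.remove();                      // 1. next frontier vertex
            for (int w : adj.getOrDefault(v, List.of())) {
                if (!marked.contains(w)) {                // 2. each unmarked neighbor w
                    marked.add(w);                        //    mark w
                    edgeTo.put(w, v);                     //    edgeTo[w] = v
                    distTo.put(w, distTo.get(v) + 1);     //    distTo[w] = distTo[v] + 1
                    fringe.add(w);                        //    add w to the fringe
                }
            }
        }
    }

    // Hypothetical helper: builds undirected adjacency lists from an edge list.
    static Map<Integer, List<Integer>> undirected(int[][] edges) {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        return adj;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> g = undirected(new int[][] {{0, 1}, {1, 2}, {2, 3}, {3, 0}, {2, 4}});
        BreadthFirstSearch search = new BreadthFirstSearch(g, 0);
        System.out.println(new TreeMap<>(search.distTo)); // {0=0, 1=1, 2=2, 3=1, 4=3}
    }
}
```

Marking w when it is added to the fringe (rather than when it is removed) keeps each vertex from being enqueued twice.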
If we have just removed a vertex at distance k from the start, what are the distances of the vertices remaining in the fringe, in terms of k?

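The distance to every vertex in the fringe must be k or k + 1. The fringe is a FIFO queue, so any remaining vertices at distance k were added before any vertex at distance k + 1, and every vertex added while processing a distance-k vertex gets distance k + 1.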
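One way to convince yourself of this invariant is to check it while BFS runs. The Java sketch below is a hypothetical instrumented BFS: after processing each vertex v at distance k, it scans the fringe and confirms every distance is k or k + 1. For brevity, it uses the keys of distTo as the marked set, and grid() is a made-up 2x3 grid graph.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class FringeInvariant {
    // Returns false if, after processing any vertex, the fringe holds a distance other than k or k + 1.
    static boolean holds(Map<Integer, List<Integer>> adj, int s) {
        Map<Integer, Integer> distTo = new HashMap<>();  // a key in distTo also means "marked"
        Queue<Integer> fringe = new ArrayDeque<>();
        fringe.add(s);
        distTo.put(s, 0);
        while (!fringe.isEmpty()) {
            int v = fringe.remove();
            int k = distTo.get(v);
            for (int w : adj.getOrDefault(v, List.of())) {
                if (!distTo.containsKey(w)) {
                    distTo.put(w, k + 1);
                    fringe.add(w);
                }
            }
            for (int u : fringe) {                       // inspect everything still waiting
                int d = distTo.get(u);
                if (d != k && d != k + 1) return false;
            }
        }
        return true;
    }

    // Hypothetical graph: rows 0-1-2 and 3-4-5, with vertical edges 0-3, 1-4, 2-5.
    static Map<Integer, List<Integer>> grid() {
        return Map.of(
            0, List.of(1, 3), 1, List.of(0, 2, 4), 2, List.of(1, 5),
            3, List.of(0, 4), 4, List.of(1, 3, 5), 5, List.of(2, 4));
    }

    public static void main(String[] args) {
        System.out.println(holds(grid(), 0)); // true
    }
}
```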

While depth-first search recovers an s–t path from the start vertex s to every reachable vertex t, breadth-first search recovers s–t shortest paths: paths that use the fewest edges.
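To make the difference concrete, this Java sketch runs both searches on a hypothetical 4-cycle 0-1-2-3-0 (not the graph from the figure) and recovers the path from 0 to 3 with each edgeTo map. The method names are illustrative; neighbors are visited in the listed order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class DfsVsBfs {
    // Hypothetical example graph: the square 0-1-2-3-0 as adjacency lists.
    static Map<Integer, List<Integer>> square() {
        return Map.of(0, List.of(1, 3), 1, List.of(0, 2), 2, List.of(1, 3), 3, List.of(2, 0));
    }

    static Map<Integer, Integer> dfsEdgeTo(Map<Integer, List<Integer>> adj, int s) {
        Map<Integer, Integer> edgeTo = new HashMap<>();
        dfs(adj, s, new HashSet<>(), edgeTo);
        return edgeTo;
    }

    private static void dfs(Map<Integer, List<Integer>> adj, int v, Set<Integer> marked, Map<Integer, Integer> edgeTo) {
        marked.add(v);
        for (int w : adj.get(v)) {
            if (!marked.contains(w)) {
                edgeTo.put(w, v);
                dfs(adj, w, marked, edgeTo);   // go deep before trying v's other neighbors
            }
        }
    }

    static Map<Integer, Integer> bfsEdgeTo(Map<Integer, List<Integer>> adj, int s) {
        Map<Integer, Integer> edgeTo = new HashMap<>();
        Set<Integer> marked = new HashSet<>();
        Queue<Integer> fringe = new ArrayDeque<>();
        marked.add(s);
        fringe.add(s);
        while (!fringe.isEmpty()) {
            int v = fringe.remove();
            for (int w : adj.get(v)) {
                if (!marked.contains(w)) {     // all of v's neighbors are claimed at once
                    marked.add(w);
                    edgeTo.put(w, v);
                    fringe.add(w);
                }
            }
        }
        return edgeTo;
    }

    // Follow edgeTo back from t to s, then reverse into s-to-t order.
    static List<Integer> pathTo(Map<Integer, Integer> edgeTo, int s, int t) {
        List<Integer> path = new ArrayList<>();
        for (int x = t; x != s; x = edgeTo.get(x)) path.add(x);
        path.add(s);
        Collections.reverse(path);
        return path;
    }

    public static void main(String[] args) {
        System.out.println(pathTo(dfsEdgeTo(square(), 0), 0, 3)); // [0, 1, 2, 3]
        System.out.println(pathTo(bfsEdgeTo(square(), 0), 0, 3)); // [0, 3]
    }
}
```

DFS commits to 1 and keeps going, so it reaches 3 the long way around; BFS claims both neighbors of 0 before moving on, so 3 gets the one-edge path.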