Tree and Graph Traversals Reading
Graphs are a class of abstract data types that model the relationships between data. While all of the data structures we’ve learned take advantage of the relationships between data (search trees, heaps, k-d trees) and the properties of data (hashing) to optimize search, graphs explicitly model these relationships and properties. In a graph, the relationships between data are often as important as the data themselves.
Data relationships
One of the implications of modeling the relationships between data is that graph algorithms are less focused on optimizing the runtime of individual operations. In data structures, these relationships are typically hidden from the client, allowing data structure implementers to use a variety of algorithm ideas to improve runtime. Search trees, for example, use the underlying total order (sorted order) of the dataset to improve runtime. In contrast, the relationships between data are precisely what's important to the graph client. Instead of querying the data to check whether a key is stored in the set, the graph client is more interested in answering questions about how the data are related.
Formally, a graph G = (V, E) consists of a set of vertices V and a set of edges E. Graphs can model real-world artifacts like maps by representing intersections or places as vertices and the roads connecting places as edges, where a graph problem might be to return the shortest path between two places. They can also model abstract concepts like social networks by representing people as vertices and friend or follower relationships as edges, where a graph problem might be to identify cliques or clusters of close friends. This framing motivates our two lines of inquiry about graphs.
- Formulating complicated problems in terms of a graph. The choice of what to represent as vertices and edges often makes the difference between problems that can be solved efficiently and problems that can’t be solved at all.
- Applying familiar graph algorithms to solve a graph problem. Inventing new graph algorithms is difficult, so it's often more practical to reformulate the problem and then solve it using an existing algorithm.
A common graph problem is graph traversal (also known as graph search), where we visit (process) each vertex in a graph.
Special case: trees
In graph theory, trees are a special case of graphs with one important constraint: there is exactly one path between any two vertices. As a consequence, a tree (graph) G satisfies two properties.
- Connected: every vertex can reach every other vertex.
- Acyclic: there are no cycles, that is, no sequence of distinct edges that starts at a vertex and returns to that same vertex.
Note: in the data structure view of trees, edges are implicit; the implementer chooses the parent-child links as a strategy for improving runtime. When we view trees as graphs, the graph client defines the edges instead.
There are multiple ways to traverse a tree.
Level-order traversal
Visit every vertex on a level (left-to-right) before moving to the next level.
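Level-order traversal is usually implemented with a queue rather than recursion. The Java sketch below assumes a hypothetical `Node` class with `key`, `left`, and `right` fields (mirroring the `BSTNode` used in this reading), and its example tree is reconstructed from the preorder and inorder sequences given in this reading. Because the queue is first-in, first-out, every node on a level is removed before any of its children.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class LevelOrder {
    // Hypothetical binary tree node, mirroring the BSTNode fields used in this reading.
    static class Node {
        final String key;
        final Node left, right;

        Node(String key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the keys one level at a time, left to right within each level.
    static List<String> levelOrder(Node root) {
        List<String> result = new ArrayList<>();
        if (root == null) return result;
        Queue<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node x = queue.remove();                 // oldest node first (FIFO)
            result.add(x.key);
            if (x.left != null) queue.add(x.left);   // children go to the back of the queue,
            if (x.right != null) queue.add(x.right); // behind the rest of the current level
        }
        return result;
    }

    static Node leaf(String key) { return new Node(key, null, null); }

    // Tree reconstructed from the preorder and inorder sequences given in this reading.
    static Node exampleTree() {
        return new Node("F",
                new Node("B", leaf("A"), new Node("D", leaf("C"), leaf("E"))),
                new Node("G", null, new Node("I", leaf("H"), null)));
    }

    public static void main(String[] args) {
        System.out.println(levelOrder(exampleTree())); // [F, B, G, A, D, I, C, E, H]
    }
}
```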
Depth-first traversal
Recursively visit each child. There are three vertex orderings depending on when we decide to visit a vertex in relation to its children. Example code is given for a binary search tree, but these traversals can also apply to trees with more than 2 children.
- Preorder
- Visit the current vertex and then recursively visit each child. (Red, F-B-A-D-C-E-G-I-H.)
void preorder(BSTNode x) { if (x == null) return; System.out.println(x.key); preorder(x.left); preorder(x.right); }
- Inorder
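- Visit the left subtree, then the current vertex, then the right subtree. Inorder needs a distinct left and right side, so it applies to binary trees; on a binary search tree it visits the keys in sorted order. (Yellow, A-B-C-D-E-F-G-H-I.)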
void inorder(BSTNode x) { if (x == null) return; inorder(x.left); System.out.println(x.key); inorder(x.right); }
- Postorder
- Recursively visit each child and then visit the current vertex. (Green, A-C-E-D-B-H-I-G-F.)
void postorder(BSTNode x) { if (x == null) return; postorder(x.left); postorder(x.right); System.out.println(x.key); }
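To check the three orderings above, here is a runnable Java sketch. It assumes a minimal `BSTNode` with `key`, `left`, and `right` fields, and instead of printing, each method appends keys to a list so the results are easy to compare. The example tree is reconstructed from the preorder and inorder sequences given above; the only difference between the three methods is where the "visit" line sits relative to the two recursive calls.

```java
import java.util.ArrayList;
import java.util.List;

class DepthFirstTraversals {
    // Minimal binary tree node; fields mirror the BSTNode used in this reading.
    static class BSTNode {
        final String key;
        final BSTNode left, right;

        BSTNode(String key, BSTNode left, BSTNode right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Visit the vertex, then its left subtree, then its right subtree.
    static void preorder(BSTNode x, List<String> out) {
        if (x == null) return;
        out.add(x.key);
        preorder(x.left, out);
        preorder(x.right, out);
    }

    // Visit the left subtree, then the vertex, then the right subtree.
    static void inorder(BSTNode x, List<String> out) {
        if (x == null) return;
        inorder(x.left, out);
        out.add(x.key);
        inorder(x.right, out);
    }

    // Visit both subtrees before the vertex itself.
    static void postorder(BSTNode x, List<String> out) {
        if (x == null) return;
        postorder(x.left, out);
        postorder(x.right, out);
        out.add(x.key);
    }

    static BSTNode leaf(String key) { return new BSTNode(key, null, null); }

    // Tree reconstructed from the preorder and inorder sequences in this reading.
    static BSTNode exampleTree() {
        return new BSTNode("F",
                new BSTNode("B", leaf("A"), new BSTNode("D", leaf("C"), leaf("E"))),
                new BSTNode("G", null, new BSTNode("I", leaf("H"), null)));
    }

    public static void main(String[] args) {
        BSTNode root = exampleTree();
        List<String> pre = new ArrayList<>(), in = new ArrayList<>(), post = new ArrayList<>();
        preorder(root, pre);
        inorder(root, in);
        postorder(root, post);
        System.out.println(String.join("-", pre));  // F-B-A-D-C-E-G-I-H
        System.out.println(String.join("-", in));   // A-B-C-D-E-F-G-H-I
        System.out.println(String.join("-", post)); // A-C-E-D-B-H-I-G-F
    }
}
```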
Simple graphs
A graph consists of a set of vertices and a set of zero or more edges. Instead of parent and child, each vertex is connected to its adjacent vertices (neighbors) with an edge.
In this class and in many real-world contexts, we take “graph” to mean “simple graph”. Simple graphs contain no self-loops or parallel edges: there can be at most one edge between any pair of vertices.
- Goal: systematically traverse a graph, visiting each vertex.
Suppose we apply the recursive, depth-first tree traversal algorithm on a graph. What's problematic?
While trees have exactly one path between any two vertices, graphs can have multiple paths between vertices. If the graph contains cycles, the traversal can get stuck in an infinite loop (infinite recursion), because nothing records which vertices have already been visited.
Depth-first search
Depth-first search (DFS) is a recursive graph traversal algorithm that maintains two data structures: a set of marked (visited) vertices and an edgeTo map from each vertex to the previous vertex on its path.
dfs(v)
- Mark v.
- For each unmarked neighbor w, set edgeTo[w] = v and call dfs(w).
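The dfs(v) procedure above translates almost line for line into Java. This is a minimal sketch under a few assumptions: vertices are numbered 0 to V - 1, the graph is stored as adjacency lists, and boolean and int arrays stand in for the marked set and the edgeTo map. The example graph is hypothetical (it is not the graph from the figure) and contains a cycle, so the marked check is what keeps the recursion finite.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DepthFirstSearch {
    private final List<List<Integer>> adj; // adj.get(v) lists the neighbors of v
    private final boolean[] marked;        // marked[v] becomes true when dfs(v) starts
    private final int[] edgeTo;            // edgeTo[w] = v means w was first reached from v

    DepthFirstSearch(List<List<Integer>> adj, int s) {
        this.adj = adj;
        this.marked = new boolean[adj.size()];
        this.edgeTo = new int[adj.size()];
        Arrays.fill(edgeTo, -1);           // -1 stands for "no predecessor"
        dfs(s);
    }

    private void dfs(int v) {
        marked[v] = true;
        for (int w : adj.get(v)) {
            if (!marked[w]) {              // skipping marked vertices stops infinite recursion on cycles
                edgeTo[w] = v;
                dfs(w);
            }
        }
    }

    boolean isMarked(int v) { return marked[v]; }

    int edgeTo(int v) { return edgeTo[v]; }

    // Builds adjacency lists for an undirected simple graph on vertices 0..n-1.
    static List<List<Integer>> undirected(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    public static void main(String[] args) {
        // Hypothetical graph: cycle 0-1-2-0, edge 2-3, and a separate component 4-5.
        List<List<Integer>> g = undirected(6, new int[][] {{0, 1}, {1, 2}, {2, 0}, {2, 3}, {4, 5}});
        DepthFirstSearch search = new DepthFirstSearch(g, 0);
        for (int v = 0; v < g.size(); v++) {
            System.out.println(v + ": marked=" + search.isMarked(v) + ", edgeTo=" + search.edgeTo(v));
        }
    }
}
```

Vertices 4 and 5 stay unmarked because they are not reachable from 0. One design caveat: the recursion can go as deep as the longest path DFS follows, so very large graphs may need an explicit stack instead.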
Consider running DFS on the graph below starting from the vertex labeled 0.
We can recover the s–t paths from 0 to every other reachable vertex in the graph by using the edgeTo map to retrace our steps. For example, the DFS path between 0 and 7 can be found by performing the following edgeTo look-ups.
- edgeTo[7] == 6, so 6-7.
- edgeTo[6] == 5, so 5-6-7.
- edgeTo[5] == 2, so 2-5-6-7.
- edgeTo[2] == 1, so 1-2-5-6-7.
- edgeTo[1] == 0, so 0-1-2-5-6-7.
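The look-ups above can be automated by walking edgeTo backward from t until reaching s, prepending each vertex as we go. This Java sketch stores only the five edgeTo entries listed in the 0 to 7 example (the rest of the map is not known from the text). `pathTo` is a hypothetical helper name, and it returns null when t has no recorded predecessor chain back to s.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PathRecovery {
    // Retraces edgeTo from t back to s; returns null if t was never reached from s.
    static List<Integer> pathTo(Map<Integer, Integer> edgeTo, int s, int t) {
        Deque<Integer> path = new ArrayDeque<>();
        int v = t;
        while (v != s) {
            if (!edgeTo.containsKey(v)) return null; // no predecessor recorded: t unreachable
            path.addFirst(v);                         // prepend so the path reads from s to t
            v = edgeTo.get(v);
        }
        path.addFirst(s);
        return new ArrayList<>(path);
    }

    public static void main(String[] args) {
        // Only the edgeTo entries used in the 0-7 example.
        Map<Integer, Integer> edgeTo = new HashMap<>();
        edgeTo.put(7, 6);
        edgeTo.put(6, 5);
        edgeTo.put(5, 2);
        edgeTo.put(2, 1);
        edgeTo.put(1, 0);
        System.out.println(pathTo(edgeTo, 0, 7)); // [0, 1, 2, 5, 6, 7]
    }
}
```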
Breadth-first search
Breadth-first search (BFS) corresponds to level-order traversal in trees. Like DFS, breadth-first search maintains a set of marked vertices and an edgeTo map. Unlike DFS, breadth-first search is an iterative algorithm that works by gradually expanding a circular frontier centered on the start vertex. A fringe data structure stores the frontier vertices that have been marked but not yet processed, while a distTo map stores the distance of each vertex from the start vertex.
Breadth-first search begins by initializing the fringe as a queue containing only the start vertex s, marking s as visited, and setting distTo[s] = 0. Then, while the fringe is not empty:
- Remove the next frontier vertex v from the fringe.
- For each unmarked neighbor w:
  - Mark w.
  - Set edgeTo[w] = v.
  - Set distTo[w] = distTo[v] + 1.
  - Add w to the fringe.
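The steps above can be sketched in Java as follows, assuming vertices numbered 0 to V - 1, adjacency lists, and arrays in place of the marked set and the edgeTo and distTo maps (with -1 meaning "not reached"). The `ArrayDeque` plays the role of the fringe queue, and the example graph is hypothetical. Marking w as soon as it is added, rather than when it is removed, guarantees that each vertex enters the fringe at most once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

class BreadthFirstSearch {
    private final boolean[] marked;
    private final int[] edgeTo; // edgeTo[w] = v means w was first reached from v
    private final int[] distTo; // number of edges from s, or -1 if unreachable

    BreadthFirstSearch(List<List<Integer>> adj, int s) {
        int n = adj.size();
        marked = new boolean[n];
        edgeTo = new int[n];
        distTo = new int[n];
        Arrays.fill(edgeTo, -1);
        Arrays.fill(distTo, -1);

        Queue<Integer> fringe = new ArrayDeque<>();
        fringe.add(s);
        marked[s] = true;
        distTo[s] = 0;
        while (!fringe.isEmpty()) {
            int v = fringe.remove();              // next frontier vertex, in FIFO order
            for (int w : adj.get(v)) {
                if (!marked[w]) {
                    marked[w] = true;             // mark on insertion so w is queued only once
                    edgeTo[w] = v;
                    distTo[w] = distTo[v] + 1;
                    fringe.add(w);
                }
            }
        }
    }

    int distTo(int v) { return distTo[v]; }

    int edgeTo(int v) { return edgeTo[v]; }

    // Builds adjacency lists for an undirected simple graph on vertices 0..n-1.
    static List<List<Integer>> undirected(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    public static void main(String[] args) {
        // Hypothetical graph: cycle 0-1-2-0, edge 2-3, and a separate component 4-5.
        List<List<Integer>> g = undirected(6, new int[][] {{0, 1}, {1, 2}, {2, 0}, {2, 3}, {4, 5}});
        BreadthFirstSearch bfs = new BreadthFirstSearch(g, 0);
        for (int v = 0; v < g.size(); v++) {
            System.out.println(v + ": distTo=" + bfs.distTo(v) + ", edgeTo=" + bfs.edgeTo(v));
        }
    }
}
```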
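If we have just processed (removed from the fringe) a vertex at distance k from the start, what are the distances of the vertices in the fringe, in terms of k?
Every vertex in the fringe is at distance k or k + 1. The fringe is a first-in, first-out queue, and each vertex enters it with a distance exactly one greater than the vertex that added it, so the distances in the queue never decrease from front to back and span at most two consecutive values.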
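One way to see the k or k + 1 property in action is to instrument BFS and check the fringe after each vertex is processed. The Java sketch below does this on one hypothetical graph; `fringeWithinOneLevel` is an illustrative helper name, and passing on a single graph is a sanity check, not a proof. Here distTo == -1 doubles as "unmarked".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FringeInvariant {
    // Runs BFS from s. After each vertex v (at distance k) is processed, checks that
    // every vertex in the fringe has distance k or k + 1.
    static boolean fringeWithinOneLevel(List<List<Integer>> adj, int s) {
        int[] distTo = new int[adj.size()];
        Arrays.fill(distTo, -1);                  // -1 means unmarked
        ArrayDeque<Integer> fringe = new ArrayDeque<>();
        fringe.add(s);
        distTo[s] = 0;
        while (!fringe.isEmpty()) {
            int v = fringe.remove();
            int k = distTo[v];
            for (int w : adj.get(v)) {
                if (distTo[w] == -1) {
                    distTo[w] = k + 1;
                    fringe.add(w);
                }
            }
            for (int u : fringe) {                // inspect the fringe without modifying it
                if (distTo[u] != k && distTo[u] != k + 1) return false;
            }
        }
        return true;
    }

    // Builds adjacency lists for an undirected simple graph on vertices 0..n-1.
    static List<List<Integer>> undirected(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    public static void main(String[] args) {
        // Hypothetical graph with several cycles.
        List<List<Integer>> g = undirected(7, new int[][] {
                {0, 1}, {0, 2}, {1, 3}, {2, 3}, {3, 4}, {4, 5}, {5, 6}, {2, 6}});
        System.out.println(fringeWithinOneLevel(g, 0)); // true
    }
}
```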
While depth-first search recovers an s–t path from the start vertex s to each reachable vertex t, breadth-first search recovers s–t shortest paths, where path length is measured in number of edges.
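To illustrate the difference, this Java sketch runs both searches from vertex 0 on a hypothetical 4-cycle 0-1-2-3-0 and recovers the path to vertex 3 from each edgeTo array. With the neighbor order produced here, DFS wanders around the cycle (0-1-2-3) while BFS finds the direct edge (0-3). The DFS path depends on neighbor order; the BFS path is always a shortest path. `pathTo` here assumes t is reachable from s.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DfsVsBfsPaths {
    // Builds adjacency lists for an undirected simple graph on vertices 0..n-1.
    static List<List<Integer>> undirected(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    static int[] dfsEdgeTo(List<List<Integer>> adj, int s) {
        int[] edgeTo = new int[adj.size()];
        Arrays.fill(edgeTo, -1);
        dfs(adj, s, new boolean[adj.size()], edgeTo);
        return edgeTo;
    }

    private static void dfs(List<List<Integer>> adj, int v, boolean[] marked, int[] edgeTo) {
        marked[v] = true;
        for (int w : adj.get(v)) {
            if (!marked[w]) {
                edgeTo[w] = v;
                dfs(adj, w, marked, edgeTo);
            }
        }
    }

    static int[] bfsEdgeTo(List<List<Integer>> adj, int s) {
        int[] edgeTo = new int[adj.size()];
        Arrays.fill(edgeTo, -1);
        boolean[] marked = new boolean[adj.size()];
        ArrayDeque<Integer> fringe = new ArrayDeque<>();
        fringe.add(s);
        marked[s] = true;
        while (!fringe.isEmpty()) {
            int v = fringe.remove();
            for (int w : adj.get(v)) {
                if (!marked[w]) {
                    marked[w] = true;
                    edgeTo[w] = v;
                    fringe.add(w);
                }
            }
        }
        return edgeTo;
    }

    // Follows edgeTo from t back to s (assumes t is reachable from s).
    static List<Integer> pathTo(int[] edgeTo, int s, int t) {
        ArrayDeque<Integer> path = new ArrayDeque<>();
        for (int v = t; v != s; v = edgeTo[v]) path.addFirst(v);
        path.addFirst(s);
        return new ArrayList<>(path);
    }

    public static void main(String[] args) {
        // Hypothetical 4-cycle 0-1-2-3-0, so vertex 3 is adjacent to 0.
        List<List<Integer>> g = undirected(4, new int[][] {{0, 1}, {1, 2}, {2, 3}, {3, 0}});
        System.out.println("DFS path: " + pathTo(dfsEdgeTo(g, 0), 0, 3)); // DFS path: [0, 1, 2, 3]
        System.out.println("BFS path: " + pathTo(bfsEdgeTo(g, 0), 0, 3)); // BFS path: [0, 3]
    }
}
```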