Minimum Spanning Trees Reading
Given a weighted, undirected graph G, a spanning tree T is a subgraph of G with the following properties.
- T is connected.
- T is acyclic.
- T includes all of the vertices in G.
Recall that the first two properties are part of the graph theory definition of a tree. The third property makes it a spanning tree.
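To make the three properties concrete, here is a minimal sketch that checks whether a candidate edge set T is a spanning tree. It assumes vertices numbered 0 through V - 1 (at least one vertex), assumes T's edges are already drawn from G, and uses a hypothetical Edge class. A single depth-first search from vertex 0 checks acyclicity, and checking that every vertex was marked confirms T is connected and includes all of G's vertices.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: checks the three spanning-tree properties for a candidate edge set T.
// Assumes vertices are numbered 0..numVertices-1 and T's edges come from G.
class SpanningTreeCheck {
    // Hypothetical minimal edge type for this example.
    static final class Edge {
        final int u, v;
        final double weight;
        Edge(int u, int v, double weight) { this.u = u; this.v = v; this.weight = weight; }
    }

    static boolean isSpanningTree(int numVertices, List<Edge> t) {
        // adj.get(x) holds {neighbor, edgeIndex} pairs; tracking the edge index
        // (not just the parent vertex) makes parallel edges count as a cycle.
        List<List<int[]>> adj = new ArrayList<>();
        for (int i = 0; i < numVertices; i++) adj.add(new ArrayList<>());
        for (int i = 0; i < t.size(); i++) {
            Edge e = t.get(i);
            adj.get(e.u).add(new int[] {e.v, i});
            adj.get(e.v).add(new int[] {e.u, i});
        }
        boolean[] marked = new boolean[numVertices];
        // Acyclic: the DFS never finds an edge (other than the one it arrived by)
        // leading to an already-marked vertex.
        if (!acyclicFrom(0, -1, adj, marked)) return false;
        // Connected and includes all vertices: the DFS from vertex 0 reached everything.
        for (boolean m : marked) {
            if (!m) return false;
        }
        return true;
    }

    private static boolean acyclicFrom(int x, int viaEdge, List<List<int[]>> adj, boolean[] marked) {
        marked[x] = true;
        for (int[] nbr : adj.get(x)) {
            if (nbr[1] == viaEdge) continue;   // don't walk back along the edge we arrived by
            if (marked[nbr[0]]) return false;  // edge to a marked vertex: cycle
            if (!acyclicFrom(nbr[0], nbr[1], adj, marked)) return false;
        }
        return true;
    }
}
```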
If G is a connected weighted graph with V vertices and E edges, how many edges are in a spanning tree of G? Assume every vertex is reachable from every other vertex.
Exactly V - 1 edges since we need a tree connecting V vertices.
Can a graph have more than one spanning tree?
Yes, but only if there exists a cycle in the graph. Cycles provide more than one way of reaching a vertex.
A minimum spanning tree (MST) is a spanning tree of minimum total weight. MSTs have many applications, including communication networks (e.g. connecting to the internet) as well as handwriting recognition and medical imaging (before machine learning approaches).
Can a graph have more than one minimum spanning tree?
Yes, but this is only possible if there is (1) more than one spanning tree and (2) duplicate edge weights. For a spanning tree to be a minimum spanning tree, it must have the minimum total weight. If there are multiple spanning trees, there can be more than one MST if they share the same minimum total weight.
Cut property
MST algorithms rely on the cut property.
- Cut
  - An assignment of a graph’s nodes to two non-empty sets.
- Crossing edge
  - An edge that crosses a cut, connecting nodes from one set to the other.
- Cut property
  - Given any cut, a minimum-weight crossing edge must be in the minimum spanning tree. If several crossing edges tie for the minimum weight, each of them belongs to at least one MST.
- Note
  - A cut may have multiple edges in the MST.
By repeatedly applying the cut property, we can find an MST T for a graph G.
T = set()  # set of MST edges found so far
while len(T) != V - 1:
    cut = nextCut(G)  # must be a cut that no edge already in T crosses
    edge = minWeightCrossingEdge(cut)
    T.add(edge)
MST algorithms provide implementations for nextCut and minWeightCrossingEdge.
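The pseudocode leaves the representation of a cut open. As a minimal sketch, assume a cut is a boolean array inS where inS[x] is true for vertices on one side and false for the other; this version of minWeightCrossingEdge also takes the edge list as a parameter, and the CutExample and Edge names are hypothetical. It scans every edge once, keeping the lightest one whose endpoints lie on opposite sides.

```java
import java.util.List;

// Sketch of minWeightCrossingEdge for an explicitly represented cut.
class CutExample {
    // Hypothetical minimal edge type for this example.
    static final class Edge {
        final int u, v;
        final double weight;
        Edge(int u, int v, double weight) { this.u = u; this.v = v; this.weight = weight; }
    }

    /**
     * Returns a minimum-weight edge crossing the cut, or null if no edge crosses it.
     * inS[x] == true puts vertex x on one side of the cut; false puts it on the other.
     */
    static Edge minWeightCrossingEdge(List<Edge> edges, boolean[] inS) {
        Edge best = null;
        for (Edge e : edges) {
            boolean crossing = inS[e.u] != inS[e.v];  // endpoints on opposite sides
            if (crossing && (best == null || e.weight < best.weight)) {
                best = e;
            }
        }
        return best;
    }
}
```

Scanning all edges costs O(E) per cut, which is why Kruskal’s and Prim’s choose their cuts so that the lightest crossing edge can be found more cheaply.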
Kruskal’s algorithm
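Kruskal’s algorithm implements the pseudocode by considering each edge in ascending (sorted) order by weight. An edge is added to the MST unless doing so introduces a cycle. An accepted edge is a minimum-weight crossing edge for the cut that separates one endpoint’s current component from the rest of the graph, since every lighter edge has already been processed and now lies within a single component. The loop terminates when the MST contains V - 1 edges.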
- Sort all edges by weight.
- While the number of edges in the MST < V - 1:
  - Add the next lightest edge to the MST only if it doesn’t introduce a cycle.
Which algorithms can we use to detect a cycle?
Previously, we learned that graph traversal algorithms like BFS and DFS can detect a cycle if we find an edge to a marked vertex. However, each traversal takes O(E + V) time, and it needs to run for every candidate edge the loop considers (at least V - 1 of them, and up to E), resulting in a quadratic-time or worse algorithm.
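To make the cost of the traversal-based check concrete, here is a minimal sketch assuming vertices numbered 0 through V - 1; the class DfsCycleCheck and its methods are hypothetical. It searches only the MST edges added so far: adding a candidate edge (u, v) creates a cycle exactly when v is already reachable from u. Each call may visit every vertex, and Kruskal’s would make one call per candidate edge.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Naive cycle check for Kruskal's: a DFS over the MST edges added so far.
class DfsCycleCheck {
    private final List<List<Integer>> adj = new ArrayList<>();

    DfsCycleCheck(int numVertices) {
        for (int i = 0; i < numVertices; i++) adj.add(new ArrayList<>());
    }

    /** Records an edge that was accepted into the MST. */
    void addEdge(int u, int v) {
        adj.get(u).add(v);
        adj.get(v).add(u);
    }

    /** Adding edge (u, v) creates a cycle iff v is already reachable from u. */
    boolean wouldCreateCycle(int u, int v) {
        boolean[] marked = new boolean[adj.size()];
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(u);
        marked[u] = true;
        while (!stack.isEmpty()) {
            int x = stack.pop();
            if (x == v) return true;  // found a path from u to v
            for (int w : adj.get(x)) {
                if (!marked[w]) {
                    marked[w] = true;
                    stack.push(w);
                }
            }
        }
        return false;
    }
}
```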
Dynamic connectivity
A faster way to solve this problem is with a dynamic connectivity algorithm. Given an int N specifying the total number of sites (indexed 0 through N - 1), the dynamic connectivity problem specifies the following API describing the basic operations that we need.
- connect(int p, int q)
  - Add a connection between sites p and q.
- isConnected(int p, int q)
  - Returns true if and only if sites p and q are in the same component.
It turns out that both connect and isConnected can be implemented in O(log N) runtime. Whenever an edge is added to the MST, connect its two endpoints, merging their components. Check for a cycle by calling isConnected on the two endpoints of a candidate edge: if they are already connected, adding the edge would create a cycle.
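One standard way to meet the O(log N) bound is weighted quick union (union by size), sketched below with the API names from the text. The array-based representation and the class name are assumptions, not necessarily the structure used elsewhere in the course. Because the smaller tree is always hung under the larger one, no tree grows taller than log N, so find, and therefore both operations, run in O(log N) time.

```java
// Dynamic connectivity via weighted quick union (union by size).
class WeightedQuickUnion {
    private final int[] parent;  // parent[i] is i's parent; a root is its own parent
    private final int[] size;    // size[r] is the number of sites in the tree rooted at r

    WeightedQuickUnion(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    /** Follows parent links to the root; tree height stays at most log N. */
    private int find(int p) {
        while (parent[p] != p) p = parent[p];
        return p;
    }

    boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }

    void connect(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;  // already in the same component
        // Hang the smaller tree under the larger one to keep trees shallow.
        if (size[rootP] < size[rootQ]) {
            parent[rootP] = rootQ;
            size[rootQ] += size[rootP];
        } else {
            parent[rootQ] = rootP;
            size[rootP] += size[rootQ];
        }
    }
}
```

In Kruskal’s algorithm, a candidate edge (u, v) is accepted only if isConnected(u, v) returns false, after which connect(u, v) records it.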
What is the overall order of growth of Kruskal's algorithm if we use the dynamic connectivity algorithm for cycle-checking?
Sorting E edges is in O(E log E) runtime if we use merge sort. Recall that E < V^2 for a graph without parallel edges, so log E < 2 log V and O(E log E) = O(E log V).
For dynamic connectivity, at most V - 1 calls will be made to connect, but isConnected can be called up to E times. The runtime for these operations is in O(V log V + E log V) = O(E log V) as well, since E >= V - 1 in a connected graph.
Thus, the overall runtime is in O(E log V).
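Putting the pieces together, here is a self-contained sketch of Kruskal’s algorithm, assuming vertices numbered 0 through V - 1 and a hypothetical Edge class. The union-find logic is inlined as weighted quick union, and the sort uses List.sort, which for objects is a merge sort variant, matching the O(E log E) sorting step in the analysis.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Kruskal's algorithm sketch: sorted edges plus weighted quick union for cycle checks.
class KruskalMST {
    // Hypothetical minimal edge type for this example.
    static final class Edge {
        final int u, v;
        final double weight;
        Edge(int u, int v, double weight) { this.u = u; this.v = v; this.weight = weight; }
    }

    private static int find(int[] parent, int p) {
        while (parent[p] != p) p = parent[p];
        return p;
    }

    /** Returns the MST edges (a spanning forest if the graph is disconnected). */
    static List<Edge> mst(int numVertices, List<Edge> edges) {
        List<Edge> sorted = new ArrayList<>(edges);
        // O(E log E): List.sort on objects uses a stable merge sort variant.
        sorted.sort(Comparator.comparingDouble(e -> e.weight));

        // Weighted quick union over the vertices (union by size).
        int[] parent = new int[numVertices];
        int[] size = new int[numVertices];
        for (int i = 0; i < numVertices; i++) {
            parent[i] = i;
            size[i] = 1;
        }

        List<Edge> t = new ArrayList<>();
        for (Edge e : sorted) {
            if (t.size() == numVertices - 1) break;  // MST is complete
            int ru = find(parent, e.u);
            int rv = find(parent, e.v);
            if (ru == rv) continue;  // isConnected: this edge would form a cycle
            t.add(e);
            // connect: hang the smaller tree under the larger one.
            if (size[ru] < size[rv]) {
                parent[ru] = rv;
                size[rv] += size[ru];
            } else {
                parent[rv] = ru;
                size[ru] += size[rv];
            }
        }
        return t;
    }
}
```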
Prim’s algorithm
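Prim’s algorithm finds the minimum spanning tree by repeatedly applying the cut property, using frontier exploration similar to breadth-first search. Beginning from an arbitrary start node, Prim’s algorithm iteratively adds the lightest edge on the frontier. At each step, the cut separates the vertices already in the MST from all other vertices, so the lightest frontier edge is a minimum-weight crossing edge.
Prim’s can be implemented using a priority queue of vertices ordered by their distance to the growing MST, where a vertex’s distance is the weight of the lightest known edge connecting it to the MST.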
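import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// ExtrinsicMinPQ, ArrayHeapMinPQ, StreetMapGraph, and WeightedEdge are
// course-provided classes (not part of java.util); their imports are omitted.
class PrimMST {
    private Set<Long> marked = new HashSet<>();
    // edgeTo.get(w) is the vertex that connects w to the MST.
    private Map<Long, Long> edgeTo = new HashMap<>();
    // distTo.get(w) is the weight of the lightest known edge from the MST to w.
    private Map<Long, Double> distTo = new HashMap<>();

    // Assume StreetMapGraph is a weighted, undirected graph.
    PrimMST(StreetMapGraph g, long s) {
        ExtrinsicMinPQ<Long> fringe = new ArrayHeapMinPQ<Long>();
        fringe.add(s, 0);
        marked.add(s);
        edgeTo.put(s, s);
        distTo.put(s, 0.0);  // 0.0, not 0: the map stores Double values
        // Every other vertex starts infinitely far from the MST.
        for (long v : g.vertices()) {
            if (v != s) {
                fringe.add(v, Double.POSITIVE_INFINITY);
                distTo.put(v, Double.POSITIVE_INFINITY);
            }
        }
        while (!fringe.isEmpty()) {
            // The closest fringe vertex joins the MST via its lightest crossing edge.
            long v = fringe.removeSmallest();
            marked.add(v);
            for (WeightedEdge<Long> e : g.neighbors(v)) {
                assert v == e.from();
                long w = e.to();
                // This update is known as "relaxing" an edge e.
                if (!marked.contains(w) && e.weight() < distTo.get(w)) {
                    edgeTo.put(w, v);
                    distTo.put(w, e.weight());
                    fringe.changePriority(w, e.weight());
                }
            }
        }
    }
}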
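The class above relies on a fringe with a changePriority operation, which java.util.PriorityQueue does not provide. A common alternative is the "lazy" variant of Prim’s, sketched below: instead of lowering a vertex’s priority, it adds a new fringe entry and skips stale entries for already-marked vertices when they are removed. The fringe can hold O(E) entries, so the runtime is O(E log E) = O(E log V). Vertices are assumed to be numbered 0 through V - 1, and the Edge type and the emptyGraph and addEdge helpers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// "Lazy" Prim's sketch using java.util.PriorityQueue, which has no changePriority.
class LazyPrimMST {
    // Adjacency-list entry: an edge to vertex 'to' with the given weight.
    static final class Edge {
        final int to;
        final double weight;
        Edge(int to, double weight) { this.to = to; this.weight = weight; }
    }

    // Fringe entry: vertex v could join the MST via an edge of weight dist from 'from'.
    private static final class Entry {
        final double dist;
        final int v, from;
        Entry(double dist, int v, int from) { this.dist = dist; this.v = v; this.from = from; }
    }

    final int[] edgeTo;        // edgeTo[w] connects w to the MST; -1 for s or unreachable w
    double totalWeight = 0.0;  // total weight of the MST of s's component

    /** Hypothetical helper: an empty adjacency list with n vertices. */
    static List<List<Edge>> emptyGraph(int n) {
        List<List<Edge>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        return adj;
    }

    /** Hypothetical helper: adds undirected edge {u, v} as two adjacency entries. */
    static void addEdge(List<List<Edge>> adj, int u, int v, double weight) {
        adj.get(u).add(new Edge(v, weight));
        adj.get(v).add(new Edge(u, weight));
    }

    LazyPrimMST(List<List<Edge>> adj, int s) {
        edgeTo = new int[adj.size()];
        Arrays.fill(edgeTo, -1);
        boolean[] marked = new boolean[adj.size()];
        PriorityQueue<Entry> fringe = new PriorityQueue<>((a, b) -> Double.compare(a.dist, b.dist));
        fringe.add(new Entry(0.0, s, -1));
        while (!fringe.isEmpty()) {
            Entry best = fringe.poll();
            if (marked[best.v]) continue;  // stale entry: best.v already joined the MST
            marked[best.v] = true;
            edgeTo[best.v] = best.from;
            totalWeight += best.dist;
            for (Edge e : adj.get(best.v)) {
                // Instead of changePriority, add a new entry; stale ones are skipped above.
                if (!marked[e.to]) fringe.add(new Entry(e.weight, e.to, best.v));
            }
        }
    }
}
```

The marked check on removal plays the same role as !marked.contains(w) in the class above: an edge to a vertex already in the MST is never used.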
Why is it necessary to check !marked.contains(w) when considering edge e?
Consider the scenario when e.weight() < distTo.get(w). In other words, e is lighter than the edge we previously used to reach w. However, we know that when we added w to the MST earlier via some other edge f, f was a minimum-weight crossing edge! Removing f and replacing it with e violates the cut property; doing so would disconnect w from the rest of the MST.
What is the overall order of growth of Prim's algorithm assuming all fringe operations have a runtime in O(log V)?
- V calls to add.
- V calls to removeSmallest.
- At most E calls to changePriority.
The runtime is in O(V log V + V log V + E log V) = O(E log V), assuming there exists an MST, i.e., the graph is connected, so E >= V - 1.