Disjoint Sets
Applying iterative refinement to improve Disjoint Sets: from Quick Find to Quick Union to Weighted Quick Union.
Kevin Lin, with thanks to many others.
Ask questions anonymously on Piazza. Look for the pinned Lecture Questions thread.
Feedback from the Reading Quiz
Disjoint Sets ADT
V vertices in the disjoint sets ADT.
Kruskal’s loops up to E times.
Calls isConnected each time.
But only up to V - 1 calls to connect.
Goal: O(log V) implementation for both connect and isConnected.
public interface DisjointSets {
    /** Connects two items P, Q. */
    void connect(int p, int q);
    /** True if P, Q are connected. */
    boolean isConnected(int p, int q);
}
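To make the call counts above concrete, here is a minimal sketch of how Kruskal's algorithm might drive these two operations. The class name `KruskalSketch`, the `{weight, u, v}` edge encoding, and the inner `NaiveSets` stand-in (a simple Quick Find style implementation) are assumptions for illustration and do not come from the lecture.

```java
import java.util.Arrays;
import java.util.Comparator;

class KruskalSketch {
    /** Hypothetical stand-in for any DisjointSets implementation. */
    static class NaiveSets {
        private final int[] id;

        NaiveSets(int n) {
            id = new int[n];
            for (int i = 0; i < n; i++) id[i] = i; // every item starts in its own set
        }

        boolean isConnected(int p, int q) {
            return id[p] == id[q];
        }

        void connect(int p, int q) {
            int setP = id[p];
            int setQ = id[q];
            for (int i = 0; i < id.length; i++) {
                if (id[i] == setP) id[i] = setQ;
            }
        }
    }

    /** Returns the total weight of a minimum spanning tree; each edge is {weight, u, v}. */
    static int mstWeight(int numVertices, int[][] edges) {
        int[][] sorted = edges.clone();
        Arrays.sort(sorted, Comparator.comparingInt(e -> e[0])); // sort edges by weight
        NaiveSets sets = new NaiveSets(numVertices);
        int total = 0;
        int used = 0;
        for (int[] e : sorted) {                    // loops up to E times
            if (used == numVertices - 1) break;     // spanning tree is complete
            if (!sets.isConnected(e[1], e[2])) {    // one isConnected call per edge
                sets.connect(e[1], e[2]);           // at most V - 1 connect calls
                total += e[0];
                used++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] edges = {{1, 0, 1}, {4, 0, 2}, {2, 1, 2}, {7, 2, 3}, {3, 1, 3}};
        System.out.println(mstWeight(4, edges)); // prints 6
    }
}
```

Any faster disjoint sets implementation can replace `NaiveSets` without changing the loop.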
Array Representation with Quick Find Invariants
Before connect(2, 3) operation:
{0, 1, 2, 4}, {3, 5}, {6}
After connect(2, 3) operation:
{0, 1, 2, 4, 3, 5}, {6}
[Figure: the sets drawn above their id arrays, indices 0 to 6.]
id before connect(2, 3): [4, 4, 4, 5, 4, 5, 6]
id after connect(2, 3): [5, 5, 5, 5, 5, 5, 6]
Quick Find Analysis
If we have V vertices…
E isConnected calls, each O(1).
V connect calls, each O(V).
Simple graph: E < V^2.
Kruskal’s: O(E log V + E + V^2) = O(E log V + V^2)
Both operations need to be O(log V)!
private int[] id;

boolean isConnected(int p, int q) {
    return id[p] == id[q];
}

void connect(int p, int q) {
    int setP = id[p];
    int setQ = id[q];
    // Relabel every member of p's set with q's representative.
    for (int i = 0; i < id.length; i++) {
        if (id[i] == setP) id[i] = setQ;
    }
}
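The connect(2, 3) example from the Quick Find slide can be replayed with the code above. This is a small sketch; the class name `QuickFindDemo`, the constructor that takes a starting id array, and the `ids()` accessor are assumptions added so the example runs on its own.

```java
import java.util.Arrays;

class QuickFindDemo {
    private final int[] id;

    /** Hypothetical constructor: starts from a given id array instead of singletons. */
    QuickFindDemo(int[] initialIds) {
        id = initialIds.clone();
    }

    boolean isConnected(int p, int q) {
        return id[p] == id[q];
    }

    void connect(int p, int q) {
        int setP = id[p];
        int setQ = id[q];
        for (int i = 0; i < id.length; i++) {
            if (id[i] == setP) id[i] = setQ;
        }
    }

    int[] ids() {
        return id.clone();
    }

    public static void main(String[] args) {
        // Sets {0, 1, 2, 4}, {3, 5}, {6} with representatives 4, 5, 6.
        QuickFindDemo sets = new QuickFindDemo(new int[] {4, 4, 4, 5, 4, 5, 6});
        sets.connect(2, 3);
        System.out.println(Arrays.toString(sets.ids())); // prints [5, 5, 5, 5, 5, 5, 6]
    }
}
```

Every call to connect scans the whole array, which is where the O(V) cost per call comes from.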
Quick Union
Improving the connect Operation
Quick Find invariant. For each v, id[v] is the set representative for v.
Quick Union invariant. For each v, parent[v] is the parent of v.
[Figure: trees rooted at 4, 3, and 6.]
parent (indices 0 to 6): [2, 4, 4, 3, 4, 3, 6]
Show the result after calling connect(5, 0).
Set parent[find(5)] = find(0). (A root is its own parent, so this equals parent[find(0)].)
[Figure: the tree rooted at 3 now hangs below root 4.]
parent after connect(5, 0): [2, 4, 4, 4, 4, 3, 6]
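The connect(5, 0) step can be traced in code. The following is a minimal sketch of Quick Union under the invariant above; the class name `QuickUnionDemo`, the constructor that accepts a starting parent array, and the `parents()` accessor are assumptions for illustration.

```java
import java.util.Arrays;

class QuickUnionDemo {
    private final int[] parent;

    /** Hypothetical constructor: starts from a given parent array. */
    QuickUnionDemo(int[] initialParents) {
        parent = initialParents.clone();
    }

    /** Follows parent links until reaching a root (an item that is its own parent). */
    int find(int p) {
        while (p != parent[p]) p = parent[p];
        return p;
    }

    boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }

    /** Hangs the root of p's tree below the root of q's tree. */
    void connect(int p, int q) {
        parent[find(p)] = find(q);
    }

    int[] parents() {
        return parent.clone();
    }

    public static void main(String[] args) {
        QuickUnionDemo sets = new QuickUnionDemo(new int[] {2, 4, 4, 3, 4, 3, 6});
        System.out.println(sets.find(5)); // prints 3
        System.out.println(sets.find(0)); // prints 4
        sets.connect(5, 0);
        System.out.println(Arrays.toString(sets.parents())); // prints [2, 4, 4, 4, 4, 3, 6]
    }
}
```

Only one array entry changes, which is what makes connect cheap once the two roots are known.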
Worst-Case Height Trees
Spindly tree: repeatedly connect the first item’s tree below the second item’s tree.
connect(4, 3)
connect(3, 2)
connect(2, 1)
connect(1, 0)
Worst-case runtime for both connect and isConnected is Θ(N).
[Figure: spindly tree. 0 is the root, with 1, 2, 3, and 4 chained below it in that order.]
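The four connect calls listed for the spindly tree can be run directly to see the chain form. This is a sketch under the same Quick Union rules; the class name `SpindlyTreeDemo` and the helper methods `depth` and `parents` are hypothetical additions used only to inspect the result.

```java
import java.util.Arrays;

class SpindlyTreeDemo {
    private final int[] parent;

    SpindlyTreeDemo(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i; // every item starts as its own root
    }

    int find(int p) {
        while (p != parent[p]) p = parent[p];
        return p;
    }

    void connect(int p, int q) {
        parent[find(p)] = find(q);
    }

    /** Number of parent links from p up to its root. */
    int depth(int p) {
        int links = 0;
        while (p != parent[p]) {
            p = parent[p];
            links++;
        }
        return links;
    }

    int[] parents() {
        return parent.clone();
    }

    public static void main(String[] args) {
        SpindlyTreeDemo sets = new SpindlyTreeDemo(5);
        sets.connect(4, 3);
        sets.connect(3, 2);
        sets.connect(2, 1);
        sets.connect(1, 0);
        System.out.println(Arrays.toString(sets.parents())); // prints [0, 0, 1, 2, 3]
        System.out.println(sets.depth(4));                   // prints 4
    }
}
```

With N items the same pattern gives a chain of N - 1 links, so a single find can visit every item.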
Naive Quick Union Analysis
If we have V vertices…
E isConnected calls, each O(V).
V connect calls, each O(V).
Kruskal’s: O(E log V + EV + V^2) = O(EV + V^2)
Worst case is slower than Quick Find!
private int find(int p) {
    while (p != parent[p]) p = parent[p];
    return p;
}

boolean isConnected(int p, int q) {
    return find(p) == find(q);
}

void connect(int p, int q) {
    int i = find(p);
    int j = find(q);
    parent[i] = j;
}
Naive Quick Union Analysis
Hypothesis (from B-Trees). Unbalanced growth leads to worst-case height trees.
Identify (different due to parent pointers). When connecting, the second item’s tree always becomes the new root.
Plan. Choose the new root based on a metric such as tree height.
Weighted Quick Union
Weighted Quick Union by Height
Quick Union invariant. For each v, parent[v] is the parent of v.
The result of connect(5, 0) and connect(0, 5) should be the same!
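[Figure: the tree rooted at 4 (H = 2) and the tree rooted at 3 (H = 1). Linking the shorter tree below the taller root keeps H = 2; linking the taller tree below the shorter root gives H = 3.]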
Describe how to construct a worst-case height tree given a weighted quick union by height.
WQUByHeight: Worst-Case Height Tree
[Figure: worst-case trees, each formed by connecting two trees of equal height. Size = 4, Height = 2; Size = 8, Height = 3; Size = 16, Height = 4.]
WQUByHeight Analysis
Keep track of heights with an extra array.
Worst-case height is log(V)!
E isConnected calls, each O(log V).
V connect calls, each O(log V).
Kruskal’s: O(E log V + E log V + V log V)= O(E log V + V log V)= O(E log V) if E > V
void connect(int p, int q) {
    int i = find(p);
    int j = find(q);
    if (i == j) return;
    if (height[i] < height[j])
        parent[i] = j;
    else if (height[i] > height[j])
        parent[j] = i;
    else { // heights are equal
        parent[j] = i;
        height[i] += 1;
    }
}
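The worst-case construction (always merging two trees of equal height) can be checked against the sizes and heights quoted earlier. The sketch below assumes the item count is a power of two; the class name `WeightedByHeightDemo` and the helpers `depth` and `worstCaseHeight` are hypothetical names introduced for this example.

```java
class WeightedByHeightDemo {
    private final int[] parent;
    private final int[] height;

    WeightedByHeightDemo(int n) {
        parent = new int[n];
        height = new int[n]; // every single-item tree starts at height 0
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    int find(int p) {
        while (p != parent[p]) p = parent[p];
        return p;
    }

    void connect(int p, int q) {
        int i = find(p);
        int j = find(q);
        if (i == j) return;
        if (height[i] < height[j]) {
            parent[i] = j;
        } else if (height[i] > height[j]) {
            parent[j] = i;
        } else { // equal heights: the merged tree grows by one level
            parent[j] = i;
            height[i] += 1;
        }
    }

    /** Number of parent links from p up to its root. */
    int depth(int p) {
        int links = 0;
        while (p != parent[p]) {
            p = parent[p];
            links++;
        }
        return links;
    }

    /** Builds the worst case on n items (n a power of two) and returns the deepest depth. */
    static int worstCaseHeight(int n) {
        WeightedByHeightDemo sets = new WeightedByHeightDemo(n);
        // Round by round, merge neighboring trees, which always have equal height.
        for (int step = 1; step < n; step *= 2) {
            for (int i = 0; i + step < n; i += 2 * step) {
                sets.connect(i, i + step);
            }
        }
        int max = 0;
        for (int v = 0; v < n; v++) max = Math.max(max, sets.depth(v));
        return max;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseHeight(4));  // prints 2
        System.out.println(worstCaseHeight(8));  // prints 3
        System.out.println(worstCaseHeight(16)); // prints 4
    }
}
```

Each extra level requires doubling the number of items, which is why the height cannot exceed log V.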
Weighted Quick Union: isConnected(15, 10)
[Figure: weighted quick union tree over items 0 to 15.]
Weighted Quick Union with Path Compression
Tie all visited nodes to the root.
Same asymptotic runtime.
Weighted Quick Union with Path Compression
Tie all visited nodes to the root.
Draw result of isConnected(14, 13).
[Figure: the tree after isConnected(14, 13), with every node visited by the two find calls attached directly to the root.]
Weighted Quick Union with Path Compression
Can’t keep track of heights anymore: path compression can shorten a tree without updating the stored heights.
WQUBySize: Worst-Case Height Tree
[Figure: the same worst-case trees as for WQUByHeight. Size = 4, Height = 2; Size = 8, Height = 3; Size = 16, Height = 4.]
Worst-case analysis still works when we track subtree size, rather than subtree height!
WQUBySize Analysis
Keep track of sizes with an extra array.
Worst-case height is log(V)!
E isConnected calls, each O(log V).
V connect calls, each O(log V).
Kruskal’s: O(E log V + E log V + V log V)= O(E log V + V log V)= O(E log V) if E > V
void connect(int p, int q) {
    int i = find(p);
    int j = find(q);
    if (i == j) return;
    if (size[i] < size[j]) {
        parent[i] = j;
        size[j] += size[i];
    } else {
        parent[j] = i;
        size[i] += size[j];
    }
}
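The log V height bound can be spot-checked empirically. The sketch below performs random connect calls and measures the deepest node; the class name `WeightedBySizeDemo`, the helpers `maxDepth` and `maxDepthAfterRandomConnects`, and the choice of 1024 items with seed 42 are all assumptions for illustration, and a random run is only a sanity check, not a proof.

```java
import java.util.Random;

class WeightedBySizeDemo {
    private final int[] parent;
    private final int[] size;

    WeightedBySizeDemo(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1; // every single-item tree has size 1
        }
    }

    int find(int p) {
        while (p != parent[p]) p = parent[p];
        return p;
    }

    void connect(int p, int q) {
        int i = find(p);
        int j = find(q);
        if (i == j) return;
        if (size[i] < size[j]) { // smaller tree goes below the larger root
            parent[i] = j;
            size[j] += size[i];
        } else {
            parent[j] = i;
            size[i] += size[j];
        }
    }

    /** Largest number of parent links from any item up to its root. */
    int maxDepth() {
        int max = 0;
        for (int v = 0; v < parent.length; v++) {
            int links = 0;
            for (int p = v; p != parent[p]; p = parent[p]) links++;
            max = Math.max(max, links);
        }
        return max;
    }

    static int maxDepthAfterRandomConnects(int n, int connects, long seed) {
        WeightedBySizeDemo sets = new WeightedBySizeDemo(n);
        Random random = new Random(seed);
        for (int k = 0; k < connects; k++) {
            sets.connect(random.nextInt(n), random.nextInt(n));
        }
        return sets.maxDepth();
    }

    public static void main(String[] args) {
        int depth = maxDepthAfterRandomConnects(1024, 5000, 42L);
        System.out.println("max depth " + depth + ", bound log2(1024) = 10");
    }
}
```

Whatever the sequence of connects, a node's depth only increases when its tree is merged into one at least as large, so the size of its tree at least doubles each time.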
WQUPathCompression
WQUBySize with Path Compression.
Operations take O(log* V) amortized time, where log* is the iterated logarithm, which is nearly constant.
E isConnected calls, each O(log* V) amortized.
V connect calls, each O(log* V) amortized.
e.g. log*(2^65536) = 5.
Analysis is out of scope.
Kruskal’s: O(E log V + E log* V + V log* V)= O(E log V) if E > V
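To show why the iterated logarithm is described as nearly constant, the sketch below computes log* by counting how many levels of the power tower 1, 2, 4, 16, 65536, 2^65536 are needed to reach n. The class name `IteratedLog` and the tower-based formulation are assumptions chosen to avoid floating-point rounding; the method only supports n up to 2^65536.

```java
import java.math.BigInteger;

class IteratedLog {
    /**
     * Returns log*(n): how many times log2 must be applied to n
     * before the result is at most 1. Supports n up to 2^65536.
     */
    static int logStar(BigInteger n) {
        int count = 0;
        BigInteger tower = BigInteger.ONE; // tower values: 1, 2, 4, 16, 65536, 2^65536
        while (tower.compareTo(n) < 0) {
            tower = BigInteger.ONE.shiftLeft(tower.intValueExact()); // next level: 2^tower
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(logStar(BigInteger.valueOf(16)));        // prints 3
        System.out.println(logStar(BigInteger.valueOf(65536)));     // prints 4
        System.out.println(logStar(BigInteger.ONE.shiftLeft(65536))); // prints 5
    }
}
```

Any input that fits in memory has log* of at most 5, which is the sense in which the bound is nearly constant.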
private int find(int p) {
    // First pass: walk up to the root.
    int root = p;
    while (root != parent[root])
        root = parent[root];
    // Second pass: point every visited node directly at the root.
    while (p != root) {
        int newP = parent[p];
        parent[p] = root;
        p = newP;
    }
    return root;
}
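The effect of the two-pass find above is easiest to see on a chain. This sketch applies it to a spindly tree stored as a parent array; the class name `PathCompressionDemo`, the constructor that accepts a starting parent array, and the `parents()` accessor are hypothetical additions for the example.

```java
import java.util.Arrays;

class PathCompressionDemo {
    private final int[] parent;

    /** Hypothetical constructor: starts from a given parent array. */
    PathCompressionDemo(int[] initialParents) {
        parent = initialParents.clone();
    }

    int find(int p) {
        // First pass: walk up to the root.
        int root = p;
        while (root != parent[root]) root = parent[root];
        // Second pass: point every visited node directly at the root.
        while (p != root) {
            int next = parent[p];
            parent[p] = root;
            p = next;
        }
        return root;
    }

    int[] parents() {
        return parent.clone();
    }

    public static void main(String[] args) {
        // Chain with 0 at the root: 4 -> 3 -> 2 -> 1 -> 0.
        PathCompressionDemo sets = new PathCompressionDemo(new int[] {0, 0, 1, 2, 3});
        System.out.println(sets.find(4));                    // prints 0
        System.out.println(Arrays.toString(sets.parents())); // prints [0, 0, 0, 0, 0]
    }
}
```

The first find on the chain still costs four steps, but every later find on those items takes one.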
Summary
Disjoint Sets ADT is used to track connected components in Kruskal’s algorithm.
Graph algorithm runtime can depend on efficient data structure implementations.
Quick Find: Array representation with no tree structure. Fast isConnected, slow connect.
Quick Union: Array representation with tree structure. Worst-case linear-height trees.
Weighted Quick Union: Choose the new root strategically based on a metric.
WQUByHeight: Use subtree height as a metric. Results in log V height.
WQUBySize: Use subtree size as a metric. Results in log V height.
WQUPathCompression: Use subtree size as a metric and compress paths during find. Results in O(log* V) amortized time per operation, which is nearly constant.
What is your burning question from today’s lecture?