Disjoint Sets
Applying iterative refinement to improve Disjoint Sets: from Quick Find to Quick Union to Weighted Quick Union.
Kevin Lin, with thanks to many others.
Ask questions anonymously on Piazza. Look for the pinned Lecture Questions thread.

Feedback from the Reading Quiz

Disjoint Sets ADT
V vertices in the disjoint sets ADT.
Kruskal’s loops up to E times.
Calls isConnected each time.
But only up to V - 1 calls to connect.

Goal: O(log V) implementation for both connect and isConnected.
public interface DisjointSets {
  /** Connects two items P, Q. */
  void connect(int p, int q);

  /** True if P, Q are connected. */
  boolean isConnected(int p, int q);
}
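
To make the call counts above concrete, here is a minimal sketch of how Kruskal's algorithm might drive this interface. The class name KruskalSketch, the CountingSets placeholder implementation, its counters, and the {u, v, weight} edge encoding are all assumptions made for illustration; they are not part of the lecture code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KruskalSketch {
    interface DisjointSets {
        void connect(int p, int q);
        boolean isConnected(int p, int q);
    }

    /** Placeholder implementation that also counts calls (hypothetical, for illustration). */
    static class CountingSets implements DisjointSets {
        private final int[] id;
        int connectCalls = 0;
        int isConnectedCalls = 0;

        CountingSets(int n) {
            id = new int[n];
            for (int i = 0; i < n; i++) {
                id[i] = i; // every item starts in its own set
            }
        }

        public void connect(int p, int q) {
            connectCalls++;
            int setP = id[p];
            int setQ = id[q];
            for (int i = 0; i < id.length; i++) {
                if (id[i] == setP) {
                    id[i] = setQ;
                }
            }
        }

        public boolean isConnected(int p, int q) {
            isConnectedCalls++;
            return id[p] == id[q];
        }
    }

    /** Each edge is {u, v, weight}. Returns the edges chosen for the spanning tree. */
    static List<int[]> mst(int[][] edges, DisjointSets ds) {
        int[][] sorted = edges.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[2], b[2]));
        List<int[]> chosen = new ArrayList<>();
        for (int[] e : sorted) {
            // One isConnected call per edge; connect only when the edge joins two sets.
            if (!ds.isConnected(e[0], e[1])) {
                ds.connect(e[0], e[1]);
                chosen.add(e);
            }
        }
        return chosen;
    }
}
```

On a connected graph with 4 vertices and 5 edges, this sketch makes 5 isConnected calls and 3 (that is, V - 1) connect calls.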

Array Representation with Quick Find Invariants
Before connect(2, 3) operation:
{0, 1, 2, 4}, {3, 5}, {6}
index  0  1  2  3  4  5  6
id     4  4  4  5  4  5  6
After connect(2, 3) operation:
{0, 1, 2, 4, 3, 5}, {6}
index  0  1  2  3  4  5  6
id     5  5  5  5  5  5  6

Quick Find Analysis
If we have V vertices…
E isConnected calls, each O(1).
V connect calls, each O(V).

Simple graph: E < V^2.
Kruskal’s: O(E log V + E + V^2) = O(E log V + V^2)

Both operations need to be O(log V)!
private int[] id;
boolean isConnected(int p, int q) {
  return id[p] == id[q];
}
void connect(int p, int q) {
  int setP = id[p];
  int setQ = id[q];
  // Relabel every member of p's set with q's set id.
  for (int i = 0; i < id.length; i++) {
    if (id[i] == setP) {
      id[i] = setQ;
    }
  }
}
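
As a check on the Quick Find invariant, the sketch below replays connect(2, 3) on the id array from the earlier example. The class name QuickFindExample, the constructor that takes a starting id array, and the ids() accessor are assumptions added for illustration.

```java
import java.util.Arrays;

class QuickFindExample {
    private final int[] id;

    /** Starts from a given id array (assumed helper, so the slide example can be replayed). */
    QuickFindExample(int[] initialIds) {
        id = initialIds.clone();
    }

    boolean isConnected(int p, int q) {
        return id[p] == id[q];
    }

    void connect(int p, int q) {
        int setP = id[p];
        int setQ = id[q];
        // Linear scan: every item in p's set takes q's set id.
        for (int i = 0; i < id.length; i++) {
            if (id[i] == setP) {
                id[i] = setQ;
            }
        }
    }

    int[] ids() {
        return id.clone();
    }

    public static void main(String[] args) {
        QuickFindExample qf = new QuickFindExample(new int[] {4, 4, 4, 5, 4, 5, 6});
        qf.connect(2, 3);
        System.out.println(Arrays.toString(qf.ids())); // [5, 5, 5, 5, 5, 5, 6]
    }
}
```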

Quick Union

Improving the connect Operation
Quick Find invariant. For each v, id[v] is the set representative for v.
Quick Union invariant. For each v, parent[v] is the parent of v.
index   0  1  2  3  4  5  6
parent  2  4  4  3  4  3  6

Improving the connect Operation
Quick Union invariant. For each v, parent[v] is the parent of v.
Show the result after calling connect(5, 0).
index   0  1  2  3  4  5  6
parent  2  4  4  3  4  3  6

Improving the connect Operation
Show the result after calling connect(5, 0).
Set parent[find(5)] = find(0). Here find(5) is 3 and find(0) is 4, so parent[3] becomes 4.
index   0  1  2  3  4  5  6
parent  2  4  4  4  4  3  6
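
The connect(5, 0) step can be replayed with a small sketch. The class name QuickUnionExample, the array-taking constructor, and the parents() accessor are assumptions for illustration; find and connect follow the Quick Union invariant stated above.

```java
import java.util.Arrays;

class QuickUnionExample {
    private final int[] parent;

    /** Starts from a given parent array (assumed helper, so the slide example can be replayed). */
    QuickUnionExample(int[] initialParents) {
        parent = initialParents.clone();
    }

    /** Follows parent pointers until reaching a root, i.e. an item that is its own parent. */
    int find(int p) {
        while (p != parent[p]) {
            p = parent[p];
        }
        return p;
    }

    boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }

    /** Puts the root of p's tree below the root of q's tree. */
    void connect(int p, int q) {
        int i = find(p);
        int j = find(q);
        parent[i] = j;
    }

    int[] parents() {
        return parent.clone();
    }

    public static void main(String[] args) {
        QuickUnionExample qu = new QuickUnionExample(new int[] {2, 4, 4, 3, 4, 3, 6});
        qu.connect(5, 0);
        System.out.println(Arrays.toString(qu.parents())); // [2, 4, 4, 4, 4, 3, 6]
    }
}
```

Only one array entry changes, which is why connect no longer needs a scan over every item.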

Worst-Case Height Trees
Spindly tree: repeatedly connect the first item’s tree below the second item’s tree.
connect(4, 3)
connect(3, 2)
connect(2, 1)
connect(1, 0)

Worst-case runtime for both connect and isConnected is Θ(N) for N items.
Resulting tree (child → parent): 4 → 3 → 2 → 1 → 0, with 0 as the root.
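
A short sketch of the spindly construction, assuming naive quick union where the first item's root always goes below the second item's root. The class name SpindlyTreeDemo and the depth helper are hypothetical names used only here.

```java
class SpindlyTreeDemo {
    static int find(int[] parent, int p) {
        while (p != parent[p]) {
            p = parent[p];
        }
        return p;
    }

    /** Naive quick union: the root of p's tree goes below the root of q's tree. */
    static void connect(int[] parent, int p, int q) {
        int i = find(parent, p);
        int j = find(parent, q);
        parent[i] = j;
    }

    /** Performs connect(n-1, n-2), connect(n-2, n-3), ..., connect(1, 0). */
    static int[] buildSpindly(int n) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        for (int i = n - 1; i > 0; i--) {
            connect(parent, i, i - 1);
        }
        return parent;
    }

    /** Number of parent pointers followed from p to its root. */
    static int depth(int[] parent, int p) {
        int steps = 0;
        while (p != parent[p]) {
            p = parent[p];
            steps++;
        }
        return steps;
    }
}
```

With n = 5 this reproduces the four connect calls above, and item 4 ends up 4 links away from the root; in general the deepest item is n - 1 links away.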

Naive Quick Union Analysis
If we have V vertices…
E isConnected calls, each O(V).
V connect calls, each O(V).

Kruskal’s: O(E log V + EV + V^2) = O(EV + V^2)

Worst case is slower than Quick Find!
private int find(int p) {
  while (p != parent[p]) {
    p = parent[p];
  }
  return p;
}
boolean isConnected(int p, int q) {
  return find(p) == find(q);
}
void connect(int p, int q) {
  int i = find(p);
  int j = find(q);
  parent[i] = j;
}

Naive Quick Union Analysis
Hypothesis (from B-Trees). Unbalanced growth leads to worst-case height trees.

Identify (different due to parent pointers). When connecting, the root of the second item’s tree always becomes the new root.

Plan. Choose the new root based on a metric such as tree height.

Weighted Quick Union

Weighted Quick Union by Height
Quick Union invariant. For each v, parent[v] is the parent of v.
The result of connect(5, 0) and connect(0, 5) should be the same!

[Figure: the tree rooted at 4 has H = 2 and the tree rooted at 3 has H = 1. Placing 3 below 4 gives H = 2; placing 4 below 3 gives H = 3.]

Describe how to construct a worst-case height tree given a weighted quick union by height.

WQUByHeight: Worst-Case Height Tree
A. Repeatedly connect two trees of equal height, so the height grows by 1 each time the size doubles.
Size = 4, Height = 2
Size = 8, Height = 3
Size = 16, Height = 4

WQUByHeight Analysis
Keep track of heights with an extra array.

Worst-case height is log(V)!
E isConnected calls, each O(log V).
V connect calls, each O(log V).

Kruskal’s: O(E log V + E log V + V log V) = O(E log V + V log V) = O(E log V) if E > V
void connect(int p, int q) {
  int i = find(p);
  int j = find(q);
  if (i == j) return;
  if (height[i] < height[j])
    parent[i] = j;
  else if (height[i] > height[j])
    parent[j] = i;
  else { // heights are equal
    parent[j] = i;
    height[i] += 1;
  }
}
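
The sketch below wraps the connect method above in a full class and rebuilds the worst-case trees by merging equal-height trees round by round. The class name WeightedQuickUnionByHeight, the heightOf accessor, and the worstCase builder are assumptions for illustration; worstCase expects the number of items to be a power of two.

```java
class WeightedQuickUnionByHeight {
    private final int[] parent;
    private final int[] height; // height[r] is meaningful only when r is a root

    WeightedQuickUnionByHeight(int n) {
        parent = new int[n];
        height = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i; // each item starts as its own root, height 0
        }
    }

    int find(int p) {
        while (p != parent[p]) {
            p = parent[p];
        }
        return p;
    }

    boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }

    void connect(int p, int q) {
        int i = find(p);
        int j = find(q);
        if (i == j) return;
        if (height[i] < height[j]) {
            parent[i] = j;
        } else if (height[i] > height[j]) {
            parent[j] = i;
        } else { // equal heights: the merged tree is one level taller
            parent[j] = i;
            height[i] += 1;
        }
    }

    /** Height of the tree containing p. */
    int heightOf(int p) {
        return height[find(p)];
    }

    /** Worst case for n = 2^k items: always merge two trees of equal height. */
    static WeightedQuickUnionByHeight worstCase(int n) {
        WeightedQuickUnionByHeight sets = new WeightedQuickUnionByHeight(n);
        for (int step = 1; step < n; step *= 2) {
            for (int i = 0; i + step < n; i += 2 * step) {
                sets.connect(i, i + step);
            }
        }
        return sets;
    }
}
```

Calling worstCase(4), worstCase(8), and worstCase(16) yields heights 2, 3, and 4, matching the sizes and heights listed for the worst-case trees.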

Weighted Quick Union: isConnected(15, 10)
[Figure: weighted quick union tree over items 0 through 15.]

Weighted Quick Union with Path Compression
Tie all visited nodes to the root.
Same asymptotic runtime.

Weighted Quick Union with Path Compression
Tie all visited nodes to the root.
Q. Draw result of isConnected(14, 13).

Weighted Quick Union with Path Compression
Tie all visited nodes to the root.
Draw result of isConnected(14, 13).
A. [Figure: the tree after isConnected(14, 13), with the nodes visited on the way up from 14 and 13 tied directly to the root.]

Weighted Quick Union with Path Compression
Can’t keep track of heights anymore: path compression reattaches nodes to the root, so stored heights may no longer match the actual tree.

WQUBySize: Worst-Case Height Tree
Size = 4, Height = 2
Size = 8, Height = 3
Size = 16, Height = 4
Worst-case analysis still works when we track subtree size, rather than subtree height!

WQUBySize Analysis
Keep track of sizes with an extra array.

Worst-case height is log(V)!
E isConnected calls, each O(log V).
V connect calls, each O(log V).

Kruskal’s: O(E log V + E log V + V log V) = O(E log V + V log V) = O(E log V) if E > V
void connect(int p, int q) {
  int i = find(p);
  int j = find(q);
  if (i == j) return;
  if (size[i] < size[j]) {
    parent[i] = j;
    size[j] += size[i];
  } else {
    parent[j] = i;
    size[i] += size[j];
  }
}
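
To see that weighting makes the argument order irrelevant for the final root, the sketch below runs connect(5, 0) and connect(0, 5) on the earlier example trees ({0, 1, 2, 4} rooted at 4 and {3, 5} rooted at 3). The class name WeightedQuickUnionBySize, the constructor that derives sizes from a parent array, and sizeOf are assumptions for illustration.

```java
class WeightedQuickUnionBySize {
    private final int[] parent;
    private final int[] size; // size[r] is meaningful only when r is a root

    /** Starts from a given parent array and counts each item toward its root (assumed helper). */
    WeightedQuickUnionBySize(int[] initialParents) {
        parent = initialParents.clone();
        size = new int[parent.length];
        for (int v = 0; v < parent.length; v++) {
            size[find(v)] += 1;
        }
    }

    int find(int p) {
        while (p != parent[p]) {
            p = parent[p];
        }
        return p;
    }

    boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }

    void connect(int p, int q) {
        int i = find(p);
        int j = find(q);
        if (i == j) return;
        if (size[i] < size[j]) { // smaller tree goes below the larger one
            parent[i] = j;
            size[j] += size[i];
        } else {
            parent[j] = i;
            size[i] += size[j];
        }
    }

    /** Number of items in the set containing p. */
    int sizeOf(int p) {
        return size[find(p)];
    }
}
```

Starting from parent = {2, 4, 4, 3, 4, 3, 6}, both call orders leave 4 as the root of the merged set of six items. When the two sizes are equal, the first argument's root wins, so the root can differ by call order, but the resulting height is the same either way.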

WQUPathCompression
WQUBySize with Path Compression.
Amortized cost per operation is O(log* V), where log* is the iterated logarithm (nearly constant). A single tree can still reach height log V.
E isConnected calls, each O(log* V) amortized.
V connect calls, each O(log* V) amortized.
e.g. log*(2^65536) = 5.
Analysis is out of scope.
Kruskal’s: O(E log V + E log* V + V log* V) = O(E log V) if E > V
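
For a feel of how slowly the iterated logarithm grows, here is a small sketch that counts how many times log base 2 must be applied before the value drops to 1. The class name IteratedLog is hypothetical, and the sketch uses floor(log2 n), which is exact for powers of two such as the example above.

```java
import java.math.BigInteger;

class IteratedLog {
    /** Number of times floor(log2) is applied before n drops to 1 or below. */
    static int logStar(BigInteger n) {
        int count = 0;
        while (n.compareTo(BigInteger.ONE) > 0) {
            // For n >= 1, bitLength() - 1 equals floor(log2(n)).
            n = BigInteger.valueOf(n.bitLength() - 1);
            count++;
        }
        return count;
    }
}
```

For 2^65536 the chain is 2^65536, 65536, 16, 4, 2, 1, which takes 5 steps.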
private int find(int p) {
  int root = p;
  while (root != parent[root])
    root = parent[root];
  while (p != root) {
    int newP = parent[p];
    parent[p] = root;
    p = newP;
  }
  return root;
}
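
The effect of this two-pass find can be seen on a small chain. In the sketch below, the class name PathCompressionDemo and the choice to pass the parent array as a parameter are assumptions for illustration; the body of find mirrors the method above.

```java
import java.util.Arrays;

class PathCompressionDemo {
    /** Two-pass find: locate the root, then point every visited node directly at it. */
    static int find(int[] parent, int p) {
        int root = p;
        while (root != parent[root]) {
            root = parent[root];
        }
        while (p != root) {
            int next = parent[p];
            parent[p] = root;
            p = next;
        }
        return root;
    }

    public static void main(String[] args) {
        int[] parent = {0, 0, 1, 2, 3}; // chain 4 -> 3 -> 2 -> 1 -> 0
        int root = find(parent, 4);
        System.out.println(root);                    // 0
        System.out.println(Arrays.toString(parent)); // [0, 0, 0, 0, 0]
    }
}
```

After one find(4), every node on the path points at the root, so later finds on those nodes follow a single link.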

Summary
Disjoint Sets ADT is used to track connected components in Kruskal’s algorithm.
Graph algorithm runtime can depend on efficient data structure implementations.
Quick Find: Array representation with no tree structure. Fast isConnected, slow connect.
Quick Union: Array representation with tree structure. Worst-case linear-height trees.
Weighted Quick Union: Choose the new root strategically based on a metric.
WQUByHeight: Use subtree height as a metric. Results in log V height.
WQUBySize: Use subtree size as a metric. Results in log V height.
WQUPathCompression: Use subtree size as a metric and tie visited nodes to the root. Results in O(log* V) amortized time per operation, nearly constant.

What is your burning question from today’s lecture?