Disjoint Sets Study Guide

Dynamic Connectivity Problem. The ultimate goal of this lecture was to develop a data type that supports the following operations on a fixed number N of objects:

  • connect(int p, int q) (called union in our optional textbook)
  • isConnected(int p, int q) (called connected in our optional textbook)

We do not care about finding the actual path between p and q. We care only about their connectedness. A third operation we can support is a helper method for isConnected:

  • find(int p): returns an identifier for the set containing p, so that find(p) == find(q) if and only if isConnected(p, q)

Formally, we call this abstract data type disjoint sets.
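The operations above could be written down as a Java interface. This is only a sketch: the interface name DisjointSets and the choice to express isConnected as a default method on top of find are assumptions for illustration, not something the lecture or textbook prescribes.

```java
// Hypothetical interface for the disjoint sets ADT described above.
interface DisjointSets {
    // Merge the set containing p with the set containing q.
    void connect(int p, int q);

    // Return an identifier for the set containing p.
    int find(int p);

    // Two items are connected exactly when they share a set identifier.
    default boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }
}
```

Any implementation that keeps find consistent with connect gets isConnected for free, which is why find is a convenient helper.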

Set representatives. Connectedness is an equivalence relation. Saying that two objects are connected is the same as saying they are in the same equivalence class. This is just fancy math talk for saying “every object is in exactly one bucket, and we want to know if two objects are in the same bucket.” When you connect two objects, you’re basically just pouring everything from one bucket into another.

Quick find. This is the most natural solution, where each object is given an explicit bucket number. It uses an array id of length N, where id[i] is the bucket number of item i. To connect two items p and q, we change every item in p's bucket to have q's bucket number. connect takes linear time (with respect to N) because it scans the entire array, but isConnected takes constant time.
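A minimal sketch of quick find follows. The class name QuickFindDS and the early return when p and q already share a bucket are assumptions; the id array and the relabeling loop follow the description above.

```java
// Sketch of quick find: id[i] is the bucket number of item i.
class QuickFindDS {
    private final int[] id;

    QuickFindDS(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i; // every item starts in its own bucket
        }
    }

    // Linear time: scan the whole array and relabel p's bucket as q's bucket.
    public void connect(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        if (pid == qid) {
            return; // already in the same bucket
        }
        for (int i = 0; i < id.length; i++) {
            if (id[i] == pid) {
                id[i] = qid;
            }
        }
    }

    // Constant time: compare two array entries.
    public boolean isConnected(int p, int q) {
        return id[p] == id[q];
    }
}
```

Note that pid is saved before the loop. Comparing against id[p] inside the loop would break once id[p] itself is overwritten.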

Quick union. An alternate approach is to rename our id array as parent and define new invariants based on this idea. In this strategy, parent[i] is the parent of item i. An item can be its own parent. The find method climbs the ladder of parents until it reaches the root, an item whose parent is itself. To connect p and q, we set the root of p to point to the root of q. Although this strategy can result in a faster connect, the trees can grow tall: find may have to climb a chain of up to N - 1 parents, so finding the roots of p and q can take linear time with respect to N in the worst case. Both connect and isConnected rely on find, so their runtimes are also linear in the worst case.
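Quick union might be sketched as below. The class name QuickUnionDS is an assumption; the parent array, the root-climbing find, and the rule that the root of p points to the root of q follow the description above.

```java
// Sketch of quick union: parent[i] is the parent of item i; a root is its own parent.
class QuickUnionDS {
    private final int[] parent;

    QuickUnionDS(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i; // every item starts as its own root
        }
    }

    // Climb parent links until reaching a root.
    public int find(int p) {
        while (p != parent[p]) {
            p = parent[p];
        }
        return p;
    }

    // Make the root of p point to the root of q.
    public void connect(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        parent[rootP] = rootQ;
    }

    public boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }
}
```

The worst case is easy to trigger with this rule: calling connect(0, 1), connect(0, 2), connect(0, 3) in that order builds the chain 0 → 1 → 2 → 3, so find(0) must visit every item.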

Weighted quick union. Rather than connect(p, q) always making the root of p point to the root of q, we instead make the root of the smaller tree point to the root of the larger one. We found that, in the worst case, deciding based on size results in trees of the same asymptotic height as deciding based on height. Using either metric, weighted quick union trees have a height whose order of growth is logarithmic with respect to N. This height guarantee improves the worst-case running time for connect and isConnected to logarithmic time as well.
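A sketch of weighted quick union by size is shown below. The class name WeightedQuickUnionDS, the separate size array, and the tie-breaking rule (when sizes are equal, the root of q goes under the root of p) are assumptions; any consistent tie-break keeps the logarithmic height bound.

```java
// Sketch of weighted quick union, weighting by tree size.
class WeightedQuickUnionDS {
    private final int[] parent;
    private final int[] size; // size[r] is the number of items in the tree rooted at r

    WeightedQuickUnionDS(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    public int find(int p) {
        while (p != parent[p]) {
            p = parent[p];
        }
        return p;
    }

    // Link the root of the smaller tree under the root of the larger tree.
    public void connect(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) {
            return; // already connected; do not double-count sizes
        }
        if (size[rootP] < size[rootQ]) {
            parent[rootP] = rootQ;
            size[rootQ] += size[rootP];
        } else {
            // Assumed tie-break: on equal sizes, q's root goes under p's root.
            parent[rootQ] = rootP;
            size[rootP] += size[rootQ];
        }
    }

    public boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }
}
```

The height bound comes from the fact that an item's depth increases only when its tree is linked under a tree at least as large, so the tree containing it at least doubles in size each time. That can happen at most log2(N) times.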

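Weighted quick union with path compression. When find is called, every node along the way is made to point at the root, resulting in very nearly flat trees. The average cost per operation over a long sequence of calls then grows extremely slowly with N, no faster than the iterated logarithm of N. For any reasonable values of N in this universe that we inhabit, that quantity is at most 5, so each operation takes effectively constant time on average.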
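Path compression changes only find. The sketch below uses a two-pass iterative find (first locate the root, then repoint every node on the path); a recursive version would work equally well. The class name WQUPathCompressionDS and the parentOf accessor are hypothetical, and parentOf exists only so the compression can be observed.

```java
// Sketch of weighted quick union (by size) with path compression.
class WQUPathCompressionDS {
    private final int[] parent;
    private final int[] size;

    WQUPathCompressionDS(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    public int find(int p) {
        // Pass 1: climb to the root.
        int root = p;
        while (root != parent[root]) {
            root = parent[root];
        }
        // Pass 2: make every node on the path point directly at the root.
        while (p != root) {
            int next = parent[p];
            parent[p] = root;
            p = next;
        }
        return root;
    }

    public void connect(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) {
            return;
        }
        if (size[rootP] < size[rootQ]) {
            parent[rootP] = rootQ;
            size[rootQ] += size[rootP];
        } else {
            // Assumed tie-break: on equal sizes, q's root goes under p's root.
            parent[rootQ] = rootP;
            size[rootP] += size[rootQ];
        }
    }

    public boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }

    // Hypothetical helper for inspection only; not part of the ADT.
    int parentOf(int p) {
        return parent[p];
    }
}
```

For example, after connect(0, 1), connect(2, 3), connect(0, 2), item 3 sits two links below root 0. A single call to find(3) repoints it directly at 0, so later calls on 3 take one step.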

  1. [Textbook 1.5.10] In the weighted quick union algorithm, suppose that we set id[find(p)] to q instead of to id[find(q)]. Would the resulting algorithm be correct?
  2. If we’re concerned about tree height, why don’t we use tree height as our weighted quick union metric rather than tree size? What is the worst-case tree height for WQUBySize vs. WQUByHeight? For any sequence of calls, will they always result in exactly the same forest of trees?
  3. Q2 from CS 61B 16sp MT2
  4. Q3 from CS 61B 17sp MT2
  5. Q3 from CS 61B 18sp MT2
  6. Q6 from CS 61B 19sp MT2