Photo by Simon Fitall on Unsplash

13. Disjoint Set (Union/Find)

-

Yohan Hyunsung Yi
5 min readJun 7, 2018

--

A disjoint set is a data structure that stores and manipulates information about elements that are divided into non-overlapping subsets. Disjoint sets are often used to partition constituent elements so that they do not overlap when there is an entire set. Let’s take a look at some of these terms.

  • A set is a set of objects. (Unlike lists and the order is not considered)
  • When all the elements of set A are included in set B, A is called a subset of B. Here B is called A superset of A.
  • A and B are mutually disjoint when there is no element shared by A and B.
  • Partitioning an arbitrary set means to split the set so that each subset is a disjoint set satisfying the following two properties. (1) When the partitioned subset is combined, it becomes the original set. (2) Partitioned subsets are mutually disjoint, ie there are no overlapping elements.

For example, if S = 1,2,3,4 and A = 1,2, B = 3,4, C = 2,3,4, D = 4 then A and B are partitions of S. A and B are disjoint sets. But A and C are not partitions of S. This is because there are overlapping elements. A and D are also not partitions of S. It is because it does not become S when both are combined.

Operations

  • make-set (x): Makes a new set with x as its only element.
  • union (x, y): Combines the set to which x belongs and the set to which y belongs.
  • find (x): Returns the representative value (root node value) of the set to which x belongs.

Let’s take an example. Let’s say that a disjoint set of A = 3, 4 is already created. In this case, the first entry element, 3, becomes the root node, and 3 is the value representing A. It is as follows.

find (4) means to print the representative value of the set to which 4 belongs. So in this example, it will be 3. Similarly, find (3) and the output is 3.

Let’s say we have another disjoint set, B = 1,2. The representative value of B is the root node 3. When joining A and B, the root nodes are connected. It is as follows.

The root node of the newly created disjoint set is 1. So, for example, if you perform find (4) to output a representative value of a set containing 4, the result is 1.

Implemented as an array

Let’s implement the three basic operations of a disjoint set as an array. Let A = 3,4, B = 1,2 Let’s say that two disjoint sets have already been created.

If you implement it as an array: N refers to the input elements. S indicates whether the input element is the root node or where the parent node is. For example, 3 and 1 are root nodes, so the value of S corresponding to them is 0. The parent node of 4 is marked as the first element of N, that is, 3. Similarly, the parent node of 2 is the third element of N,

N = [3,4,1,2]
S = [0, 1, 0, 3]
If union operation is performed, S changes as follows.

S=[0,1,1,3]

So what is the computational complexity of the union (x, y) operation? There are three main stages to consider.

  1. You need to find the disjoint set that x belongs to: find (x)
  2. You need to find the set to which y belongs: find (y)
  3. Combine the found sets.

However, the computational complexity of the find operation is O (logn) when the number of elements in the disjoint set is n. Repeatedly traversing the parent node to find the root node. For example, if there is a disjoint set consisting of 3–4–5–2–1 in a row from the leaf node to the root, find (3) must traverse the edge four times as high as the tree, I can find it. The path compression is the way to perform the find operation more efficiently, but I will take a little time to deal with it.

Let’s say you’ve run the find operation twice to find the join set. Now it is time to combine these two. There are union-by-size and union-by-height methods to combine, which we’ll cover in the next section.

Union

When combining any two disjoint sets, it is efficient to combine sets with fewer elements into many sets of subtrees (union-by-size). Similarly, a set with a small height of the tree must be combined into a large set of subtrees (union-by-height). In order to improve the efficiency of the find operation, it is necessary to perform the find operation in the next union operation. The number of elements and the height of the tree tend to be proportional and the computational complexity of the find operation is highly dependent on them.

Implementing union-by-size and union-by-height is straightforward. You can change the root node information of array S. In the root node, we replace it with the following, unlike the old one,

  • union-by-size : −size of tree
  • union-by-height : −height of tree

In short, the two disjoint sets found in the find operation are compared in terms of the number of atoms or the height of the disjoint sets. This allows subsequent find operations to be performed more efficiently.

The computational complexity of union-by-size and union-by-height is O (1). Because find has already found the root node of two disjoint sets in the find operation, it compares the number of atoms or heights stored in these two root node locations in S. Replace the value of S corresponding to the root node of the smaller of the two to point to the index of the larger root node. All these operations correspond to O (1).

Path compression

path compression is to make all nodes point to the root as follows: S stores the root node instead of the parent node index. When you perform a find operation, you need to go back as far as the height of the tree to find the root node. This is to alleviate this inefficiency. Once you perform path compression, you can reduce the computational complexity of the find operation that finds the root node.

--

--