Efficient Operations on Disjoint Sets

Raghav Agarwal
Nybles
Published in
5 min readJul 27, 2021

The disjoint sets data structure is used to store the partitions of a set into disjoint sets. In this article, we are going to discuss how to efficiently implement and use this data structure.

What are disjoint sets?

Simply they are collections of items, such that every item is unique and belongs to only one set.

For example- The sets A = {1, 2, 6}, B = {3, 4, 5} and C = {7, 8} are disjoint sets.

Another way of getting disjoint sets is when a set is partitioned into subsets. Here the set, E = {1, 2, 3, 4, 5, 6, 7, 8, 9} was partitioned into subsets {1, 6} {4, 7, 8} {2, 5} {1, 3, 9} which are disjoint

Partitioning a set into disjoint subsets

Why are they important?

Say you are given a graph and are asked to tell whether some vertices are connected? Easy enough, but what if there are more such queries?

This rather famous question perfectly illustrates the usefulness of a data structure that can do operations on disjoint sets.

But wait, doesn’t c++ already have sets in the standard template library that can insert elements, find them or even remove them in O( log n) time? Couldn’t we just make a vector of sets to represent disjoint sets?

Well yes, but the data structure we are going to cover in this article is much, much faster than the stl::set, though it is more limited in its versatility.

The Disjoint Set data structure can be used to:

  • Join two disjoint sets
  • Find the set ID (or set representative) of an element

Basic Implementation

We will be representing our set using a tree, for example, the set A = {1, 2, 3} can be represented as any of the following

The same set is represented by different trees

Each of these trees represents the same set, the root of the tree is used to identify the set. Thus two elements with the same tree root would belong to the same set.

To join two disjoint sets, we need to connect their trees. It would be best to connect the trees at their roots to keep the depth to a minimum. For example-

Example of Join operation

Now to represent these trees in memory, we can simply use an array storing the parent of each element. For example-

Say we have 6 elements, partitioned into 2 disjoint sets, {1, 2, 3} and {4, 5, 6}. Then they can be represented in memory as-

Representing the tree in memory

Now we are ready to implement this in code-

In this implementation, we initially create singletons sets for each element, as the parent of each element is the element itself.

The time complexity of union and get_root this implementation is O(n) as the tree can form long chains. For example-

A degenerate case of long-chain tree

Optimizations

The simple implementation can be optimized using two simple observations-

  • The tree will not degenerate into a chain if we always add the root of the smaller tree to the bigger tree’s root. We call this Join by size
  • We can further decrease the height of the tree by updating the parent of all elements traversed during the get root operation to the root itself. That is we connect all the elements that are between the element whose root we are finding and the root, and re-connect them directly to the root. We call this Path Compression

Let’s look at both these optimizations in more detail-

Join by Size

Let’s see how join by size will change the tree formed-

Example of improvement due to Join by size

To implement this we would use another array keeping track of the size of the set, using the root element as the index. Then we will simply set the parent of the smaller tree’s root to the larger tree’s root

With this improvement, the time complexity of the get root operation improves from O(n) to O( log n) time.

Path Compression

Path compression example

To implement path compression we need to keep track of all the intermediate nodes in the path to the tree’s root. Here we do this using a vector.

We could also do this with more finesse using recursion

Path compression also reduces the time complexity to O(log(n)).

Using both path compression and join by size, we massively reduce the average time complexity to a near-constant time for each call of get root. (Actual time complexity is O(ɑ(n)), where ɑ is the inverse Ackermann function, which grows so slowly that it is nearly constant)

Here is the implementation with both of these optimizations.

Further Resources

This article is in no way exhaustive on the topic of Disjoint Set data structure. It only offers the topics relevant to competitive coding. The reader can check out the following links to learn more topics like Join by rank, coin flip join, etc.

Moreover, solving at least the Codeforces question set linked below should ingrain the concept of the Disjoint Set. Do not hesitate to refer to the questions’ editorials to solve them.

Study

Questions

--

--