On Disjoint Sets

Caleb Winston
Index Zero
Published in
7 min readSep 17, 2018

Maybe you’ve heard about merge-find sets, or union-find forests, or perhaps disjoint sets. Surely such a pedantic name is that of a correspondly complicated data structure… right? 🤨

Well… not really — conceiving a data structure that can store disjoint sets is not actually all that hard.

What are sets?

Before we take a look at disjoint sets, what exactly are sets? A set is really just a collection of elements.

A visual representation of a single set with three elements

As you can see from the visualization above, sets don’t maintain an order of elements. We can perform pretty much any operation you can perform on a set of data — addition, deletion, size, etc. — except for operations that require an order of the elements to be known.

For example, a few things sets can’t do are inserting at an “index”, popping, or pushing elements, etc.

Another interesting thing about sets — what happens when we add an element to a set that already exists in it?

What should happen when we add a duplicate element

That’s right — we simply replace the element that was already there; we don’t do anything.

There’s a simple reason for why sets don’t store duplicates. In other data structures such as lists and dictionaries, every element was different in at least one way. Even if two entries had the same value, they had to have different indices or keys. But in a set, we’re just storing the elements by themselves in no particular order. So we can’t distinguish a 3 we add in first from a 3 we add in later.

What are disjoint sets?

Now that you know what sets are, a group of disjoint sets is really just a bunch of sets where each set has a representative element.

A bunch of disjoint sets and their representative elements

That’s really all a data structure holding disjoint sets needs to keep track of — a bunch of elements and the representative element of the set they belong to.

The operations you can perform on disjoint sets are also quite simple -

  • Make a new set with a new representative element
  • Find the representative element of the set an element belongs to
  • Union the sets two elements belong to

The first operation is making a new disjoint set and putting an element in it. This element will be its representative element. This is shown in the below visualization.

Making a new set with the element 4

The second operation is finding the representative element of the set an element belongs to. In a previous picture, we saw that the representative element of the set with {5, 3, 2} is 5. So the representative element of 3 is 5 because 3 is in the set which has 5 as its representative element.

Finding the representative element of the set that contains 4

The third operation is taking the union of the sets that contain two elements. So what’s a union? A union of two sets is what we get when we take all the elements of one set and all the elements of the other set and put them together into the same set. Here’s what that looks like.

Taking the union of two sets

As you can see, we pretty much just took the elements in both sets and put them together in the same set. However there are two other things you should notice.

  1. We kept only one 3 even though there was 1 in both sets. This is something we have to do be because, as we mentioned earlier, we can’t distinguish the 3 from the first set from the 3 from the other set. So instead, we just keep one of them.
  2. We also had to pick a new representative element for the new set. Naturally, we picked one of two representative elements of the original set. But why did we pick the 9 and not the 5? The answer is that it doesn’t really matter that much which one we pick. Depending on how we implement our disjoint set data structure, we might choose to pick either one.

Now that we’ve seen the two things we need to keep track of for disjoint sets (all the elements, the representative elements of the sets they belong to) and the three operations we need to be able to perform (make a new set, find a representative element, and union two sets). Let’s move on to how we would actually implement these sets.

How to implement disjoint sets

So how exactly do we implement a data structure that stores disjoint sets?

One way we can store disjoint sets is with a tree-like structure. For example we can represent the following disjoint sets as follows.

Two disjoint sets represented by trees.

If we use some sort of tree-like structure, all we have to do to make a set is create a new “tree” with the representative element of the new set at the root or the top of the tree. To find the representative element of the set an element belongs to, we simply have to follow the element up the tree till we get to the representative element at the top. To take the union of two sets we simply have to make the root of one set’s tree point to the root of other other set’s tree.

Taking the union of two sets

So it looks like trees are a great way of representing disjoint sets. We can store everything we want to and perform the operations we want to. But what do we actually use to store the trees?

As it turns out arrays can be used quite effectively to store these “disjoint set trees”. We can have an array and each element can correspond to an index in our array and the parent of that element in the tree can correspond to the element at that index in our array. Here’s what that would look like.

An array that stores the tree representation of the disjoint set(s)

As you can see the elements at indices 2, 3, and 7 represented the parents of elements 2, 3, and 7 in the disjoint set tree. They all had a value of 5 because that’s what their parent element is in the tree.

And you can also notice that the element at index 5 was a -1. When we are trying to find the representative element of a set an element belongs to, we can keep moving to the parent of the current element till we get to one with a value of -1. Then we can know to stop because the current element’s index is the representative element.

So just to summarize — here’s how we can perform our three operations on disjoint sets using an array.

  • We can make a new set by simply adding an element to the array at the index of the value we want the set’s representative element to have.
Making a new disjoint set with a representative element of 9
  • We can find the representative element of the set an element belongs to by repeatedly moving a pointer to the parent element of the original element.
Finding an element in our disjoint sets using our array
  • Lastly, we can also take the union of two sets by simply setting the parent value of the representative element of one to set to the value of the representative element of the other set.

How to make disjoint sets work for stuff other than number

Up to this point, we’ve just been looking at disjoint sets where all the elements are numbers. However, it is possible for our disjoint set data structure to support any type of data. The simplest solution is to maintain a dictionary that maps data in the type we want to corresponding numbers in our disjoint sets.

A dictionary that can map from strings to integer indices in our disjoint set array

How to optimize our disjoint set data structure

There are actually a few simple optimizations we can do to speed up the operations our disjoint set data structure can do. We’re going to take a look at the two most common ones.

The first is union-by-height. It turns out that even though we can randomly choose which element to choose as the representative element of the union of two sets, by making the element from the set with a taller tree, we can increase performance.

The second is path compression. When we try to find the representative element of an element’s set, the representative element we return in the end is technically the representative element of every element we come across. So, we can modify our find function to simply set the parent of each element we come across to the representative element we ultimately find.

So that’s the disjoint set data structures. 👏 Disjoint sets aren’t as “general-purpose” ase other data structures out there but they can be very useful in specific and now you know how to implement them! 👍

(If you want to take a look at a coded implementation for one, here is one I did in sixty lines in the Lua programming language)

--

--

Caleb Winston
Index Zero

Building efficient programming systems for cloud and edge computing (calebwin.github.io)