Union-Find data structure in short for Revision

3 min readSep 21, 2023

The Union-Find data structure, also known as the Disjoint-Set data structure, is a fundamental data structure used to manage a collection of disjoint (non-overlapping) sets. It primarily supports two main operations: union and find. This data structure is particularly useful in solving problems that involve partitioning elements into distinct sets and efficiently answering questions about the relationships between these sets.

Here’s a detailed explanation of the Union-Find data structure:

Initialization:

Initially, each element is considered as a separate set. This means that if you have N elements, you have N individual sets, each containing just one element. In other words, every element is its own representative.

Find(x):

The Find(x) operation is used to determine which set an element x belongs to and find the representative element (or the "parent") of that set.
The Find operation starts at element x and follows a chain of "parent" pointers until it reaches a representative element. This representative element is unique for each set and serves as an identifier for that set.
This operation helps answer questions like “Is element A in the same set as element B?”

Union(x, y):

The Union(x, y) operation is used to merge two sets into one. It connects the set containing element x with the set containing element y.
To perform a union, you typically find the representatives of the sets containing x and y using the Find operation. Then, you make one of them the parent of the other, effectively merging the two sets.
This operation helps answer questions like “Can I connect elements A and B without forming a cycle in a graph?”

The Union-Find data structure is often implemented using arrays. Two commonly used techniques to optimize its performance are:

Union by Rank: In a union operation, you attach the smaller tree (set) to the root of the larger tree to keep the tree heights balanced. This helps maintain efficient Find operations.
Path Compression: During a Find operation, you update the parent pointers along the path to the root to make subsequent Find operations faster. This effectively flattens the tree structure.

Pseudocode using Path Compression:

Initialize:
  For each element x:
    MakeSet(x)  // Each element starts as its own set with rank 0 and itself as the representative.

Find(x):
  if x is not the representative of its set:
    Update parent of x to Find(parent[x])  // Path compression
  return parent[x]

Union(x, y):
  root_x := Find(x)
  root_y := Find(y)

  if root_x ≠ root_y:
    // Attach the smaller rank tree to the root of the larger rank tree
    if rank[root_x] < rank[root_y]:
      parent[root_x] := root_y
    else if rank[root_x] > rank[root_y]:
      parent[root_y] := root_x
    else:
      parent[root_y] := root_x  // Arbitrarily attach y to x's root
      rank[root_x] := rank[root_x] + 1

Implementation:

'''
createSet: O(1) time | O(1) space

For General: 
    find: O(n) time | O(1) space
    union: O(n) time | O(1) space
For union by ranks: 
    find: O(log(n)) time | O(1) space
    union: O(log(n)) time | O(1) space
For union by ranks: 
    a means alpha
    find: O(a(n)) time | O(a(n)) space
    union: O(a(n)) time | O(a(n)) space
'''
class UnionFind:
    def __init__(self):
        self.parents = {}
        # For union by rank and path compression
        self.ranks = {}

    def createSet(self, value):
        self.parents[value] = value
        self.ranks[value] = 0

    def find(self, value):
        if value not in self.parents:
            return None
        # # for general and union by rank
        # while value != self.parents[value]:
        #     value = self.parents[value]
        # return value

        # for path compression
        if value != self.parents[value]: # not the root node
            # reassign parent to current parent's root
            self.parents[value] = self.find(self.parents[value])
            
        return self.parents[value]

    def union(self, valueOne, valueTwo):
        if valueOne not in self.parents or valueTwo not in self.parents:
            return
        valueOneRoot = self.find(valueOne)
        valueTwoRoot = self.find(valueTwo)
        # # for general
        # self.parents[valueOneRoot] = valueTwoRoot

        # for union by ranks and path compression
        if self.ranks[valueOneRoot] < self.ranks[valueTwoRoot]:
            self.parents[valueOneRoot] = valueTwoRoot
        elif self.ranks[valueOneRoot] > self.ranks[valueTwoRoot]:
            self.parents[valueTwoRoot] = valueOneRoot
        else:
            self.parents[valueTwoRoot] = valueOneRoot
            self.ranks[valueOneRoot] += 1

Conclusion:

The Union-Find data structure is an essential tool in computer science and algorithm design, particularly for problems involving set partitioning and connectivity.

Applications of the Union-Find data structure include:

Disjoint Set Operations: Quickly determine if two elements are in the same set or not.
Cycle Detection: Detect cycles in graphs, often used in algorithms like Kruskal’s Minimum Spanning Tree algorithm.
Dynamic Connectivity: Maintain connected components in dynamic graphs, useful in network connectivity problems.

Union-Find data structure in short for Revision

Written by Mohammad Shoaib