Union-Find data structure in short for Revision

Mohammad Shoaib
3 min readSep 21, 2023

--

The Union-Find data structure, also known as the Disjoint-Set data structure, is a fundamental data structure used to manage a collection of disjoint (non-overlapping) sets. It primarily supports two main operations: union and find. This data structure is particularly useful in solving problems that involve partitioning elements into distinct sets and efficiently answering questions about the relationships between these sets.

Here’s a detailed explanation of the Union-Find data structure:

Initialization:

  • Initially, each element is considered as a separate set. This means that if you have N elements, you have N individual sets, each containing just one element. In other words, every element is its own representative.

Find(x):

  • The Find(x) operation is used to determine which set an element x belongs to and find the representative element (or the "parent") of that set.
  • The Find operation starts at element x and follows a chain of "parent" pointers until it reaches a representative element. This representative element is unique for each set and serves as an identifier for that set.
  • This operation helps answer questions like “Is element A in the same set as element B?”

Union(x, y):

  • The Union(x, y) operation is used to merge two sets into one. It connects the set containing element x with the set containing element y.
  • To perform a union, you typically find the representatives of the sets containing x and y using the Find operation. Then, you make one of them the parent of the other, effectively merging the two sets.
  • This operation helps answer questions like “Can I connect elements A and B without forming a cycle in a graph?”

The Union-Find data structure is often implemented using arrays. Two commonly used techniques to optimize its performance are:

  • Union by Rank: In a union operation, you attach the smaller tree (set) to the root of the larger tree to keep the tree heights balanced. This helps maintain efficient Find operations.
  • Path Compression: During a Find operation, you update the parent pointers along the path to the root to make subsequent Find operations faster. This effectively flattens the tree structure.

Pseudocode using Path Compression:

Initialize:
For each element x:
MakeSet(x) // Each element starts as its own set with rank 0 and itself as the representative.

Find(x):
if x is not the representative of its set:
Update parent of x to Find(parent[x]) // Path compression
return parent[x]

Union(x, y):
root_x := Find(x)
root_y := Find(y)

if root_x ≠ root_y:
// Attach the smaller rank tree to the root of the larger rank tree
if rank[root_x] < rank[root_y]:
parent[root_x] := root_y
else if rank[root_x] > rank[root_y]:
parent[root_y] := root_x
else:
parent[root_y] := root_x // Arbitrarily attach y to x's root
rank[root_x] := rank[root_x] + 1

Implementation:

'''
createSet: O(1) time | O(1) space

For General:
find: O(n) time | O(1) space
union: O(n) time | O(1) space
For union by ranks:
find: O(log(n)) time | O(1) space
union: O(log(n)) time | O(1) space
For union by ranks:
a means alpha
find: O(a(n)) time | O(a(n)) space
union: O(a(n)) time | O(a(n)) space
'''
class UnionFind:
def __init__(self):
self.parents = {}
# For union by rank and path compression
self.ranks = {}

def createSet(self, value):
self.parents[value] = value
self.ranks[value] = 0

def find(self, value):
if value not in self.parents:
return None
# # for general and union by rank
# while value != self.parents[value]:
# value = self.parents[value]
# return value

# for path compression
if value != self.parents[value]: # not the root node
# reassign parent to current parent's root
self.parents[value] = self.find(self.parents[value])

return self.parents[value]

def union(self, valueOne, valueTwo):
if valueOne not in self.parents or valueTwo not in self.parents:
return
valueOneRoot = self.find(valueOne)
valueTwoRoot = self.find(valueTwo)
# # for general
# self.parents[valueOneRoot] = valueTwoRoot

# for union by ranks and path compression
if self.ranks[valueOneRoot] < self.ranks[valueTwoRoot]:
self.parents[valueOneRoot] = valueTwoRoot
elif self.ranks[valueOneRoot] > self.ranks[valueTwoRoot]:
self.parents[valueTwoRoot] = valueOneRoot
else:
self.parents[valueTwoRoot] = valueOneRoot
self.ranks[valueOneRoot] += 1

Conclusion:

The Union-Find data structure is an essential tool in computer science and algorithm design, particularly for problems involving set partitioning and connectivity.

Applications of the Union-Find data structure include:

  • Disjoint Set Operations: Quickly determine if two elements are in the same set or not.
  • Cycle Detection: Detect cycles in graphs, often used in algorithms like Kruskal’s Minimum Spanning Tree algorithm.
  • Dynamic Connectivity: Maintain connected components in dynamic graphs, useful in network connectivity problems.

--

--