Union-Find data structure in short for Revision
The Union-Find data structure, also known as the Disjoint-Set data structure, is a fundamental data structure used to manage a collection of disjoint (non-overlapping) sets. It primarily supports two main operations: union and find. This data structure is particularly useful in solving problems that involve partitioning elements into distinct sets and efficiently answering questions about the relationships between these sets.
Here’s a detailed explanation of the Union-Find data structure:
Initialization:
- Initially, each element is considered as a separate set. This means that if you have N elements, you have N individual sets, each containing just one element. In other words, every element is its own representative.
Find(x):
- The
Find(x)
operation is used to determine which set an elementx
belongs to and find the representative element (or the "parent") of that set. - The Find operation starts at element
x
and follows a chain of "parent" pointers until it reaches a representative element. This representative element is unique for each set and serves as an identifier for that set. - This operation helps answer questions like “Is element A in the same set as element B?”
Union(x, y):
- The
Union(x, y)
operation is used to merge two sets into one. It connects the set containing elementx
with the set containing elementy
. - To perform a union, you typically find the representatives of the sets containing
x
andy
using the Find operation. Then, you make one of them the parent of the other, effectively merging the two sets. - This operation helps answer questions like “Can I connect elements A and B without forming a cycle in a graph?”
The Union-Find data structure is often implemented using arrays. Two commonly used techniques to optimize its performance are:
- Union by Rank: In a union operation, you attach the smaller tree (set) to the root of the larger tree to keep the tree heights balanced. This helps maintain efficient Find operations.
- Path Compression: During a Find operation, you update the parent pointers along the path to the root to make subsequent Find operations faster. This effectively flattens the tree structure.
Pseudocode using Path Compression:
Initialize:
For each element x:
MakeSet(x) // Each element starts as its own set with rank 0 and itself as the representative.
Find(x):
if x is not the representative of its set:
Update parent of x to Find(parent[x]) // Path compression
return parent[x]
Union(x, y):
root_x := Find(x)
root_y := Find(y)
if root_x ≠ root_y:
// Attach the smaller rank tree to the root of the larger rank tree
if rank[root_x] < rank[root_y]:
parent[root_x] := root_y
else if rank[root_x] > rank[root_y]:
parent[root_y] := root_x
else:
parent[root_y] := root_x // Arbitrarily attach y to x's root
rank[root_x] := rank[root_x] + 1
Implementation:
'''
createSet: O(1) time | O(1) space
For General:
find: O(n) time | O(1) space
union: O(n) time | O(1) space
For union by ranks:
find: O(log(n)) time | O(1) space
union: O(log(n)) time | O(1) space
For union by ranks:
a means alpha
find: O(a(n)) time | O(a(n)) space
union: O(a(n)) time | O(a(n)) space
'''
class UnionFind:
def __init__(self):
self.parents = {}
# For union by rank and path compression
self.ranks = {}
def createSet(self, value):
self.parents[value] = value
self.ranks[value] = 0
def find(self, value):
if value not in self.parents:
return None
# # for general and union by rank
# while value != self.parents[value]:
# value = self.parents[value]
# return value
# for path compression
if value != self.parents[value]: # not the root node
# reassign parent to current parent's root
self.parents[value] = self.find(self.parents[value])
return self.parents[value]
def union(self, valueOne, valueTwo):
if valueOne not in self.parents or valueTwo not in self.parents:
return
valueOneRoot = self.find(valueOne)
valueTwoRoot = self.find(valueTwo)
# # for general
# self.parents[valueOneRoot] = valueTwoRoot
# for union by ranks and path compression
if self.ranks[valueOneRoot] < self.ranks[valueTwoRoot]:
self.parents[valueOneRoot] = valueTwoRoot
elif self.ranks[valueOneRoot] > self.ranks[valueTwoRoot]:
self.parents[valueTwoRoot] = valueOneRoot
else:
self.parents[valueTwoRoot] = valueOneRoot
self.ranks[valueOneRoot] += 1
Conclusion:
The Union-Find data structure is an essential tool in computer science and algorithm design, particularly for problems involving set partitioning and connectivity.
Applications of the Union-Find data structure include:
- Disjoint Set Operations: Quickly determine if two elements are in the same set or not.
- Cycle Detection: Detect cycles in graphs, often used in algorithms like Kruskal’s Minimum Spanning Tree algorithm.
- Dynamic Connectivity: Maintain connected components in dynamic graphs, useful in network connectivity problems.