Accounts Merge

Ethan Davis
Data Structures and Algorithms DSA
4 min readJul 2, 2024
Data Structures and Algorithms

Statement

Consider a 2D array accounts, where each row accounts[i] is an array of strings, such that the first string accounts[i][0] is a name while remaining strings are emails associated with that account. Find if multiple accounts belong to the same person by checking if the accounts have the same name and at least one common email address.

Two accounts can have the same name but belong to different people, since different people can have the same name. However, all accounts that belong to one person will have the same name.

The output should be a 2D array, where the first string of each row is the name, and the remaining strings are the merged emails in sorted lexicographical order.

Constraints

  • 1 ≤ accounts.length() ≤ 100
  • 2 ≤ accounts[i].length() ≤ 10
  • 1 ≤ accounts[i][j].length() ≤ 30
  • accounts[i][0] is lowercase English characters
  • accounts[i][j] is a valid email, where 0 < j

Solution

"""
production algorithm
"""


class Solution:
def accounts_merge(self, accounts):
"""
time complexity O(nklog(nk))
space complexity O(nk)
"""
size = len(accounts)
unionfind = UnionFind(size)

# build reverse index from email to account
emails = {}
for index, account in enumerate(accounts):
for email in account[1:]:
if email not in emails:
emails[email] = index
else:
representative = unionfind.find(emails[email])
unionfind.union(index, representative)

# merge accounts
merged = [[] for _ in range(size)]
for email, index in emails.items():
representative = unionfind.find(index)
if not merged[representative]:
merged[representative] = [accounts[representative][0]]
merged[representative].append(email)
merged = [[account[0]] + sorted(account[1:])
for account in merged if account]
return merged


class UnionFind:
def __init__(self, size):
self.parent = [n for n in range(size)]
self.size = [1 for _ in range(size)]

def find(self, x):
"""
time complexity O(⍺(n))
space complexity O(n)
"""
if not self.parent[x] == x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]

def union(self, x, y):
"""
time complexity O(⍺(n))
space complexity O(n)
"""
rootx = self.find(x)
rooty = self.find(y)
if not rootx == rooty:
if self.size[rootx] < self.size[rooty]:
rooty, rootx = rootx, rooty
self.parent[rooty] = rootx
self.size[rootx] += self.size[rooty]
"""
unit test
"""

from unittest import TestCase
from algorithms.union_find.accounts_merge.solution import Solution


class SolutionTestCase(TestCase):
def test_zero_unions(self):
# Given
accounts = [["alice", "alice1@email.com", "alice3@email.com", "alice5@email.com"],
["bob", "bob2@email.com", "bob4@email.com", "bob6@email.com"],
["charlie", "charlie3@email.com", "charlie6@email.com", "charlie9@email.com"]]
solution = Solution()

# When
actual = solution.accounts_merge(accounts)

# Then
expected = [["alice", "alice1@email.com", "alice3@email.com", "alice5@email.com"],
["bob", "bob2@email.com", "bob4@email.com", "bob6@email.com"],
["charlie", "charlie3@email.com", "charlie6@email.com", "charlie9@email.com"]]
self.assertEqual(actual, expected)

def test_unions_to_one_account(self):
# Given
accounts = [["alice", "alice1@email.com", "alice3@email.com", "alice5@email.com"],
["alice", "alice3@email.com",
"alice6@email.com", "alice9@email.com"],
["alice", "alice5@email.com", "alice10@email.com", "alice15@email.com"]]
solution = Solution()

# When
actual = solution.accounts_merge(accounts)

# Then
expected = [["alice", "alice10@email.com", "alice15@email.com", "alice1@email.com",
"alice3@email.com", "alice5@email.com", "alice6@email.com", "alice9@email.com"]]
self.assertEqual(actual, expected)

def test_unions_to_many_accounts(self):
# Given
accounts = [["alice", "alice1@email.com", "alice3@email.com", "alice5@email.com"],
["bob", "bob2@email.com", "bob4@email.com", "bob6@email.com"],
["charlie", "charlie3@email.com",
"charlie6@email.com", "charlie9@email.com"],
["alice", "alice5@email.com",
"alice7@email.com", "alice9@email.com"],
["bob", "bob6@email.com", "bob8@email.com", "bob10@email.com"],
["charlie", "charlie9@email.com", "charlie11@email.com", "charlie13@email.com"]]
solution = Solution()

# When
actual = solution.accounts_merge(accounts)

# Then
expected = [["alice", "alice1@email.com", "alice3@email.com", "alice5@email.com", "alice7@email.com", "alice9@email.com"],
["bob", "bob10@email.com", "bob2@email.com",
"bob4@email.com", "bob6@email.com", "bob8@email.com"],
["charlie", "charlie11@email.com", "charlie13@email.com", "charlie3@email.com", "charlie6@email.com", "charlie9@email.com"]]
self.assertEqual(actual, expected)

Strategy

Union Find.

Explanation

The algorithm uses a UnionFind to merge accounts into disjoint sets. Accounts are indexed in the input 2D array accounts, so more specifically it’s accounts indexes that will be merged into disjoint sets.

First, the algorithm builds a reverse index that maps emails to their account index. If an email is seen multiple times, then the relevant account indexes should be merged, and so the UnionFind unions them. After all emails have been iterated, and unions of disjoint sets made, then the output 2D array is built.

The output 2D is initialized, with length the same as the input 2D array, and where each row is initialized as empty. Then, each mapping from email to account index of the reverse index is iterated. The account index and representatives of disjoint sets of the UnionFind are used to find to which row index of the output 2D array an email should be appended. When accounts have been merged, then emails are sorted, and empty accounts are excluded from the output 2D array.

The result is a union of all accounts that belong to one person.

Time Complexity

The algorithm iterates all emails of all accounts. For each iteration, at most a constant number of UnionFind operations are done. Let n be the number of accounts, and k be the largest number of emails in any account. Then, the time complexity to iterate all emails and union all disjoint sets is O(n × k × ⍺(n)), or simply O(n × k), where ⍺(n) is the inverse Ackerman function and time complexity of UnionFind operations.

Then, the algorithm iterates all emails again. For each iteration, a UnionFind operation is done, and an email is appended to a list in constant time. Then, the emails of all accounts are sorted. At worst, all emails belong to one account, and so the time complexity to build the output 2D array and sort it is O(n × k × log(n × k)).

Overall, the time complexity of the algorithm is O(n × k × log(n × k)).

Space Complexity

The algorithm uses a UnionFind whose auxiliary space is the number of accounts n. The algorithm also uses a reverse index that maps each email to its account index whose auxiliary space is the number of emails n × k. Therefore, the auxiliary space complexity of the algorithm is O(n × k).

Links

--

--