Group Anagrams
Statement
Given a list of words or phrases, group the words or phrases that are anagrams of each other.
Constraints
- 1 ≤
strings.length()
≤ 10³ - 0 ≤
strings[i].length()
≤ 100 strings[i]
contains lowercase English characters
Solution
"""
production algorithm
"""
from collections import defaultdict
class Solution:
def group_anagrams(self, strings):
"""
time complexity O(nk)
space complexity O(nk)
"""
anagrams = defaultdict(list)
for string in strings:
# find frequencies of characters
frequencies = [0 for _ in range(26)]
for character in string:
frequencies[ord(character) - ord("a")] += 1
# group strings by frequencies of characters
anagrams[tuple(frequencies)].append(string)
return anagrams.values()
"""
unit tests
"""
from unittest import TestCase
from algorithms.counters.group_anagrams.solution import Solution
class SolutionTestCase(TestCase):
def test_zero_multiple_string_groups(self):
# Given
strings = ["apple", "banana", "window", "orange",
"pillow", "chair", "ocean", "garden"]
solution = Solution()
# When
groups = solution.group_anagrams(strings)
actual = {frozenset(group) for group in groups}
# Then
expected = {frozenset({"apple"}), frozenset({"banana"}), frozenset({"window"}), frozenset({"orange"}),
frozenset({"pillow"}), frozenset({"chair"}), frozenset({"ocean"}), frozenset({"garden"})}
self.assertEqual(actual, expected)
def test_some_multiple_string_groups(self):
# Given
strings = ["apple", "banana", "window", "orange",
"pillow", "chair", "ocean", "garden",
"elppa", "wodniw", "wollip", "naeco"]
solution = Solution()
# When
groups = solution.group_anagrams(strings)
actual = {frozenset(group) for group in groups}
# Then
expected = {frozenset({"apple", "elppa"}), frozenset({"banana"}), frozenset({"window", "wodniw"}), frozenset({"orange"}),
frozenset({"pillow", "wollip"}), frozenset({"chair"}), frozenset({"ocean", "naeco"}), frozenset({"garden"})}
self.assertEqual(actual, expected)
Strategy
Counters.
Explanation
The algorithm iterates all strings. For each string, the frequency of each character is counted. Then, using the character frequencies as a key of a hash map, the string is appended to a list of strings with the same character frequencies. The result is all strings with the same character frequencies, i.e. that are anagrams, are grouped together.
Time Complexity
Let n
be the number of strings, and k
be the length of the longest string. The algorithm iterates each character of each string. The algorithm also iterates a list of frequencies which has a constant length 26 since the characters of strings are lowercase English characters. Therefore, the time complexity of the algorithm is O(n × k)
.
Space Complexity
At the end of iteration, the hash map holds every string. Let n
be the number of strings, and k
be the length of the longest string. Therefore, the auxiliary space complexity of the algorithm is O(n × k)
.