Top K Frequent Elements
Statement
Given an array of integers numbers
, and an integer k
, return the k
most frequent integers.
Constraints
- 1 ≤
numbers.length()
≤ 10³ - -10⁴ ≤
numbers[i]
≤ 10⁴ - 1 ≤
k
≤ number of unique integers
Solution
"""
production algorithm
"""
from collections import Counter
from data_structures.heap.heap import Heap
class Solution:
def top_k_frequent(self, numbers, k):
"""
time complexity O(n + mlogk)
space complexity O(m + k)
"""
counter = Counter(numbers)
smallest = Heap()
for number, count in counter.items():
smallest.push(MinHeapData(count, number))
if k < smallest.size():
smallest.pop()
result = list()
for _ in range(k):
_, number = smallest.pop()
result.append(number)
return result
class MinHeapData:
def __init__(self, count, number):
self.count = count
self.number = number
def __lt__(self, other):
if self.count == other.count:
return self.number < other.number
return self.count < other.count
def __iter__(self):
yield self.count
yield self.number
def __len__(self):
return 2
"""
unit tests
"""
from unittest import TestCase
from algorithms.top_k_elements.top_k_frequent_elements.solution import Solution
class SolutionTestCase(TestCase):
def test_no_equal_frequencies(self):
# Given
numbers = [51, 53, 53, 55, 55, 55, 57, 57, 57, 57, 59, 59, 59, 59, 59]
k = 3
solution = Solution()
# When
actual = solution.top_k_frequent(numbers, k)
# Then
expected = [55, 57, 59]
self.assertEqual(actual, expected)
def test_some_equal_frequencies(self):
# Given
numbers = [51, 51, 51, 53, 53, 53, 53, 53, 55,
55, 55, 57, 57, 57, 57, 57, 59, 59, 59]
k = 3
solution = Solution()
# When
actual = solution.top_k_frequent(numbers, k)
# Then
expected = [59, 53, 57]
self.assertEqual(actual, expected)
Strategy
Top K Elements.
Explanation
First, create a counter of the frequency of each integer. Next, sort the integers by their frequency in a min heap. When the size of the min heap reaches k+1
, then pop the top of the min heap. The top of the min heap is the kth
most frequent integer, and so the min heap holds the top k
frequent integers.
Time Complexity
The time complexity of the algorithm depends on the number of integers n
, the number of unique integers m
, and the size of the min heap which is proportional to k
. The time complexity to count the frequency of each integer is O(n)
. Then, there are m
iterations of operations on the min heap with size proportional to k
, which has time complexity O(mlogk)
.
The overall time complexity of the algorithm depends on the relationship of these variables n
, m
, and k
. Therefore, the time complexity of the algorithm is O(n + mlogk)
.
Space Complexity
The auxiliary space complexity of the algorithm depends on the size of the character frequency counter m
, and the size of the min heap which is proportional to k
. Therefore, the auxiliary space complexity of the algorithm is O(m + k)
.