Top K Frequent Elements

Ethan Davis
Data Structures and Algorithms DSA
2 min readApr 3, 2024
Data Structures and Algorithms

Statement

Given an array of integers numbers, and an integer k, return the k most frequent integers.

Constraints

  • 1 ≤ numbers.length() ≤ 10³
  • -10⁴ ≤ numbers[i] ≤ 10⁴
  • 1 ≤ k ≤ number of unique integers

Solution

"""
production algorithm
"""

from collections import Counter
from data_structures.heap.heap import Heap


class Solution:
def top_k_frequent(self, numbers, k):
"""
time complexity O(n + mlogk)
space complexity O(m + k)
"""
counter = Counter(numbers)
smallest = Heap()

for number, count in counter.items():
smallest.push(MinHeapData(count, number))
if k < smallest.size():
smallest.pop()

result = list()
for _ in range(k):
_, number = smallest.pop()
result.append(number)

return result


class MinHeapData:
def __init__(self, count, number):
self.count = count
self.number = number

def __lt__(self, other):
if self.count == other.count:
return self.number < other.number
return self.count < other.count

def __iter__(self):
yield self.count
yield self.number

def __len__(self):
return 2
"""
unit tests
"""

from unittest import TestCase
from algorithms.top_k_elements.top_k_frequent_elements.solution import Solution


class SolutionTestCase(TestCase):
def test_no_equal_frequencies(self):
# Given
numbers = [51, 53, 53, 55, 55, 55, 57, 57, 57, 57, 59, 59, 59, 59, 59]
k = 3
solution = Solution()

# When
actual = solution.top_k_frequent(numbers, k)

# Then
expected = [55, 57, 59]
self.assertEqual(actual, expected)

def test_some_equal_frequencies(self):
# Given
numbers = [51, 51, 51, 53, 53, 53, 53, 53, 55,
55, 55, 57, 57, 57, 57, 57, 59, 59, 59]
k = 3
solution = Solution()

# When
actual = solution.top_k_frequent(numbers, k)

# Then
expected = [59, 53, 57]
self.assertEqual(actual, expected)

Strategy

Top K Elements.

Explanation

First, create a counter of the frequency of each integer. Next, sort the integers by their frequency in a min heap. When the size of the min heap reaches k+1, then pop the top of the min heap. The top of the min heap is the kth most frequent integer, and so the min heap holds the top k frequent integers.

Time Complexity

The time complexity of the algorithm depends on the number of integers n, the number of unique integers m, and the size of the min heap which is proportional to k. The time complexity to count the frequency of each integer is O(n). Then, there are m iterations of operations on the min heap with size proportional to k, which has time complexity O(mlogk).

The overall time complexity of the algorithm depends on the relationship of these variables n, m, and k. Therefore, the time complexity of the algorithm is O(n + mlogk).

Space Complexity

The auxiliary space complexity of the algorithm depends on the size of the character frequency counter m, and the size of the min heap which is proportional to k. Therefore, the auxiliary space complexity of the algorithm is O(m + k).

Links

--

--