Minimum Window Substring

Ethan Davis
Data Structures and Algorithms DSA
3 min readMay 25, 2024
Data Structures and Algorithms

Statement

Given two strings p and q find the minimum substring of p such that it has the following properties:

  1. It’s the shortest substring of p that includes all the characters of q
  2. It contains at least the same frequency of each character of q
  3. The order of the characters doesn’t matter

Constraints

  • p and q consist of uppercase and lowercase English characters
  • 1 ≤ p.length(), q.length() ≤ 10³

Solution

"""
production algorithm
"""

from collections import defaultdict, Counter


class Solution:
def minimum_window_substring(self, string1, string2):
"""
time complexity O(n)
space complexity O(1)
"""
if len(string1) < len(string2):
return str()

m, n = len(string1), len(string2)
low, high = 0, 0
frequencies, visited, total = Counter(string2), defaultdict(int), 0
length, result = float("inf"), str()

# step foward high index
for high in range(m):
if string1[high] in frequencies:
if visited[string1[high]] < frequencies[string1[high]]:
total += 1
visited[string1[high]] += 1

# found match
if total == n:

# step forward low index
while total == n:
if string1[low] in frequencies:
if visited[string1[low]] == frequencies[string1[low]]:
total -= 1
visited[string1[low]] -= 1
low += 1

# found local minimum window substring match
if high - low + 2 < length:
length = high - low + 2
result = string1[low-1:high+1]

return result
"""
unit tests
"""

from unittest import TestCase
from algorithms.sliding_window.minimum_window_substring.solution import Solution


class SolutionTestCase(TestCase):
def test_shorter_first_string(self):
# Given
string1 = "azxeytjmnkwhzbdloifg"
string2 = "azxeytjmnkwhzbdloifgazxeytjmnkwhzbdloifg"
solution = Solution()

# When
actual = solution.minimum_window_substring(string1, string2)

# Then
expected = str()
self.assertEqual(actual, expected)

def test_zero_matches(self):
# Given
string1 = "azxeytjmnkwhzbdloifg"
string2 = "aaaaaaaaaaaaaaaaaaaa"
solution = Solution()

# When
actual = solution.minimum_window_substring(string1, string2)

# Then
expected = str()
self.assertEqual(actual, expected)

def test_one_match(self):
# Given
string1 = "azxeytpzqrwhzbdloifg"
string2 = "pqr"
solution = Solution()

# When
actual = solution.minimum_window_substring(string1, string2)

# Then
expected = "pzqr"
self.assertEqual(actual, expected)

def test_two_matches_return_frist(self):
# Given
string1 = "azpzqrjmnkwhzpzqrifg"
string2 = "pqr"
solution = Solution()

# When
actual = solution.minimum_window_substring(string1, string2)

# Then
expected = "pzqr"
self.assertEqual(actual, expected)

def test_two_matches_return_second(self):
# Given
string1 = "apzqrtjmnkwhzbpqrifg"
string2 = "pqr"
solution = Solution()

# When
actual = solution.minimum_window_substring(string1, string2)

# Then
expected = "pqr"
self.assertEqual(actual, expected)

def test_worst_case(self):
# Given
string1 = "aaaaaaaaaaaaaaaaaaaz"
string2 = "z"
solution = Solution()

# When
actual = solution.minimum_window_substring(string1, string2)

# Then
expected = "z"
self.assertEqual(actual, expected)

Strategy

Sliding Window.

Explanation

Before iteration of the characters of the first string, create character frequency hash maps for the first and second strings. Low and high indexes are used to record a window of characters of the first string throughout iteration.

For each iteration of the characters of the first string, bump the high index, and record the frequency of the character at the high index. If all character frequencies of the second string have been iterated in the window of the first string, then bump the low index, until the character frequencies in the window of the first string and the character frequencies of the second string no longer match. Record the window of characters of the first string if it’s the shortest substring visited so far.

After iteration of the characters of the first string, return the shortest substring recorded.

Time Complexity

At worst the low and high indexes iterate the length of the first string. Therefore the time complexity of the algorithm is O(n).

Space Complexity

The auxiliary space complexity of the algorithm is O(1) since at most 52 characters are kept in the character frequency hash maps.

Links

--

--