Minimum Window Substring
Statement
Given two strings p
and q
find the minimum substring of p
such that it has the following properties:
- It’s the shortest substring of
p
that includes all the characters ofq
- It contains at least the same frequency of each character of
q
- The order of the characters doesn’t matter
Constraints
p
andq
consist of uppercase and lowercase English characters- 1 ≤
p.length()
,q.length()
≤ 10³
Solution
"""
production algorithm
"""
from collections import defaultdict, Counter
class Solution:
def minimum_window_substring(self, string1, string2):
"""
time complexity O(n)
space complexity O(1)
"""
if len(string1) < len(string2):
return str()
m, n = len(string1), len(string2)
low, high = 0, 0
frequencies, visited, total = Counter(string2), defaultdict(int), 0
length, result = float("inf"), str()
# step foward high index
for high in range(m):
if string1[high] in frequencies:
if visited[string1[high]] < frequencies[string1[high]]:
total += 1
visited[string1[high]] += 1
# found match
if total == n:
# step forward low index
while total == n:
if string1[low] in frequencies:
if visited[string1[low]] == frequencies[string1[low]]:
total -= 1
visited[string1[low]] -= 1
low += 1
# found local minimum window substring match
if high - low + 2 < length:
length = high - low + 2
result = string1[low-1:high+1]
return result
"""
unit tests
"""
from unittest import TestCase
from algorithms.sliding_window.minimum_window_substring.solution import Solution
class SolutionTestCase(TestCase):
def test_shorter_first_string(self):
# Given
string1 = "azxeytjmnkwhzbdloifg"
string2 = "azxeytjmnkwhzbdloifgazxeytjmnkwhzbdloifg"
solution = Solution()
# When
actual = solution.minimum_window_substring(string1, string2)
# Then
expected = str()
self.assertEqual(actual, expected)
def test_zero_matches(self):
# Given
string1 = "azxeytjmnkwhzbdloifg"
string2 = "aaaaaaaaaaaaaaaaaaaa"
solution = Solution()
# When
actual = solution.minimum_window_substring(string1, string2)
# Then
expected = str()
self.assertEqual(actual, expected)
def test_one_match(self):
# Given
string1 = "azxeytpzqrwhzbdloifg"
string2 = "pqr"
solution = Solution()
# When
actual = solution.minimum_window_substring(string1, string2)
# Then
expected = "pzqr"
self.assertEqual(actual, expected)
def test_two_matches_return_frist(self):
# Given
string1 = "azpzqrjmnkwhzpzqrifg"
string2 = "pqr"
solution = Solution()
# When
actual = solution.minimum_window_substring(string1, string2)
# Then
expected = "pzqr"
self.assertEqual(actual, expected)
def test_two_matches_return_second(self):
# Given
string1 = "apzqrtjmnkwhzbpqrifg"
string2 = "pqr"
solution = Solution()
# When
actual = solution.minimum_window_substring(string1, string2)
# Then
expected = "pqr"
self.assertEqual(actual, expected)
def test_worst_case(self):
# Given
string1 = "aaaaaaaaaaaaaaaaaaaz"
string2 = "z"
solution = Solution()
# When
actual = solution.minimum_window_substring(string1, string2)
# Then
expected = "z"
self.assertEqual(actual, expected)
Strategy
Sliding Window.
Explanation
Before iteration of the characters of the first string, create character frequency hash maps for the first and second strings. Low and high indexes are used to record a window of characters of the first string throughout iteration.
For each iteration of the characters of the first string, bump the high index, and record the frequency of the character at the high index. If all character frequencies of the second string have been iterated in the window of the first string, then bump the low index, until the character frequencies in the window of the first string and the character frequencies of the second string no longer match. Record the window of characters of the first string if it’s the shortest substring visited so far.
After iteration of the characters of the first string, return the shortest substring recorded.
Time Complexity
At worst the low and high indexes iterate the length of the first string. Therefore the time complexity of the algorithm is O(n)
.
Space Complexity
The auxiliary space complexity of the algorithm is O(1)
since at most 52 characters are kept in the character frequency hash maps.