Knuth-Morris-Pratt (KMP) algorithm

Prakhar Kapoor
3 min readApr 16, 2023

--

The Knuth-Morris-Pratt (KMP) algorithm is a string matching algorithm that solves the problem of finding a substring within a longer string. The algorithm was developed by Donald Knuth, Vaughan Pratt, and James Morris in 1977. It has linear time complexity and is widely used in text processing applications.

The algorithm is based on the observation that when a mismatch occurs while comparing a substring to the main string, some information about the main string can be used to skip over unnecessary comparisons. The KMP algorithm computes a table of partial matches, which is used to determine the next position in the main string to start comparing from after a mismatch occurs.

Algorithm

The KMP algorithm works by maintaining two indices, i and j. Index i is used to traverse the main string, while index j is used to traverse the substring being searched for. The algorithm compares characters in the main string with characters in the substring until a mismatch is found.

When a mismatch is found, the algorithm uses the partial match table to determine the next position in the main string to start comparing from. The partial match table is computed beforehand and contains information about the longest proper suffix of each prefix of the substring.

To compute the partial match table, the algorithm compares each prefix of the substring with its own suffixes. For example, given the substring “ABCDABD”, the algorithm computes the following table:

The longest proper suffix of each prefix is the length of the longest suffix of the prefix that is also a proper suffix of the substring. For example, the longest proper suffix of the prefix “ABCDAB” is “A”.

Once the partial match table is computed, the algorithm can use it to skip over unnecessary comparisons when a mismatch occurs. If the character in the main string at position i does not match the character in the substring at position j, the algorithm looks up the longest proper suffix of the prefix of the substring up to position j-1 in the partial match table. This gives the next position in the substring to start comparing from, which is also the position to start comparing from in the main string.

The algorithm continues comparing characters in the substring and the main string until either a match is found or the end of the substring is reached. If a match is found, the algorithm returns the position in the main string where the match starts. Otherwise, the algorithm returns -1 to indicate that the substring was not found in the main string.

Complexity

The KMP algorithm has linear time complexity, O(m + n), where m is the length of the substring and n is the length of the main string. The partial match table can be computed in O(m) time, and the algorithm compares each character in the main string at most once. Therefore, the overall time complexity is O(m + n).

Conclusion

The Knuth-Morris-Pratt algorithm is a string matching algorithm that solves the problem of finding a substring within a longer string. The algorithm uses a partial match table to skip over unnecessary comparisons when a mismatch occurs, which results in linear time complexity. The KMP algorithm is widely used in text processing applications and is an important tool for developers working with string manipulation.

C++ code

In this code, the computeLPS function computes the partial match table, also known as the longest proper prefix/suffix table, and the kmpSearch function uses it to search for the pattern in the text. The main function demonstrates how to use the kmpSearch function to find the position of a pattern within a text string.

If you enjoyed this post and would like to support my writing, your contributions would be greatly appreciated. You can send your support via UPI at: 6394kp@kotak

--

--