Knuth-Morris-Pratt Algorithm

Will Strickland
smucs
Published in
4 min readMar 31, 2021

The Knuth-Morris-Pratt(KMP) Algorithm, is a pattern searching algorithm that improves on its predecessor, the Naive Pattern Searching Algorithm. Before delving into the KMP Algorithm, let us discuss the Naive Pattern Searching Algorithm to help understand how the KMP Algorithm differs, and what makes it more efficient.

The Predecessor

The Naive Algorithm is, thanks to its’ telling name, almost exactly what you think it is. When first thinking about what one would do to find a pattern in a group of text, or at least what I first thought of, is to begin with the first character of the main text, check the subsequent characters until either the pattern is found, or not, and then slide over one character and begin this process again. Well, this is precisely what the Naive Algorithm does. This works well when the first character of the pattern being searched for does not exist in the main text, because the algorithm will simply check each letter once, resulting in a best-case O(n) runtime. On the other hand, this algorithm works extremely poorly when all of the characters of the main text are the same and when only the last character is different. This is because the Naive Algorithm will be forced to scan over each letter a number of times. This results in a worst-case runtime of O(m(n-m+1)), with m being the size of the pattern being searched for, and n being the size of the main text. Situations like these are what make the KMP Algorithm much more efficient.

The KMP Algorithm

The basis of the KMP Algorithm is that whenever a mismatch is detected i.e, the algorithm finds a character not in the pattern, it already knows some of the characters in the next window, this way you do not have to scan over characters multiple times. This is somewhat difficult to explain without an example-

Since we already knew the first three characters, we only had to check if the last character of the new window from the main text matched the last character of the pattern.

From the above example, it is apparent that we will need to know how many characters to skip in the next window. To handle this, a separate array is created in order to determine how many characters to skip. This array is always the same size as the pattern that is being searched for. Let’s call the array lps, which stands for longest proper prefix which is also a suffix. A proper prefix is simply any prefix that does not contain the entire string. This means lps[i] will hold the longest proper prefix of the pattern from 0 to i, which is also a suffix of the pattern from 0 to i. This is another thing that is rather hard to visualize without examples-

This last example is the most helpful when trying to understand how to fill the lps array. This is the most vital part of the KMP Algorithm, so after fully grasping the above, we are ready for the pseudocode.

Examples

The following are some examples in C++ demonstrating the efficiency of the KMP Algorithm compared to the efficiency of the Naive Algorithm. The implementations of these algorithms can be found in the GitHub Repository. The Algorithms have been altered to keep track of the number of comparisons in order to help illustrate the differences between the two.

Output:

Total number of comparisons for KMP: 31
Total comparisons for naive: 70
Total number of comparisons for KMP: 5
Total comparisons for naive: 70
Total number of comparisons for KMP: 3
Total comparisons for naive: 19

Further Reading

If you are interested in reading and learning more about the KMP Algorithm, you can visit these links below.

KMP Algorithm GeeksForGeeks

Techie Delight

References

“KMP Algorithm for Pattern Searching.” GeeksforGeeks, 24 Mar. 2021, www.geeksforgeeks.org/kmp-algorithm-for-pattern-searching/.

“Naive Algorithm for Pattern Searching.” GeeksforGeeks, 30 Oct. 2020, www.geeksforgeeks.org/naive-algorithm-for-pattern-searching/.

“C++ Program to Implement Knuth–Morris–Pratt Algorithm (KMP).” Sanfoundry, 6 July 2017, www.sanfoundry.com/cpp-program-implement-kruth-morris-patt-algorithm-kmp/.

“Knuth–Morris–Pratt Algorithm.” Wikipedia, Wikimedia Foundation, 8 Feb. 2021, en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.

--

--