Knuth-Morris-Pratt Algorithm Implementation Part-2

Monisha Mathew
2 min readMar 22, 2019

--

Continuing from where we left off in Part-1

The next step we would like to work on is implementing the main method that does the actual computation. This meet seem slightly overwhelming when we try to focus on every detail. Let’s build the solution iterative fashion.

Step 1: Straight forward case.

Haystack — “hello”

Needle — “hello”

This should return 0 as the result.

public int strStr(String haystack, String needle) {
if(needle==null || needle.length()==0 || haystack==null || haystack.equals(needle)){
return 0;
}
//TODO - remaining cases yet to be handled HashMap<Integer, Integer> skipTable = getSkipTable(needle);
char[] chars = haystack.toCharArray();
char[] pat = needle.toCharArray();
int skip = 0;
//start index in the haystack
for(int i = 0; i<chars.length; ){
int temp = i;
for(int j = skip; j<pat.length && i<chars.length; j++){
if(chars[i]==pat[j]){
if(j==pat.length-1){
return i - pat.length+1;
}
i++;
}
//TODO - remaining cases, yet to be handled
}
}
return -1;
}

Step 2: Now consider the case where the pattern occurs after an offset.

Haystack — “zhello”

Needle — “hello”

This should return 1 as the result.

public int strStr(String haystack, String needle) {
if(needle==null || needle.length()==0 || haystack==null || haystack.equals(needle)){
return 0;
} else if(needle.length()>haystack.length()){
//Handling the case where the needle size //is greater than haystack size
return -1;
}
HashMap<Integer, Integer> skipTable = getSkipTable(needle);
char[] chars = haystack.toCharArray();
char[] pat = needle.toCharArray();
int skip = 0;
//start index in the haystack
for(int i = 0; i<chars.length; ){
int temp = i;
for(int j = skip; j<pat.length && i<chars.length; j++){
if(chars[i]==pat[j]){
if(j==pat.length-1){
return i - pat.length+1;
}
i++;
} else if(j>0){
//skipping redundant comparisons
skip = skipTable.get(j-1);
break;
} else {
//when you need to start comparing from 0
skip = 0;
i = temp+1;
break;
}
}
}
return -1;
}

Although the accuracy is achieved here, certain cases are not handles efficiently, such as a haystack (of really large size) matches all the characters of the needle except for its last character.

Haystack — “aaaaaa…….aaaa”

Needle — “aaaaaa…….aaab” (same length as haystack)

This case particularly becomes more complex for evaluation because the skip table calculation becomes excessively cumbersome. Let’s look into this in another post.

Find more posts here.

Cheers & Chao!

--

--