DSA Challenge : Introduction to Rabin Karp String Matching Algorithm

Sumit Dey
Strategio
Published in
4 min readDec 22, 2022

The Rabin-Karp algorithm uses a hash function to search for and match patterns in text. Instead of going through every character in the initial phase like the Naïve String Matching Algorithm does, it filters out the characters that don’t match before performing the comparison.

How Rabin-Karp Algorithm works?

A sequence of characters is taken and checked for the possibility of the presence of the required string. If the possibility is found then, character matching is performed.

  1. Let the text be:
Text

And the string to be searched in the above text be:

Pattern

2. Let’s give each of the characters we’ll be employing in the problem a numerical value(v)/weight. Here, we’ve only used the first ten alphabets (i.e. A to J).

Text Weights

3. n be the length of the pattern and m be the length of the text. Here, m = 10 and n = 3.
Let d be the number of characters in the input set. Here, we have taken input set {A, B, C, ..., J}. So, d = 10. You can assume any suitable value for d.

4.Let us calculate the hash value of the pattern.

Hash value of the text
hash value for pattern(p) = Σ(v * dm-1) mod 13 
= ((3 * 102) + (4 * 101) + (4 * 100)) mod 13
= 344 mod 13
= 6

In the calculation above, choose a prime number (here, 13) in such a way that we can perform all the calculations with single-precision arithmetic.

5. Calculate the hash value for the text-window of size m.

For the first window ABC,
hash value for text(t) = Σ(v * dn-1) mod 13
= ((1 * 102) + (2 * 101) + (3 * 100)) mod 13
= 123 mod 13
= 6

6. Compare the pattern’s hash value to the text’s hash value. Character matching is carried out if they match.
In the above examples, the hash value of the first window (i.e. t) matches with p so, go for character matching between ABC and CDD. Go to the next window since they don’t match.

7. We calculate the hash value of the next window by subtracting the first term and adding the next term as shown below.

t = ((1 * 102) + ((2 * 101) + (3 * 100)) * 10 + (3 * 100)) mod 13 
= 233 mod 13
= 12

In order to optimize this process, we make use of the previous hash value in the following way.

t = ((d * (t - v[character to be removed] * h) + v[character to be added] ) mod 13  
= ((10 * (6 - 1 * 9) + 3 )mod 13
= 12
Where, h = dm-1 = 103-1 = 100.

8. For BCC, t = 12 (6). Therefore, go for the next window.
After a few searches, we will get the match for the window CDA in the text.

Hash value of different windows

Algorithm :

n = t.length
m = p.length
h = dm-1 mod q
p = 0
t0 = 0
for i = 1 to m
p = (dp + p[i]) mod q
t0 = (dt0 + t[i]) mod q
for s = 0 to n - m
if p = ts
if p[1.....m] = t[s + 1..... s + m]
print "pattern found at position" s
If s < n-m
ts + 1 = (d (ts - t[s + 1]h) + t[s + m + 1]) mod q

Rabin-Karp Complexity :

The average case and best case complexity of Rabin-Karp algorithm is O(m + n) and the worst case complexity is O(mn).

The worst-case complexity occurs when spurious hits occur a number for all the windows.

Application :

  • For pattern matching
  • For searching string in a bigger text

--

--