Rabin-Karp Algorithm is a string searching algorithm created by Richard M. Karp and Michael O. Rabin that uses hashing to find any one of a set of pattern strings in a text.
A substring of a string is another string that occurs in. For example, ver is a substring of stackoverflow. Not to be confused with subsequence because cover is a subsequence of the same string. In other words, any subset of consecutive letters in a string is a substring of the given string.
In Rabin-Karp algorithm, we’ll generate a hash of our pattern that we are looking for & check if the rolling hash of our text matches the pattern or not. If it doesn’t match, we can guarantee that the pattern doesn’t exist in the text. However, if it does match, the pattern can be present in the text. Let’s look at an example:
Let’s say we have a text: yeminsajid and we want to find out if the pattern nsa exists in the text. To calculate the hash and rolling hash, we’ll need to use a prime number. This can be any prime number. Let’s take prime = 11 for this example. We’ll determine hash value using this formula:
(1st letter) X (prime) + (2nd letter) X (prime)¹ + (3rd letter) X (prime)² X + ......
a -> 1 g -> 7 m -> 13 s -> 19 y -> 25 b -> 2 h -> 8 n -> 14 t -> 20 z -> 26 c -> 3 i -> 9 o -> 15 u -> 21 d -> 4 j -> 10 p -> 16 v -> 22 e -> 5 k -> 11 q -> 17 w -> 23 f -> 6 l -> 12 r -> 18 x -> 24
The hash value of nsa will be:
14 X 11⁰ + 19 X 11¹ + 1 X 11² = 344
Now we find the rolling-hash of our text. If the rolling hash matches with the hash value of our pattern, we’ll check if the strings match or not. Since our pattern has 3 letters, we’ll take 1st 3 letters yem from our text and calculate hash value. We get:
25 X 11⁰ + 5 X 11¹ + 13 X 11² = 1653
This value doesn’t match with our pattern’s hash value. So the string doesn’t exists here. Now we need to consider the next step. To calculate the hash value of our next string emi. We can calculate this using our formula. But that would be rather trivial and cost us more. Instead, we use another technique.
1653 - 25 = 1628.
1628 / 11 = 148.
148 + 9 X 11² = 1237.
The new hash value is not equal to our patterns hash value. Moving on, for n we get:
Previous String: emi First Letter of Previous String: e(5) New Letter: n(14) New String: "min" 1237 - 5 = 1232 1232 / 11 = 112 112 + 14 X 11² = 1806
It doesn’t match. After that, for s, we get:
Previous String: min First Letter of Previous String: m(13) New Letter: s(19) New String: "ins" 1806 - 13 = 1793 1793 / 11 = 163 163 + 19 X 11² = 2462
It doesn’t match. Next, for a, we get: