Wednesday 13 April 2022

Simple Rolling Hash- Part 3

(very old draft) The Rolling Hash

Now that there is a way to remove the first character and append a new last character in constant time to a string hash value, we can also use this hash function suite to perform substring searches in linear time.

To find a substring of length m in a string of length n you might proceed as follows:
start at the first character of the string
compare the substring to the string for m characters.
If same, then the substring is found.
else, repeat from the next character of the string (stop when there are less than m characters left in the string)

In practice this works well since it is unlikely that more than a few character will match each round so the effective time efficiency is much less than O(n**m). But for some strings, such as DNA which have small alphabets, the number of matches in each round could be large.

But by using a rolling hash this can be reduced to linear-time.

Calculate the hash of the substring we are searching for.
Start with a hash of the first m characters of the string
If the hash values match then the substring is found.
else remove the first character of the string from the hash and add the m+1 th character to the hash.
Now the hash is of m characters from the next character of the string.

This process is O(N).

No comments:

Post a Comment

Please use family friendly language.