(very old draft)
The Rolling Hash
Now that there is a way to remove the first character and append a new last character in constant time to a string hash value, we can also use this hash function suite to perform substring searches in linear time.
To find a substring of length m in a string of length n you might proceed as follows:
start at the first character of the string
compare the substring to the string for m characters.
If same, then the substring is found.
else, repeat from the next character of the string (stop when there are less than m characters left in the string)
In practice this works well since it is unlikely that more than a few character will match each round so the effective time efficiency is much less than O(n**m). But for some strings, such as DNA which have small alphabets, the number of matches in each round could be large.
But by using a rolling hash this can be reduced to linear-time.
Calculate the hash of the substring we are searching for.
Start with a hash of the first m characters of the string
If the hash values match then the substring is found.
else remove the first character of the string from the hash and add the m+1 th character to the hash.
Now the hash is of m characters from the next character of the string.
This process is O(N).
No comments:
Post a Comment
Please use family friendly language.