patternMinor
Finding hash of a substring $[i, j]$ in $O(1)$ using $O(|S|)$ pre computation
Viewed 0 times
computationsubstringhashpreusingfinding
Problem
Given a string $S$ of length $n$ characters, is it possible to calculate the hash of its substring $[i, j]$ (from index $i$ to $j$, both inclusive) in $O(1)$ using some form of precomputation? Can we use a modification of the rolling hash?
The following is a similar problem to my question. In it the string was given in a compressed form. Example: if the string is
The data was stored in an
Note that hashing was used in the solutions, which checked if the given substring is a palindrome or not. Given a substring of string $S$ as $(i, j)$, they computed the hash of substring $[i, (i + j) / 2]$ and the reverse hash of substring $[(i+j+2)/2, j]$ and checked if they were equal or not. So if they wanted to check if in string
Code for the hashing concept
The hash precomputation was done as follows:
And then the hash was computed using the following functions :
```
/ Calculates the Forward hash of substring [i,j] /
unsigned long long CalculateHash(int i, int j)
{
if(i > j)
return -1;
unsigned long long ret = pre_hash[j] - POW(very_large_prime, [2 (j - i + 1)])pre_hash[i - 1];
return ret;
}
/ Calculates the reverse hash of substring [i,j] /
unsigned long l
The following is a similar problem to my question. In it the string was given in a compressed form. Example: if the string is
"aaabccdeeee" then the compressed form is:3 a
1 b
2 c
1 d
4 eThe data was stored in an
str[] array as : str[] = [{'a','3'}, {'b','1'}, {'c','2'}....]Note that hashing was used in the solutions, which checked if the given substring is a palindrome or not. Given a substring of string $S$ as $(i, j)$, they computed the hash of substring $[i, (i + j) / 2]$ and the reverse hash of substring $[(i+j+2)/2, j]$ and checked if they were equal or not. So if they wanted to check if in string
S = "daabac" whether substring $[1, 5]$ is a a palindrome or not, they computed the following :h1 = forward_hash("aa")
h2 = reverse_hash("ba")
h1 == h2Code for the hashing concept
The hash precomputation was done as follows:
/* Computing the Prefix Hash Table */
pre_hash[0] = 0;
for(int i = 1;i <= len(str); i++)
{
pre_hash[i] = pre_hash[i - 1] * very_large_prime + str[i].first;
pre_hash[i] = pre_hash[i] * very_large_prime + str[i].second;
}
/* Computing the Suffix Hash Table */
suff_hash[0] = 0;
for(int i = 1;i <= len(str); i++)
{
suff_hash[i] = suff_hash[i - 1] * very_large_prime + str[K - i + 1].first;
suff_hash[i] = suff_hash[i] * very_large_prime + str[K - i + 1].second;
}And then the hash was computed using the following functions :
```
/ Calculates the Forward hash of substring [i,j] /
unsigned long long CalculateHash(int i, int j)
{
if(i > j)
return -1;
unsigned long long ret = pre_hash[j] - POW(very_large_prime, [2 (j - i + 1)])pre_hash[i - 1];
return ret;
}
/ Calculates the reverse hash of substring [i,j] /
unsigned long l
Solution
Yes, you can solve your problem, roughly speaking, by precomputing prefix sums.
In particular, if you are willing to do $O(n)$ precomputation and to use $O(n)$ space, we can solve your problem, for most standard rolling hashes. We just need the rolling hash to have one extra property, the inverse property. Most rolling hashes do have this property, including the well-known Rabin-Karp rolling hash.
Let me explain the details, below.
Some notation: let $S[i..j]$ denote the substring of $S$ from index $i$ to $j-1$ (note: it doesn't include index $j$; it is a substring of length $j-i$). Let $H(S[i..j])$ denote the rolling hash of that substring.
The intuition
Imagine that we used a very simple hash, which is just the sum of the bytes modulo 256:
$$H(S[i..j]) = S[i] + S[i+1] + S[i+2] + \dots + S[j-1] \bmod 256.$$
This is a crummy hash function (it has lots of collisions), but let's go with it, since the purpose is just to get some intuition. This is a rolling hash; given the hash of a prefix of the string, it is easy to extend it by one more byte (just add that byte to the hash value you've got so far).
Now with this hash function, we could easily solve your problem. During the precomputation, we'd fill in an array $A[]$ so that $A[j] = S[0] + S[1] + \dots + S[j-1] \bmod 256$ for each $j$; this can be done in $O(n)$ time. Now, we can compute $H(S[i..j])$ via the simple relation
$$H(S[i..j]) = A[j] - A[i] \bmod 256.$$
This can be computed in $O(1)$ time, given the precomputed array $A$.
Of course, you'd never use this in practice, because it's such a crummy hash function. But the same idea can be generalized to work with many other rolling hash functions, which you would want to use. (Intuitively, that's because they are based upon a binary operation that is associative and has an inverse, which is all you need for this trick to work.) I'll describe the details next.
The inverse property
All rolling hashes have the following property:
The accumulation property: Given $H(S[i..j])$ and $S[j]$, we can compute $H(S[i..j+1])$.
We need one additional property from the rolling hash:
The inverse property: Given $H(S[i..j])$ and $H(S[i..k])$, where $j < k$, it should be possible to derive $H(S[j+1..k])$.
Many standard rolling hashes do have this inverse property. For instance, the Rabin-Karp rolling hash has the inverse property. The same is true for hashing with cyclic polynomials (Buzhash).
How to use the inverse property for your problem
Create an array $A[]$ of $n$ elements. During precomputation, we will fill in the elements of $A$ so that $A[j] = H(S[0..j])$. This precomputation can be done in $O(n)$.
Now, at some later point, you would like to compute the rolling hash of $S[i..j]$, i.e., to compute $H(S[i..j])$. How do you do that? It turns out it is easy. First, look up $H(S[0..i])$ and $H(S[0..j])$ using the array; this involves reading the value of $A[i]$ and $A[j]$. Next, use the inverse property to compute $H(S[i..j])$ from these two values. Done! That's all there is to it.
In particular, if you are willing to do $O(n)$ precomputation and to use $O(n)$ space, we can solve your problem, for most standard rolling hashes. We just need the rolling hash to have one extra property, the inverse property. Most rolling hashes do have this property, including the well-known Rabin-Karp rolling hash.
Let me explain the details, below.
Some notation: let $S[i..j]$ denote the substring of $S$ from index $i$ to $j-1$ (note: it doesn't include index $j$; it is a substring of length $j-i$). Let $H(S[i..j])$ denote the rolling hash of that substring.
The intuition
Imagine that we used a very simple hash, which is just the sum of the bytes modulo 256:
$$H(S[i..j]) = S[i] + S[i+1] + S[i+2] + \dots + S[j-1] \bmod 256.$$
This is a crummy hash function (it has lots of collisions), but let's go with it, since the purpose is just to get some intuition. This is a rolling hash; given the hash of a prefix of the string, it is easy to extend it by one more byte (just add that byte to the hash value you've got so far).
Now with this hash function, we could easily solve your problem. During the precomputation, we'd fill in an array $A[]$ so that $A[j] = S[0] + S[1] + \dots + S[j-1] \bmod 256$ for each $j$; this can be done in $O(n)$ time. Now, we can compute $H(S[i..j])$ via the simple relation
$$H(S[i..j]) = A[j] - A[i] \bmod 256.$$
This can be computed in $O(1)$ time, given the precomputed array $A$.
Of course, you'd never use this in practice, because it's such a crummy hash function. But the same idea can be generalized to work with many other rolling hash functions, which you would want to use. (Intuitively, that's because they are based upon a binary operation that is associative and has an inverse, which is all you need for this trick to work.) I'll describe the details next.
The inverse property
All rolling hashes have the following property:
The accumulation property: Given $H(S[i..j])$ and $S[j]$, we can compute $H(S[i..j+1])$.
We need one additional property from the rolling hash:
The inverse property: Given $H(S[i..j])$ and $H(S[i..k])$, where $j < k$, it should be possible to derive $H(S[j+1..k])$.
Many standard rolling hashes do have this inverse property. For instance, the Rabin-Karp rolling hash has the inverse property. The same is true for hashing with cyclic polynomials (Buzhash).
How to use the inverse property for your problem
Create an array $A[]$ of $n$ elements. During precomputation, we will fill in the elements of $A$ so that $A[j] = H(S[0..j])$. This precomputation can be done in $O(n)$.
Now, at some later point, you would like to compute the rolling hash of $S[i..j]$, i.e., to compute $H(S[i..j])$. How do you do that? It turns out it is easy. First, look up $H(S[0..i])$ and $H(S[0..j])$ using the array; this involves reading the value of $A[i]$ and $A[j]$. Next, use the inverse property to compute $H(S[i..j])$ from these two values. Done! That's all there is to it.
Context
StackExchange Computer Science Q#28163, answer score: 4
Revisions (0)
No revisions yet.