HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

O(B) algorithm to find positions of all permutations of smaller string in a bigger string with length B - how is this possible?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
thispositionsbiggerhowallwithlengthalgorithmpossiblesmaller

Problem

Context: I've been working through Cracking the Code Interview and on page 70 the book asserts that there is a O(B) solution to this problem.

If

  • s = little string and S = len(s)



  • b = big string and B = len(b)



then I believe it would be something along the lines of (pseudocode):

hs = hash(s)
for every position, substring of length S in B: 
    h_sub = hash(substring)
    if hs == h_sub:
         print position


Note that the for loop will run B-S+1 times, so whatever happens inside the loop must be O(1) in order for the whole thing to be O(B).

This assumes that there exists a O(1) hash function hash(s) that only returns the same hash when two words are permutations of each other, and never collide otherwise.

What is this hash function? It seems like you'd have to iterate through every letter in substring to calculate h_sub and I don't see how that can happen in less than O(S) time. This answer seemed promising but then someone found a non-valid collision.

I learned about the radix-2p hash for this problem in school and we had to prove it as a homework exercise so I am convinced of its correctness.

However in the calculation $k = \sum_{i=0}^{n-1} x_i(2^{p})^i$, the exponent on the $2^p$ always changes depending on the position of the letter $x_i$ within the substring, so I don't see how we can avoid having to iterate anew through all S letters of the substring when we calculate $k \text{ mod } (2^p-1)$.

Am I missing something or is there another approach entirely? Thanks.

Solution

I think a hash is the wrong approach. Instead you can use auxiliary variables: an array $A$ and an integer $c$. Basically $A$ tracks the error in the number of occurrences of each character and $c$ tracks the number of non-zero errors. Instead of a hash update you update the count of the character which drops out, then update $c$ if it changed from or to zero, then repeat for the character which is coming in. After that, if $c=0$ you found a permutation of $s$.

Context

StackExchange Computer Science Q#105537, answer score: 2

Revisions (0)

No revisions yet.