HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Knuth–Morris–Pratt string match algorithm

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
morrismatchprattalgorithmknuthstring

Problem

The Knuth–Morris–Pratt string search algorithm is described in the paper Fast Pattern Matching in Strings (SIAM J. Computing vol. 6 no. 2, June 1977). The initial step of the algorithm is to compute the next table, defined as follows:


The pattern-matching process will run efficiently if we have an auxiliary table that tells us exactly how far to slide the pattern, when we detect a mismatch at its jth character pattern[j]. Let next[j] be the character position in the pattern which should be checked next after such a mismatch, so that we are sliding the pattern j − next[j] places relative to the text.

The authors give the example of the pattern abcabcacab. If there is a mismatch at j=7:

abcabcacab
abcabca?


Then the pattern should be moved 3 places to the right and matching should continue with the 4th character of the pattern:

abcabcacab
abcabca?


so next[7] = 4. In some cases we know we can skip the mismatched character entirely, for example if there is a mismatch at j=3:

abcabcacab
abc?


then the search should continue from the character after the mismatch:

abcabcacab
abc?


These special cases are indicated by next[j] = −1.

(If you're reading the paper, note that the authors use indexes starting at 1 as in Fortran, but the Python convention is to use indexes starting at 0, so that's what I'm giving here.)

This is the code that computes the next table. Please review.

def findPattern(pattern):

    j = -1
    next = [-1] * len(pattern)
    i = 0 # next[0] is always -1, by KMP definition

    while (i+1 < len(pattern)):
        if (j == -1) or (pattern[j] == pattern[i]):
            i += 1
            j += 1
            if pattern[i] != pattern[j]:
                next[i] = j
            else:
                next[i] = next[j]
        else:
            j = next[j]

    return next

if __name__ == "__main__":

    print findPattern("aaaab")
    print findPattern("abaabc")


Output:

```
[-1, -1, -1, -

Solution


  1. Review



-
There's no docstring.

-
There's no need for parentheses around conditions (Python is not C), so instead of:

while (i+1 < len(pattern)):


you can write:

while i+1 < len(pattern):


-
The loop while i+1

-
The
or operator has lower precedence than comparison operators, so instead of:

if (j == -1) or (pattern[j] == pattern[i]):


you can omit the parentheses:

if j == -1 or pattern[j] == pattern[i]:


-
When there's a choice about whether to test for equality or inequality, then I think it's usually clearer to test for equality, so I would write
if pattern[i] == pattern[j] instead of if pattern[i] != pattern[j].

-
There's a small inefficiency in your code. If the test
j == -1 or pattern[j] == pattern[i] passes then you set j = next[j] and go round the while loop again. But the condition on the while loop is a condition on i, which has not changed, so you waste the test. It is better to go straight to the test on j, like this:

m = len(pattern)
while i + 1  -1 and pattern[i] != pattern[j]:
        j = next[j]
    i += 1
    j += 1
    if pattern[i] == pattern[j]:
        next[i] = next[j]
    else:
        next[i] = j


-
After making this change,
i always increases by 1 on each iteration of the main loop, so we could use a for` loop instead to make this clear.

  1. Revised code



def kmp_table(pattern):
    """Compute the "next" table corresponding to pattern, for use in the
    Knuth-Morris-Pratt string search algorithm.

    """
    m = len(pattern)
    next = [-1] * m
    j = -1
    for i in range(1, m):
        while j > -1 and pattern[i-1] != pattern[j]:
            j = next[j]
        j += 1
        if pattern[i] != pattern[j]:
            next[i] = j
        else:
            next[i] = next[j]
    return next

Code Snippets

while (i+1 < len(pattern)):
while i+1 < len(pattern):
if (j == -1) or (pattern[j] == pattern[i]):
if j == -1 or pattern[j] == pattern[i]:
m = len(pattern)
while i + 1 < m
    while j > -1 and pattern[i] != pattern[j]:
        j = next[j]
    i += 1
    j += 1
    if pattern[i] == pattern[j]:
        next[i] = next[j]
    else:
        next[i] = j

Context

StackExchange Code Review Q#107909, answer score: 7

Revisions (0)

No revisions yet.