patternpythonMinor
Knuth–Morris–Pratt string match algorithm
Viewed 0 times
morrismatchprattalgorithmknuthstring
Problem
The Knuth–Morris–Pratt string search algorithm is described in the paper Fast Pattern Matching in Strings (SIAM J. Computing vol. 6 no. 2, June 1977). The initial step of the algorithm is to compute the next table, defined as follows:
The pattern-matching process will run efficiently if we have an auxiliary table that tells us exactly how far to slide the pattern, when we detect a mismatch at its
The authors give the example of the pattern
Then the pattern should be moved 3 places to the right and matching should continue with the 4th character of the pattern:
so
then the search should continue from the character after the mismatch:
These special cases are indicated by
(If you're reading the paper, note that the authors use indexes starting at 1 as in Fortran, but the Python convention is to use indexes starting at 0, so that's what I'm giving here.)
This is the code that computes the next table. Please review.
Output:
```
[-1, -1, -1, -
The pattern-matching process will run efficiently if we have an auxiliary table that tells us exactly how far to slide the pattern, when we detect a mismatch at its
jth character pattern[j]. Let next[j] be the character position in the pattern which should be checked next after such a mismatch, so that we are sliding the pattern j − next[j] places relative to the text.The authors give the example of the pattern
abcabcacab. If there is a mismatch at j=7:abcabcacab
abcabca?Then the pattern should be moved 3 places to the right and matching should continue with the 4th character of the pattern:
abcabcacab
abcabca?so
next[7] = 4. In some cases we know we can skip the mismatched character entirely, for example if there is a mismatch at j=3:abcabcacab
abc?then the search should continue from the character after the mismatch:
abcabcacab
abc?These special cases are indicated by
next[j] = −1.(If you're reading the paper, note that the authors use indexes starting at 1 as in Fortran, but the Python convention is to use indexes starting at 0, so that's what I'm giving here.)
This is the code that computes the next table. Please review.
def findPattern(pattern):
j = -1
next = [-1] * len(pattern)
i = 0 # next[0] is always -1, by KMP definition
while (i+1 < len(pattern)):
if (j == -1) or (pattern[j] == pattern[i]):
i += 1
j += 1
if pattern[i] != pattern[j]:
next[i] = j
else:
next[i] = next[j]
else:
j = next[j]
return next
if __name__ == "__main__":
print findPattern("aaaab")
print findPattern("abaabc")Output:
```
[-1, -1, -1, -
Solution
- Review
-
There's no docstring.
-
There's no need for parentheses around conditions (Python is not C), so instead of:
while (i+1 < len(pattern)):you can write:
while i+1 < len(pattern):-
The loop
while i+1
-
The or operator has lower precedence than comparison operators, so instead of:
if (j == -1) or (pattern[j] == pattern[i]):
you can omit the parentheses:
if j == -1 or pattern[j] == pattern[i]:
-
When there's a choice about whether to test for equality or inequality, then I think it's usually clearer to test for equality, so I would write if pattern[i] == pattern[j] instead of if pattern[i] != pattern[j].
-
There's a small inefficiency in your code. If the test j == -1 or pattern[j] == pattern[i] passes then you set j = next[j] and go round the while loop again. But the condition on the while loop is a condition on i, which has not changed, so you waste the test. It is better to go straight to the test on j, like this:
m = len(pattern)
while i + 1 -1 and pattern[i] != pattern[j]:
j = next[j]
i += 1
j += 1
if pattern[i] == pattern[j]:
next[i] = next[j]
else:
next[i] = j
-
After making this change, i always increases by 1 on each iteration of the main loop, so we could use a for` loop instead to make this clear.- Revised code
def kmp_table(pattern):
"""Compute the "next" table corresponding to pattern, for use in the
Knuth-Morris-Pratt string search algorithm.
"""
m = len(pattern)
next = [-1] * m
j = -1
for i in range(1, m):
while j > -1 and pattern[i-1] != pattern[j]:
j = next[j]
j += 1
if pattern[i] != pattern[j]:
next[i] = j
else:
next[i] = next[j]
return nextCode Snippets
while (i+1 < len(pattern)):while i+1 < len(pattern):if (j == -1) or (pattern[j] == pattern[i]):if j == -1 or pattern[j] == pattern[i]:m = len(pattern)
while i + 1 < m
while j > -1 and pattern[i] != pattern[j]:
j = next[j]
i += 1
j += 1
if pattern[i] == pattern[j]:
next[i] = next[j]
else:
next[i] = jContext
StackExchange Code Review Q#107909, answer score: 7
Revisions (0)
No revisions yet.