HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Naive implementation of KMP algorithm

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
implementationkmpnaivealgorithm

Problem

After reading this answer to the question "High execution time to count overlapping substrings", I decided to implement the suggested Knuth-Morris-Pratt (KMP) algorithm. I used the pseudo-code listed on Wikipedia for the functions kmp_table and kmp_search.

However, when running it on some corner-cases, I have observed that it is a lot slower than the standard str.find, which apparently uses a modified Boyer-Moore-Horspool algorithm and should thus have worse worst-case performance.

The specific case I looked at is:

$ ipython -i kmp.py
In [1]: text = "A"*1000000 + "B"
In [2]: word = "A"*100 + "B"
In [3]: %timeit kmp_search(text, word)
1 loop, best of 3: 410 ms per loop
In [4}: %timeit text.find(word)
1000 loops, best of 3: 703 µs per loop


So the difference is about a factor 1000 for this input. This is probably due to the fact that the native one is written in C and this is written in Python, but I still wanted to see if I did anything stupid here or missed any obvious optimization.

def kmp_table(word):
    table = [0] * len(word)
    position, candidate = 2, 0
    table[0] = -1

    while position  0:
            candidate = table[candidate]
        else:
            table[position] = 0
            position += 1
    return table

def kmp_search(text, word):
    m, i = 0, 0
    table = kmp_table(word)
    while m + i  -1:
                m += i - table[i]
                i = table[i]
            else:
                m += 1
                i = 0
    return len(text)

Solution

Minor comments, but...

else:
        if table[i] > -1:
            m += i - table[i]
            i = table[i]
        else:
            m += 1
            i = 0


This sort of construct, an else which contains only an if-else chain, can be simply written as an elif-else chain.

elif table[i] > -1:
        m += i - table[i]
        i = table[i]
    else:
        m += 1
        i = 0


table[position] = candidate + 1
        candidate += 1


These statements seem weird, why not first add one and then set?

candidate += 1
        table[position] = candidate

Code Snippets

else:
        if table[i] > -1:
            m += i - table[i]
            i = table[i]
        else:
            m += 1
            i = 0
elif table[i] > -1:
        m += i - table[i]
        i = table[i]
    else:
        m += 1
        i = 0
table[position] = candidate + 1
        candidate += 1
candidate += 1
        table[position] = candidate

Context

StackExchange Code Review Q#158396, answer score: 3

Revisions (0)

No revisions yet.