patternpythonMinor

Longest Common Substring solution for a string and a list

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

solutionsubstringforandlistlongeststringcommon

Problem

Recently I have discovered one thing, which makes me feel frustrated. The Perl is lack of syntax sugar for strings, to do the same things, those are applicable to lists. The implementation I posted here would require to be rewritten to work for lists.

Thus, I had to re-write the code to Python to work for list as efficient as it does for strings:

def _contains(subseq, inseq):
    return any(inseq[pos:pos + len(subseq)] == subseq for pos in range(0, len(inseq) - len(subseq) + 1))

def lcss(s1, s2, threshold=3):

    all_of = []

    def is_inner_seq(i, j):
        return any(k < i < j <= l or k <= i < j < l for k, l in all_of)

    for i in range(0, len(s1)):
        for j in range(i + threshold, len(s1) + 1):

            if _contains(subseq=s1[i:j], inseq=s2):
                all_of.append((i, j))  # add a tuple to work OK when s1, s2 are lists

    only_outer_seq = filter(lambda tup: not is_inner_seq(tup[0], tup[1]),
                            all_of)

    return [s1[i:j] for i, j in only_outer_seq]

Here I implemented one minor improvement - filtering out inner substrings from the resultset, which aims to display more laconic result.

To not to frighten anyone with long sheets of code, here I am attaching a link to units tests, I used to tune up this solution: https://gist.github.com/anonymous/aa7540cc7db2ca446e2b

Your comments are welcomed! Especially on function '_contains', which does not look like a simple way of doing a simple task. I have started studying the Python just a few weeks ago.

Solution

-
No docstrings! What do your functions do and how do I call them?

-
Your lcss function has a name that suggests it might return the longest common subsequence. But in fact it returns all common subsequences of length 3 or more (except for ones that appear inside other common subsequences in s1). Is this really what you want the function to return? If so, you probably want to rethink the name.

-
Python's in operator already implements your _contains operation for strings:

>>> [w.strip() for w in open('/usr/share/dict/words') if 'abc' in w]
['crabcatcher', 'dabchick']

It's also much faster.

-
I guess you didn't use in because you want your subsequence operation to work on all kinds of sequence (not just strings). But in that case you should prefer to use one of the Boyer–Moore search algorithms, for example Boyer–Moore–Horspool, so that the complexity is more like \$O(n)\$ (on average on random inputs) than \$O(mn)\$.

See this answer for a Boyer–Moore–Horspool implementation in Python.

-
Your algorithm has complexity \$O(m^3n)\$ (where \$m\$ and \$n\$ are the lengths of the two sequences). But the longest common subsequence problem for two sequences has a well known \$O(mn)\$ algorithm, described in detail in the Wikipedia article.

Code Snippets

>>> [w.strip() for w in open('/usr/share/dict/words') if 'abc' in w]
['crabcatcher', 'dabchick']

Context

StackExchange Code Review Q#56464, answer score: 4

Revisions (0)

No revisions yet.