HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Determine whether a list contains a particular sequence

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
sequencecontainsdetermineparticularlistwhether

Problem

Allowing wraparound, the sequence [2, 3, 5, 7, 11] contains the sequence [7, 11, 2].

I'm looking for compact/pythonic way to code this, rather than an optimally-efficient way. (Unless I need the efficiency, I usually gravitate to the path of least ASCII).

An obvious method would be as follows (I provide three different ways to code it):

def Contains(L, s):
    L_extended = L+L[0:len(s)]

    for i in xrange( len(L) ):
        if L_extended[i:i+len(s)] == s:
            return True

    return False

def Contains2(L, s):
    L_extended = L+L[0:len(s)]

    return next( \
        ( True for i in xrange(len(L))  if  L_extended[i:i+len(s)] == s ), \
        False \
        )

def Contains3(L, s):
    L_extended = L+L[0:len(s)]

    try:
        next( i  for  i in xrange(len(L))  if  L_extended[i:i+len(s)] == s )
    except StopIteration:
        return False

    return True

print Contains3( [1,2,3,4,5], [1,3] )   # False
print Contains3( [1,2,3,4,5], [2,3] )   # True
print Contains3( [1,2,3,4,5], [4,5,1] ) # True


My preference is for the first, but Python is full of pleasant surprises -- can anyone see a cleaner way to do it?

Solution

As you said in your comments above, the first is the cleanest way you came up with.

Instead of writing your own logic, Python has a built-in library difflib that contains the SequenceMatcher class. You can use this to find the longest match between two iterables:

from difflib import SequenceMatcher

def contains(main_list, sub_list, junk=None):
    sub_len = len(sub_list)
    extended = main_list + main_list[:sub_len]

    sequence = SequenceMatcher(junk, extended, sub_list)
    return sequence.get_longest_match(0, len(extended), 0, sub_len).size == sub_len


My only qualm with get_longest_match is that its parameters are ALL positional arguments and are thus required. I would prefer the API assume we want the entire length of both iterables searched. Instead of using get_longest_match you could instead use the get_matching_blocks function (which returns tuples) then iterate through those until the corresponding tuple value is equal to sub_len.

By using SequenceMatcher you also get the flexibility to declare what 'junk' you want the class to ignore. So say you want to find the longest match between the two lists, but you don't care about 1 or 2. You would simply pass the lambda:

lambda num: num in [1, 2]


as the junk parameter.

Now that I've given my structure recommendation, I want to give some style pointers.

First of all, take a look at PEP8, the official Python style guide. It gives a great reference to how your code should look. Here are some places your code strays from the Python conventions:

  • Use descriptive variable names. Single letter variable names are typically frowned-upon. When naming, err on the side of being too descriptive instead of too brief.



-
Use underscores_in_names for basically everything in Python. The only aberration from this rule is class names which are written in PascalCase.

Also, variable and function/method names should be lowercase. Conventionally, uppercase names are saved for constants. So by convention, you L variable is considered constant.

  • Your spacing is quite nice; except for a few occasions where you place a single space after an opening paren and before the close paren. PEP8 actually labels that style as a pet peeve.



-
Multi-line statement happen. PEP8 purposefully takes a neutral stance on how those should be formatted. However, Python is quite lenient in terms of newlines in statements, so more than likely, there is a better way to format your multi-line statements without escaping the newlines.

def Contains2(L, s):
    L_extended = L+L[0:len(s)]

    return next((True for i in xrange(len(L)) 
                 if  L_extended[i:i+len(s)] == s), False)

Code Snippets

from difflib import SequenceMatcher

def contains(main_list, sub_list, junk=None):
    sub_len = len(sub_list)
    extended = main_list + main_list[:sub_len]

    sequence = SequenceMatcher(junk, extended, sub_list)
    return sequence.get_longest_match(0, len(extended), 0, sub_len).size == sub_len
lambda num: num in [1, 2]
def Contains2(L, s):
    L_extended = L+L[0:len(s)]

    return next((True for i in xrange(len(L)) 
                 if  L_extended[i:i+len(s)] == s), False)

Context

StackExchange Code Review Q#52470, answer score: 3

Revisions (0)

No revisions yet.