patternpythonMinor
Finding sub-list
Viewed 0 times
listsubfinding
Problem
Pythonic way of expressing the simple problem:
Tell if the list
Tell if the list
needle is sublist of haystack#!/usr/bin/env python3
def sublist (haystack, needle):
def start ():
i = iter(needle)
return next(i), i
try:
n0, i = start()
for h in haystack:
if h == n0:
n0 = next(i)
else:
n0, i = start()
except StopIteration:
return True
return FalseSolution
- Bug
Here's a case where your function fails:
>>> sublist([1, 1, 2], [1, 2])
Falsethis is because in the
else: case you go back to the beginning of the needle, but you keep going forward in the haystack, possibly skipping over a match. In the test case, your function tries matching with the alignment shown below, which fails at the position marked X:X
haystack 1,1,2
needle 1,2Then it starts over from the beginning of the needle, but keeps going forward in the haystack, thus missing the match:
X
haystack 1,1,2
needle 1,2So after a mismatch you need to go backward an appropriate distance in the haystack before starting over again from the beginning of the needle.
- A better algorithm
It turns out to be better to start matching from the end of the needle. If this fails to match, we can skip forward several steps: possibly the whole length of the needle. For example, in this situation:
X
haystack 1,2,3,4,6,1,2,3,4,5
needle 1,2,3,4,5we can skip forward by the whole length of the needle (because
6 does not appear in the needle). The next alignment we need to try is this:O O O O O
haystack 1,2,3,4,6,1,2,3,4,5
needle 1,2,3,4,5However, we can't always skip forward the whole length of the needle. The distance we can skip depends on the item that fails to match. In this situation:
X
haystack 1,2,3,4,1,2,3,4,5
needle 1,2,3,4,5we should skip forward by 4, to bring the
1s into alignment.Making these ideas precise leads to the Boyer–Moore–Horspool algorithm:
def find(haystack, needle):
"""Return the index at which the sequence needle appears in the
sequence haystack, or -1 if it is not found, using the Boyer-
Moore-Horspool algorithm. The elements of needle and haystack must
be hashable.
>>> find([1, 1, 2], [1, 2])
1
"""
h = len(haystack)
n = len(needle)
skip = {needle[i]: n - i - 1 for i in range(n - 1)}
i = n - 1
while i < h:
for j in range(n):
if haystack[i - j] != needle[-j - 1]:
i += skip.get(haystack[i], n)
break
else:
return i - n + 1
return -1Code Snippets
>>> sublist([1, 1, 2], [1, 2])
FalseX
haystack 1,1,2
needle 1,2X
haystack 1,1,2
needle 1,2X
haystack 1,2,3,4,6,1,2,3,4,5
needle 1,2,3,4,5O O O O O
haystack 1,2,3,4,6,1,2,3,4,5
needle 1,2,3,4,5Context
StackExchange Code Review Q#19627, answer score: 5
Revisions (0)
No revisions yet.