patternpythonMinor
Minimum window substring
Viewed 0 times
windowminimumsubstring
Problem
The minimum window substring problem from leetcode.com asks:
Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity \$O(n)\$.
For example,
S =
T =
Minimum window is
I post two versions of code. The differences between two versions of code is, the first one will move the start index when there is a match, the 2nd version will not move the start index when there is a move.
Wondering if logics are both correct? If both have the correct logic, which one do you think has better performance? I have did basic testing and post my test cases below. Using Python 2.7.
```
from collections import defaultdict
import sys
def minWindow(source, pattern):
targetCount = defaultdict(int)
currentCount = defaultdict(int)
for ch in pattern:
targetCount[ch] += 1
start = 0
validCount = 0
minSize = sys.maxint
for index in range(len(source)):
if source[index] not in targetCount:
continue
if currentCount[source[index]] targetCount[source[start]]:
currentCount[source[start]] -= 1
start += 1
else:
minSize = min(minSize, index-start+1)
currentCount[source[start]] -= 1
validCount -= 1
start += 1
return (minSize, start-1)
def minWindowv2(source, pattern):
targetCount = defaultdict(int)
currentCount = defaultdict(int)
for ch in pattern:
targetCount[ch] += 1
start = 0
validCount = 0
minSize = sys.maxint
for index in range(len(source)):
if source[index] not in targetCount:
continue
if currentCount[source[index]] targetCount[source[start]]:
currentCount[source[start]] -= 1
start += 1
else:
minSize = min(minSize, index-start+1)
break
return (minSize, start)
Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity \$O(n)\$.
For example,
S =
"ADOBECODEBANC" T =
"ABC"Minimum window is
"BANC".I post two versions of code. The differences between two versions of code is, the first one will move the start index when there is a match, the 2nd version will not move the start index when there is a move.
Wondering if logics are both correct? If both have the correct logic, which one do you think has better performance? I have did basic testing and post my test cases below. Using Python 2.7.
```
from collections import defaultdict
import sys
def minWindow(source, pattern):
targetCount = defaultdict(int)
currentCount = defaultdict(int)
for ch in pattern:
targetCount[ch] += 1
start = 0
validCount = 0
minSize = sys.maxint
for index in range(len(source)):
if source[index] not in targetCount:
continue
if currentCount[source[index]] targetCount[source[start]]:
currentCount[source[start]] -= 1
start += 1
else:
minSize = min(minSize, index-start+1)
currentCount[source[start]] -= 1
validCount -= 1
start += 1
return (minSize, start-1)
def minWindowv2(source, pattern):
targetCount = defaultdict(int)
currentCount = defaultdict(int)
for ch in pattern:
targetCount[ch] += 1
start = 0
validCount = 0
minSize = sys.maxint
for index in range(len(source)):
if source[index] not in targetCount:
continue
if currentCount[source[index]] targetCount[source[start]]:
currentCount[source[start]] -= 1
start += 1
else:
minSize = min(minSize, index-start+1)
break
return (minSize, start)
Solution
- Review
I don't like comparative reviews, so I only looked in detail at the first version of your code. Possibly some of these comments apply to the other version too.
-
The code does not return the correct answer if there are repeated letters in
pattern. For example, given S = 'ABCA' and T = 'AA', the minimum window should be the whole string. But:>>> minWindow('ABCA', 'AA')
(1, 3)To fix this, use
len(pattern) instead of len(targetCount).-
It would be better to raise an exception if no window is found rather than returning the meaningless result
(4294967295, -1).-
sys.maxint is not portable to Python 3. If you need a starting value that's bigger than any integer, use float('inf'). But it's possible to reorganize the code so that there's no need for a starting value — see the revised code below.-
Instead of
defaultdict(int), use Counter, which you can initialize directly from an iterable:targetCount = Counter(pattern)-
Instead of iterating over indexes into
source, use enumerate:for end, c in enumerate(source):
if c not in targetCount:
continue
if currentCount[c] < targetCount[c]:
validCount += 1
currentCount[c] += 1
# ... and so on ...This avoids lots of lookups of
source[index].-
There's repetition of lines like
start += 1 and currentCount[source[start]] -= 1. It would be easy to reorder the code so that each line appears only once.-
Similarly, instead of maintaining and incrementing a
start index, create an iterable that enumerates the source string:window_start = enumerate(source)and call
next each time you need another character from it:start, c = next(window_start)
if c not in targetCount:
continue
# ... and so on ...This avoids lots of lookups of
source[start].-
The inner loop is:
while validCount == len(targetCount):but
validCount does not change on every loop iteration (only on loop iterations where the start character causes one of the current counts to drop below its target). So some of these tests are wasted. Instead, you could loop using:for start, c in window_start:and then exit the loop only after reducing
validCount:validCount -= 1
if validCount < len(pattern):
break-
The test:
if source[index] not in targetCount:
continueis not necessary — the algorithm works just as well without it. And similarly for the corresponding test in the inner loop.
- Revised code
In this implementation I've tried to emphasize the symmetry between the outer and the inner loop. Really what we have here is a pair of coroutines, each iterating over the input, and yielding to the other one at the appropriate point.
from collections import Counter
def min_window(source, pattern):
"""Return the shortest substring of source that contains all the
characters in pattern (with the required multiplicity). Raise
ValueError if there is no such substring.
"""
def windows():
pattern_length = len(pattern)
target = Counter(pattern) # character -> required count
current = Counter() # character -> count in current window
matched = 0 # matched characters in current window
window_start, window_end = enumerate(source), enumerate(source)
for end, c in window_end:
current[c] += 1
if current[c] <= target[c]:
matched += 1
if matched == pattern_length:
for start, c in window_start:
current[c] -= 1
if current[c] < target[c]:
matched -= 1
if matched < pattern_length:
yield end + 1 - start, start
break
length, start = min(windows())
return source[start: start + length]Code Snippets
>>> minWindow('ABCA', 'AA')
(1, 3)targetCount = Counter(pattern)for end, c in enumerate(source):
if c not in targetCount:
continue
if currentCount[c] < targetCount[c]:
validCount += 1
currentCount[c] += 1
# ... and so on ...window_start = enumerate(source)start, c = next(window_start)
if c not in targetCount:
continue
# ... and so on ...Context
StackExchange Code Review Q#132227, answer score: 4
Revisions (0)
No revisions yet.