Comparing different string-matching functions
Problem
Here is a problem from CodingBat:
Given 2 strings, a and b, return the number of the positions where
they contain the same length 2 substring. So "xxcaazz" and "xxbaaz"
yields 3, since the "xx", "aa", and "az" substrings appear in the
same place in both strings.
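To make the counting rule concrete, here is a small sketch that lists the positions where the two strings share a length-2 substring. The helper name `matching_positions` is hypothetical and is not part of the CodingBat problem.

```python
def matching_positions(a, b):
    """Return the indices i where a and b share the same length-2 substring (hypothetical helper)."""
    shorter = min(len(a), len(b))
    # Only indices 0 .. shorter-2 have a full length-2 window in both strings.
    return [i for i in range(shorter - 1) if a[i:i+2] == b[i:i+2]]

print(matching_positions("xxcaazz", "xxbaaz"))  # [0, 3, 4]
```

The matches at indices 0, 3 and 4 are "xx", "aa" and "az", which gives the count of 3.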
There are several possible answers, but it can be hard to choose which one is preferable. For example:
```python
# Solution 1
# Using a for loop
def strmatch_forloop(a, b):
    shorter = min(len(a), len(b))
    count = 0
    for i in range(shorter-1):
        a_sub = a[i:i+2]
        b_sub = b[i:i+2]
        if a_sub == b_sub:
            count = count + 1
    return count

# Solution 2
# Using a list comprehension
def strmatch_listcomp(a, b):
    shorter = min(len(a), len(b))
    return [a[i:i+2] == b[i:i+2] for i in range(shorter-1)].count(True)

# Solution 3
# Using a generator expression
def strmatch_gen(a, b):
    shorter = min(len(a), len(b))
    return sum(a[i:i+2] == b[i:i+2] for i in range(shorter-1))
```

Note that "preferable" is subjective: it may refer to speed, memory use or coding style. For instance, their speeds are reported as:
```
%timeit strmatch_forloop
10000000 loops, best of 3: 21.7 ns per loop
%timeit strmatch_listcomp
10000000 loops, best of 3: 22.9 ns per loop
%timeit strmatch_gen
10000000 loops, best of 3: 21.8 ns per loop
```

According to these results, there may be no difference between the approaches. Similar results for memory use can be shown with `%memit`. However, coding style is too subjective to measure. How should I choose among them?

Solution
Since memory and performance metrics are similar, coding style is possibly the only differentiating factor.
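A caveat on the timings in the question: `%timeit strmatch_forloop` evaluates the name `strmatch_forloop` without calling the function, so the roughly 22 ns figures measure a name lookup rather than the matching work. In IPython the call has to be written out, e.g. `%timeit strmatch_gen("xxcaazz", "xxbaaz")`. Outside IPython, a minimal sketch with the standard-library `timeit` module might look like the following; the helper name `ns_per_call` and the example inputs are assumptions for illustration.

```python
import timeit

def strmatch_gen(a, b):
    # Solution 3 from the question, repeated here so the sketch runs on its own.
    shorter = min(len(a), len(b))
    return sum(a[i:i+2] == b[i:i+2] for i in range(shorter - 1))

def ns_per_call(func, *args, number=10_000, repeat=5):
    """Best-of-`repeat` time for one call of func(*args), in nanoseconds (hypothetical helper)."""
    # Passing a callable makes timeit actually invoke func on every iteration.
    totals = timeit.repeat(lambda: func(*args), number=number, repeat=repeat)
    return min(totals) / number * 1e9

a, b = "xxcaazz", "xxbaaz"  # assumed inputs; longer strings make differences easier to see
print(f"strmatch_gen: {ns_per_call(strmatch_gen, a, b):.1f} ns per call")
```

The same helper can be pointed at `strmatch_forloop` and `strmatch_listcomp` once they are defined alongside it. Taking the minimum over several repeats reduces noise from other activity on the machine.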
There is nothing wrong with solution 1, but it takes longer to read, unless you are paid by the number of lines you write! I'd say it is Pythonic to write code in fewer lines, so long as readability is not compromised.
I prefer solution 3. It reads well and is terse. Solution 2 is just as good, but it appears to do extra work: it builds a list and then counts the `True` values in it. Why do that when you can sum the matches directly (each `True` counts as 1 in `sum`)?
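Along the same lines as solution 3, here is a variant that also sums the matches directly but lets `zip` handle the length bookkeeping. The name `strmatch_zip` is hypothetical and not one of the solutions from the question.

```python
def strmatch_zip(a, b):
    """Count positions where a and b share the same length-2 substring (hypothetical variant)."""
    # zip(s, s[1:]) yields the length-2 windows of s as character pairs.
    # The outer zip stops at the shorter sequence of windows, so no min(len(a), len(b)) is needed.
    return sum(pa == pb for pa, pb in zip(zip(a, a[1:]), zip(b, b[1:])))

print(strmatch_zip("xxcaazz", "xxbaaz"))  # 3
```

Whether this reads better than the slicing version is again a matter of taste: it avoids index arithmetic at the cost of nested `zip` calls.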
Context
StackExchange Code Review Q#62492, answer score: 3