patternpythonMinor
Searching text for a string
Viewed 0 times
textstringforsearching
Problem
This is my current code:
How would I optimize this without using regular expressions?
def search(word,source):
places = []
for x in range(0,len(source)):
if source[x] == word[0]:
if source[x:x+len(word)] == word:
places.append(x)
return placesHow would I optimize this without using regular expressions?
Solution
Using
Note the following:
With those in mind:
I have also added a docstring to explain what the function does.
However, if you really want efficiency (and readability, for that matter), ignoring regular expressions is a bad move:
timeit, we can make changes and see how it affects the speed. Baseline:>>> def search1(word,source):
places = []
for x in range(0,len(source)):
if source[x] == word[0]:
if source[x:x+len(word)] == word:
places.append(x)
return places
>>> timeit.timeit("search1(word, source)",
setup="from __main__ import search1, search2, search3; import re; word, source = 'foo', 'foobarbaz'*1000",
number=10000)
15.60436644058882Note the following:
- You calculate
word[0]andlen(word)on every loop, these can be factored out;
- A list comprehension is generally faster than
forlooping andappending; and
enumerateallows you to get the index and character fromsourcesimultaneously.
With those in mind:
>>> def search2(word, source):
"""Find every index in source at which word starts."""
firstchar = word[0]
wordlen = len(word)
return [index for index, char in enumerate(source)
if firstchar == char and word == source[index:index+wordlen]]
>>> timeit.timeit("search2(word, source)",
setup="from __main__ import search1, search2, search3; import re; word, source = 'foo', 'foobarbaz'*1000",
number=10000)
9.99040215947457I have also added a docstring to explain what the function does.
However, if you really want efficiency (and readability, for that matter), ignoring regular expressions is a bad move:
>>> def search3(word, source):
"""Find every index in source at which word starts."""
return [m.start() for m in re.finditer(word, source)]
>>> timeit.timeit("search3(word, source)",
setup="from __main__ import search1, search2, search3; import re; word, source = 'foo', 'foobarbaz'*1000",
number=10000)
2.1652778798459167Code Snippets
>>> def search1(word,source):
places = []
for x in range(0,len(source)):
if source[x] == word[0]:
if source[x:x+len(word)] == word:
places.append(x)
return places
>>> timeit.timeit("search1(word, source)",
setup="from __main__ import search1, search2, search3; import re; word, source = 'foo', 'foobarbaz'*1000",
number=10000)
15.60436644058882>>> def search2(word, source):
"""Find every index in source at which word starts."""
firstchar = word[0]
wordlen = len(word)
return [index for index, char in enumerate(source)
if firstchar == char and word == source[index:index+wordlen]]
>>> timeit.timeit("search2(word, source)",
setup="from __main__ import search1, search2, search3; import re; word, source = 'foo', 'foobarbaz'*1000",
number=10000)
9.99040215947457>>> def search3(word, source):
"""Find every index in source at which word starts."""
return [m.start() for m in re.finditer(word, source)]
>>> timeit.timeit("search3(word, source)",
setup="from __main__ import search1, search2, search3; import re; word, source = 'foo', 'foobarbaz'*1000",
number=10000)
2.1652778798459167Context
StackExchange Code Review Q#70720, answer score: 5
Revisions (0)
No revisions yet.