Optimise runtime of filter character method
Problem
I'm trying to run the method below against a very large corpus, and I need to reduce its run time, as it already takes about 6 seconds to execute.
Requirements:
- The word must consist only of letters, hyphens and apostrophes
- The first character of the word must be a letter
- The last character of the word must be a letter or an apostrophe
- Use of the re library (regex) is strictly not allowed
    def delUnknownChar(w):
        wf = []
        for c in w:
            if (c == "'" or c == "-" or c.isalpha()):
                wf.append(c)
        w = "".join(wf)
        wf.clear()
        if (len(w) > 1):
            while(not w[0].isalpha()):
                w = w[1:]
            while (w[-1] == "-"):
                w = w[:-1]
            return w
        else:
            return None
    string1 = delUnknownChar("-'test'-")
    print(string1)

Output will be test'
The code above will take about 5 seconds to run.
If I change lines 2-7 of the code to this line:
    w = "".join(c for c in w if c == "'" or c == "-" or c.isalpha())

The runtime somehow increases by 1 more second.
Does anyone have a better idea, or a faster way to perform this check?
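The two filtering variants can be timed in isolation with timeit. The sketch below is only an illustration: the function names and the sample string are hypothetical, and the numbers it prints will not match the 5 to 6 seconds measured on the real corpus.

```python
import timeit


def filter_with_list(w):
    # Variant from the question: collect kept characters in a list, then join.
    wf = []
    for c in w:
        if c == "'" or c == "-" or c.isalpha():
            wf.append(c)
    return "".join(wf)


def filter_with_generator(w):
    # One-line variant: pass a generator expression straight to join.
    return "".join(c for c in w if c == "'" or c == "-" or c.isalpha())


# Hypothetical input, not the asker's corpus.
sample = "-'te5st'-" * 1000

# Both variants must agree before their timings are worth comparing.
assert filter_with_list(sample) == filter_with_generator(sample)

for fn in (filter_with_list, filter_with_generator):
    seconds = timeit.timeit(lambda: fn(sample), number=100)
    print(f"{fn.__name__}: {seconds:.3f} s")
```

Timing only the filtering step, rather than the whole run over the corpus, makes it easier to see whether the one-line change is really responsible for the extra second.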
Solution
Possible performance improvements
- w = w[:-1] is probably inefficient, since it copies every element except the last one. del w[-1] is the idiomatic way to delete the last element of a list, but it only works on a list such as wf: str objects are immutable, so the trimming would have to happen before the join.
- You probably have more characters that satisfy isalpha than characters exactly equal to "'" or "-". Since or stops at the first true condition, you may notice a speed improvement by checking isalpha before the other two conditions.
- You could create a table containing all the characters accepted by isalpha plus "'" and "-", and then check whether a character is in this table. However, I don't think that it would bring significant speed improvements, unless you filter with a dedicated str method that may be optimized (replace, for example).
Style
A few notes about style:
- Please don't put parentheses around the while condition; that goes against the Python style guide (PEP 8).
- Same goes for if.
Context
StackExchange Code Review Q#69184, answer score: 2
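As an illustration of the suggestions under Solution, the sketch below combines the lookup-table idea with dedicated str methods and drops the parentheses around the conditions. It is not the answerer's code: the name del_unknown_char and the ALLOWED table are hypothetical, str.lstrip and str.rstrip are swapped in for the two slicing loops, and the table assumes an ASCII-only corpus, whereas str.isalpha also accepts non-ASCII letters.

```python
import string

# Assumption: the corpus is ASCII-only. str.isalpha() also accepts
# non-ASCII letters, so this table is narrower than the original check.
ALLOWED = frozenset(string.ascii_letters + "'-")


def del_unknown_char(w):
    # Keep only the characters found in the lookup table.
    w = "".join([c for c in w if c in ALLOWED])
    if len(w) <= 1:
        return None
    # After filtering, the only non-letters left are "'" and "-", so
    # lstrip/rstrip can replace the two character-by-character loops.
    return w.lstrip("'-").rstrip("-")


print(del_unknown_char("-'test'-"))  # test'
```

One behavioural difference to be aware of: for an input that contains no letters at all, such as "--", this sketch returns an empty string, whereas the loops in the question would raise an IndexError.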