
Optimise runtime of filter character method

Submitted by: @import:stackexchange-codereview

Problem

I'm running the method below against a very large corpus and need to reduce its run time, since it already takes about 6 seconds to execute.

Requirements:

  • The word must consist only of letters, hyphens and apostrophes.
  • The first character of the word must be a letter.
  • The last character of the word must be a letter or an apostrophe.
  • Use of the re library (regex) is strictly not allowed.
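Read literally, the requirements describe a check rather than a filter. A minimal regex-free validator might look like this, assuming "alphabet" means whatever str.isalpha() accepts (as in the code below) and that an empty word is invalid. The name is_valid_word is hypothetical.

```python
def is_valid_word(w):
    """Return True if w meets the stated requirements (hypothetical helper).

    Assumes "letter" means str.isalpha(), and treats "" as invalid.
    """
    if not w:
        return False
    # First character must be a letter.
    if not w[0].isalpha():
        return False
    # Last character must be a letter or an apostrophe.
    if not (w[-1].isalpha() or w[-1] == "'"):
        return False
    # Every character must be a letter, hyphen or apostrophe.
    return all(c.isalpha() or c == "'" or c == "-" for c in w)


print(is_valid_word("test'"))  # True
print(is_valid_word("test-"))  # False
```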



def delUnknownChar(w):
    wf = []
    for c in w:
        if (c == "'" or c == "-" or c.isalpha()):
            wf.append(c)

    w = "".join(wf)
    wf.clear()

    if (len(w) > 1):
        while(w and not w[0].isalpha()):
            w = w[1:]

        while (w and w[-1] == "-"):
            w = w[:-1]

        return w
    else:
        return None

string1 = delUnknownChar("-'test'-")
print(string1)


Output: test'
On the full corpus, the code above takes about 5 seconds to run.

If I replace lines 2-7 of the function (from wf = [] through w = "".join(wf)) with this single line:

w = "".join(c for c in w if c == "'" or c == "-" or c.isalpha())


the runtime actually increases by about 1 second.
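A likely reason the one-liner is slower: in CPython, str.join first converts any iterable that is not already a list or tuple into a list, so passing a generator expression adds generator overhead without saving any memory. Passing a list comprehension instead is a common alternative. The timing sketch below compares the three variants with timeit; the sample word and repeat count are arbitrary assumptions, so time real corpus words for meaningful numbers.

```python
import functools
import timeit


def filter_loop(w):
    # The question's approach: append to a list, then join.
    wf = []
    for c in w:
        if c == "'" or c == "-" or c.isalpha():
            wf.append(c)
    return "".join(wf)


def filter_genexp(w):
    # The question's one-liner: join() receives a generator and
    # (in CPython) builds a list from it internally before joining.
    return "".join(c for c in w if c == "'" or c == "-" or c.isalpha())


def filter_listcomp(w):
    # Same filter, but join() receives a ready-made list.
    return "".join([c for c in w if c == "'" or c == "-" or c.isalpha()])


if __name__ == "__main__":
    sample = "-'test'-"  # hypothetical input; substitute real corpus words
    for fn in (filter_loop, filter_genexp, filter_listcomp):
        seconds = timeit.timeit(functools.partial(fn, sample), number=100_000)
        print(f"{fn.__name__}: {seconds:.3f} s")
```

All three return the same string; only the construction of the intermediate list differs.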

Does anyone have a better idea, or a way to do this check significantly faster?

Solution

Possible performance improvements

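- w = w[:-1] is probably inefficient: Python strings are immutable, so every slice copies all the remaining characters into a new string (and del w[-1] is not possible on a str). The same applies to w = w[1:]. Stripping in a single call, e.g. w.rstrip("-"), or locating the cut point first and slicing once avoids the repeated copies.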

- Most characters in a word are probably letters rather than "'" or "-". Because or short-circuits, testing c.isalpha() first means most characters are accepted after a single check, which may give a small speed-up.

- You could build a table containing every character for which isalpha is true, plus "'" and "-", and check whether each character is in this table. However, I don't think this will bring significant speed improvements unless you filter with a dedicated, optimized str method (replace, for example).
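The suggestions above can be combined into a sketch like the following. Two assumptions to flag: the function names and the _KeepAllowed helper are hypothetical, and the table-based variant swaps in str.translate (the str method built for per-character mapping) rather than replace. Both variants keep the original rule that words of length 1 or less return None. Neither has been benchmarked here, so measure on your own corpus.

```python
def del_unknown_char_listcomp(w):
    # isalpha() first: for mostly-letter input, `or` stops after one test.
    w = "".join([c for c in w if c.isalpha() or c == "'" or c == "-"])
    if len(w) <= 1:
        return None  # same rule as the original function
    # After filtering, the only non-letters left are "'" and "-", so one
    # lstrip call removes every leading non-letter and one rstrip call
    # removes trailing hyphens, with no slice-per-character loop.
    return w.lstrip("'-").rstrip("-")


class _KeepAllowed(dict):
    """Lazily filled translation table for str.translate (hypothetical helper).

    Maps a code point to itself if the character is a letter, "'" or "-",
    and to None (which str.translate treats as "delete") otherwise.
    Each code point is computed once, then cached in the dict.
    """

    def __missing__(self, code_point):
        ch = chr(code_point)
        value = code_point if (ch.isalpha() or ch in "'-") else None
        self[code_point] = value
        return value


_TABLE = _KeepAllowed()


def del_unknown_char_translate(w):
    # str.translate walks the string in C; only characters not yet
    # cached call back into the Python-level __missing__ method.
    w = w.translate(_TABLE)
    if len(w) <= 1:
        return None
    return w.lstrip("'-").rstrip("-")


print(del_unknown_char_listcomp("-'test'-"))   # test'
print(del_unknown_char_translate("-'test'-"))  # test'
```

The cached table grows only with the number of distinct characters seen, which is small for a typical corpus.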

Style

A few notes about style:

- Don't put parentheses around the while condition: they are redundant and go against the spirit of the Python style guide (PEP 8).

- The same goes for if.

Context

StackExchange Code Review Q#69184, answer score: 2
