HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Filename generator with Underscore

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
withfilenameunderscoregenerator

Problem

The code generates all combinations of filename with underscore _:

def filename_generator(name, path=''):
    names = []

    for i in range(len(name)):
        p = '_'.join([x for x in name[i:]]) + path

        names.append(name[:i] + p)
        names += filename_generator(name[:i], p)

    return sorted(list(set(names)))

for e in filename_generator('longlonglonglonglonglongname'):
    print(e)


Example:

Input:

name


Output:

n_a_m_e
n_a_me
n_am_e
n_ame
na_m_e
na_me
nam_e
name


My code works well, but very slowly. How can I optimize the algorithm?

Solution

Note that there are 2n-1 different such names for a string of lengh n, as there can or can not be a _ between any two characters. For your long-long name that makes 134,217,728 variants!

Regardless of what algorithm you use, this will take a very long time. One way to work around this, depending on your actual use case, might be to use a generator function. This way, you do not need to calculate all the variants, if maybe you need just a few of them (maybe just enough to find one that does not yet exist?). For example, like this:

def filename_generator(name):
    if len(name) > 1:
        first, rest = name[0], name[1:]
        for rest2 in filename_generator(rest):
            yield first + rest2
            yield first + "_" + rest2
    else:
        yield name


Output:

>>> list(filename_generator("name"))
['name', 'n_ame', 'na_me', 'n_a_me', 'nam_e', 'n_am_e', 'na_m_e', 'n_a_m_e']


Or, building upon the idea by @jacdeh, using binary numbers to determine where to put a _.

def filename_generator(name):
    d = {"0": "", "1": "_"}
    for n in range(2**(len(name)-1)):
        binary = "{:0{}b}".format(n, len(name))
        yield''.join(d[s] + c for c, s in zip(name, binary))


Or similar, using bit shifting:

def filename_generator(name):
    for n in range(2**(len(name)-1)):
        yield ''.join(c + ("_" if n & 1<<i == 1<<i else "") for i, c in enumerate(name))


However, either both those implementation are pretty lousy, or it does not help much in the end: They are both about 10 times slower than the recursive function, according to timeit.

Here's some timing information, using IPython's %timeit magic method:

In [8]: %timeit filename_generator("longname")         # your original function
1000 loops, best of 3: 610 µs per loop
In [9]: %timeit list(filename_generator1("longname"))  # my recursive generator
10000 loops, best of 3: 22.5 µs per loop
In [10]: %timeit list(filename_generator2("longname")) # my binary generator
1000 loops, best of 3: 322 µs per loop
In [11]: %timeit partition_string("longname")          # Eric's binary function
1000 loops, best of 3: 200 µs per loop


Another possibility, as posted in comments, would be to use itertools.product to generate the "mask", similar to the binary number solution, but with less involved math.

from itertools import product, chain
def filename_generator4(name):
    for mask in product(("_", ""), repeat=len(name) - 1):
        yield ''.join(chain(*zip(name, mask + ("",))))


However, performance is about the same as for the binary solutions. I guess the bottleneck is the number of string concatenations: In the recursive solution, you "reuse" the partial solutions, while each of the latter solutions construct each solution "from scratch", so there are fewer total string concatenation in the recursive case. Not sure, though, so feel free to comment.

Code Snippets

def filename_generator(name):
    if len(name) > 1:
        first, rest = name[0], name[1:]
        for rest2 in filename_generator(rest):
            yield first + rest2
            yield first + "_" + rest2
    else:
        yield name
>>> list(filename_generator("name"))
['name', 'n_ame', 'na_me', 'n_a_me', 'nam_e', 'n_am_e', 'na_m_e', 'n_a_m_e']
def filename_generator(name):
    d = {"0": "", "1": "_"}
    for n in range(2**(len(name)-1)):
        binary = "{:0{}b}".format(n, len(name))
        yield''.join(d[s] + c for c, s in zip(name, binary))
def filename_generator(name):
    for n in range(2**(len(name)-1)):
        yield ''.join(c + ("_" if n & 1<<i == 1<<i else "") for i, c in enumerate(name))
In [8]: %timeit filename_generator("longname")         # your original function
1000 loops, best of 3: 610 µs per loop
In [9]: %timeit list(filename_generator1("longname"))  # my recursive generator
10000 loops, best of 3: 22.5 µs per loop
In [10]: %timeit list(filename_generator2("longname")) # my binary generator
1000 loops, best of 3: 322 µs per loop
In [11]: %timeit partition_string("longname")          # Eric's binary function
1000 loops, best of 3: 200 µs per loop

Context

StackExchange Code Review Q#95442, answer score: 8

Revisions (0)

No revisions yet.