patternpythonMinor
Filename generator with Underscore
Viewed 0 times
withfilenameunderscoregenerator
Problem
The code generates all combinations of filename with underscore
Example:
Input:
Output:
My code works well, but very slowly. How can I optimize the algorithm?
_:def filename_generator(name, path=''):
names = []
for i in range(len(name)):
p = '_'.join([x for x in name[i:]]) + path
names.append(name[:i] + p)
names += filename_generator(name[:i], p)
return sorted(list(set(names)))
for e in filename_generator('longlonglonglonglonglongname'):
print(e)Example:
Input:
nameOutput:
n_a_m_e
n_a_me
n_am_e
n_ame
na_m_e
na_me
nam_e
nameMy code works well, but very slowly. How can I optimize the algorithm?
Solution
Note that there are 2n-1 different such names for a string of lengh n, as there can or can not be a
Regardless of what algorithm you use, this will take a very long time. One way to work around this, depending on your actual use case, might be to use a generator function. This way, you do not need to calculate all the variants, if maybe you need just a few of them (maybe just enough to find one that does not yet exist?). For example, like this:
Output:
Or, building upon the idea by @jacdeh, using binary numbers to determine where to put a
Or similar, using bit shifting:
However, either both those implementation are pretty lousy, or it does not help much in the end: They are both about 10 times slower than the recursive function, according to
Here's some timing information, using IPython's
Another possibility, as posted in comments, would be to use
However, performance is about the same as for the binary solutions. I guess the bottleneck is the number of string concatenations: In the recursive solution, you "reuse" the partial solutions, while each of the latter solutions construct each solution "from scratch", so there are fewer total string concatenation in the recursive case. Not sure, though, so feel free to comment.
_ between any two characters. For your long-long name that makes 134,217,728 variants!Regardless of what algorithm you use, this will take a very long time. One way to work around this, depending on your actual use case, might be to use a generator function. This way, you do not need to calculate all the variants, if maybe you need just a few of them (maybe just enough to find one that does not yet exist?). For example, like this:
def filename_generator(name):
if len(name) > 1:
first, rest = name[0], name[1:]
for rest2 in filename_generator(rest):
yield first + rest2
yield first + "_" + rest2
else:
yield nameOutput:
>>> list(filename_generator("name"))
['name', 'n_ame', 'na_me', 'n_a_me', 'nam_e', 'n_am_e', 'na_m_e', 'n_a_m_e']Or, building upon the idea by @jacdeh, using binary numbers to determine where to put a
_.def filename_generator(name):
d = {"0": "", "1": "_"}
for n in range(2**(len(name)-1)):
binary = "{:0{}b}".format(n, len(name))
yield''.join(d[s] + c for c, s in zip(name, binary))Or similar, using bit shifting:
def filename_generator(name):
for n in range(2**(len(name)-1)):
yield ''.join(c + ("_" if n & 1<<i == 1<<i else "") for i, c in enumerate(name))However, either both those implementation are pretty lousy, or it does not help much in the end: They are both about 10 times slower than the recursive function, according to
timeit.Here's some timing information, using IPython's
%timeit magic method:In [8]: %timeit filename_generator("longname") # your original function
1000 loops, best of 3: 610 µs per loop
In [9]: %timeit list(filename_generator1("longname")) # my recursive generator
10000 loops, best of 3: 22.5 µs per loop
In [10]: %timeit list(filename_generator2("longname")) # my binary generator
1000 loops, best of 3: 322 µs per loop
In [11]: %timeit partition_string("longname") # Eric's binary function
1000 loops, best of 3: 200 µs per loopAnother possibility, as posted in comments, would be to use
itertools.product to generate the "mask", similar to the binary number solution, but with less involved math.from itertools import product, chain
def filename_generator4(name):
for mask in product(("_", ""), repeat=len(name) - 1):
yield ''.join(chain(*zip(name, mask + ("",))))However, performance is about the same as for the binary solutions. I guess the bottleneck is the number of string concatenations: In the recursive solution, you "reuse" the partial solutions, while each of the latter solutions construct each solution "from scratch", so there are fewer total string concatenation in the recursive case. Not sure, though, so feel free to comment.
Code Snippets
def filename_generator(name):
if len(name) > 1:
first, rest = name[0], name[1:]
for rest2 in filename_generator(rest):
yield first + rest2
yield first + "_" + rest2
else:
yield name>>> list(filename_generator("name"))
['name', 'n_ame', 'na_me', 'n_a_me', 'nam_e', 'n_am_e', 'na_m_e', 'n_a_m_e']def filename_generator(name):
d = {"0": "", "1": "_"}
for n in range(2**(len(name)-1)):
binary = "{:0{}b}".format(n, len(name))
yield''.join(d[s] + c for c, s in zip(name, binary))def filename_generator(name):
for n in range(2**(len(name)-1)):
yield ''.join(c + ("_" if n & 1<<i == 1<<i else "") for i, c in enumerate(name))In [8]: %timeit filename_generator("longname") # your original function
1000 loops, best of 3: 610 µs per loop
In [9]: %timeit list(filename_generator1("longname")) # my recursive generator
10000 loops, best of 3: 22.5 µs per loop
In [10]: %timeit list(filename_generator2("longname")) # my binary generator
1000 loops, best of 3: 322 µs per loop
In [11]: %timeit partition_string("longname") # Eric's binary function
1000 loops, best of 3: 200 µs per loopContext
StackExchange Code Review Q#95442, answer score: 8
Revisions (0)
No revisions yet.