HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Optimising an iterative function over long strings

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
iterativelongfunctionoptimisingstringsover

Problem

I'm doing a coding challenge for fun and to work on my skills - some of you may remember the Advent of Code challenge from last December, I'm working through that. I've got this code as the solution to one of the problems, which works, but it's uselessly slow.

inp = "1113122113"

def iterate(num):
    pos = 0
    new = ""
    while pos < len(num):
        counter = 0
        d = num[pos]
        for i in range(pos, len(num)):
            if num[i] == d:
                counter += 1
            else:
                break
        new+=str(counter)
        new+=str(d)
        pos += counter
    print len(new)
    return new

for i in range (50):
    inp = iterate(inp)


Past iteration 35 or so, it gets to the point where it's taking several minutes for each iteration. The objective is to generate a look-and-say sequence - so 1 goes to 11, then 21, 1211, 111221, 3112211 and so on. I need to find the length of the string after the 50th iteration, but it's 360154 characters long after 40 iterations and my code just isn't optimised enough to handle strings that long in a reasonable time. Can anyone give me some pointers?

Solution

So let's look at what the basic idea of the sequence is (if I understand correctly):

  • Break the sequence into contiguous blocks of identical values.



  • Figure out the lengths of those blocks.



  • Create a new sequence with the length of each block followed by the value of that block.



Step 1 seems to be the problem here. The key issue is to me is that you are rolling your own solution when python provides something to do step 1 for you: itertools.groupby.

So consider your last number:

>>> from itertools import groupby
>>> inp = "1113122113"
>>> print([(k, list(g)) for k, g in groupby(inp)])
[('1', ['1', '1', '1']), ('3', ['3']), ('1', ['1']), ('2', ['2', '2']), ('1', ['1', '1']), ('3', ['3'])]


This gives you an iterator of tuples, where the first tuple is the value of the group and the second is the group itself. You can use the value of the group for second element of each number pair and length of the group as the first element. Then you just have to put them together into a new string.

So the algorithm can be reduced to:

new = ''.join(str(sum(1 for _ in g))+k for k, g in groupby(inp))


Other comments:

  • Your multi-run is done in the root of the script. It would be better to give it its own function.



  • You are printing in the same function that generates the sequence. This assumes you will always want to print. Better to just return the results and let whatever calls the function determine what to do with those results.



  • You should put all the automatically-run code in if __name__ = '__main__': so you can use the functions elsewhere.



  • You should use from __future__ import print_function to make your code python3 compatible.



  • Use _ in place of i for loop indices you don't care about.



  • You can use default function arguments to make the code a bit simpler to run



So here is my complete solution:

from __future__ import print_function

def iterate(num):
    return ''.join(str(sum(1 for _ in g))+k for k, g in groupby(num))

def multiiter(n, inp='1'):
    for _ in range(n):
        print(inp, len(inp))
        inp = iterate(inp)

if __name__ == '__main__':
    multiiter(n=50, inp='1113122113')


Using a version that commented out the print took 5.53 seconds to run 50 answers.

Edit: Included Copperfield's sum suggestion.

Code Snippets

>>> from itertools import groupby
>>> inp = "1113122113"
>>> print([(k, list(g)) for k, g in groupby(inp)])
[('1', ['1', '1', '1']), ('3', ['3']), ('1', ['1']), ('2', ['2', '2']), ('1', ['1', '1']), ('3', ['3'])]
new = ''.join(str(sum(1 for _ in g))+k for k, g in groupby(inp))
from __future__ import print_function


def iterate(num):
    return ''.join(str(sum(1 for _ in g))+k for k, g in groupby(num))


def multiiter(n, inp='1'):
    for _ in range(n):
        print(inp, len(inp))
        inp = iterate(inp)


if __name__ == '__main__':
    multiiter(n=50, inp='1113122113')

Context

StackExchange Code Review Q#143433, answer score: 5

Revisions (0)

No revisions yet.