
Optimizing Project Euler Solution #12

Submitted by: @import:stackexchange-codereview

Problem

Just to reiterate my previous post, as I think it helps structure my weaknesses.

- Coding methodology: Is the algorithm I used feasible or mostly correct? If not, how do I correct it, and why won't it work or scale? I realize in some cases there will always be optimal solutions that are over my head mathematically, but in my experience you can generally reach a solution within a reasonable range of optimal without knowing a ton of number theory. Where possible, I hope to learn algorithms along the way (such as the Sieve of Eratosthenes for prime ranges).

- Coding execution: Given some pseudocode that we determine algorithmically will work, is the code written without redundancies and with runtime minimized?

- Being more pythonic: Is something I'm doing unpythonic, or can it be done more simply with some sort of library I don't know about? I believe this is mostly semantic, though in some cases it can have large effects on performance (for instance, using map() instead of a for loop to apply int()).

I submitted the following code as a solution to Project Euler's #12. It's extremely slow, running in a mean of 7.13 seconds over 100 trials. (I think that counts as slow; I'm just using other submitted solutions as a benchmark, and it seems as though solutions under 1 second are generally decent.)

# Python 2 (reduce is a builtin; print is a statement)
def factors(n):
    return set(reduce(list.__add__,
                ([i, n // i] for i in range(1, int(n**0.5) + 1) if n % i == 0)))

def triangle_number(n):
    for i in range(n):
        n += i
    return n

n = 1
while True:
    if len(factors(triangle_number(n))) >= 500:
        print n, triangle_number(n)
        break
    else:
        n += 1


The code for the factors function was borrowed from here.
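For reference, here is one way the 7.13-second figure could be reproduced with the standard library's timeit module (a sketch only; how the original timing was done isn't stated). This version is written for Python 3, where reduce must be imported from functools; timing 100 calls of factors on a single large triangle number keeps the benchmark short.

```python
import timeit
from functools import reduce  # builtin in Python 2, functools in Python 3

def factors(n):
    return set(reduce(list.__add__,
                ([i, n // i] for i in range(1, int(n**0.5) + 1) if n % i == 0)))

# Time many calls on one large triangle number rather than the whole
# search loop, so the benchmark finishes quickly.
t = timeit.timeit(lambda: factors(76576500), number=100)
print("100 calls to factors(76576500): %.3f s" % t)
```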

Solution

This is an addition to Caridorc's answer.

The usual reason for using itertools is that you don't want to construct lists when you don't need to. For example, in Python 2, you'd write for i in xrange(100) instead of for i in range(100) so that you don't have to construct a whole list of 100 numbers when you don't need it. If this doesn't make sense to you, familiarize yourself with the difference between generator expressions (genexprs) and list comprehensions.
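A minimal illustration of that distinction (in Python 3, where range already behaves like xrange): the list comprehension materializes every element up front, while the generator expression produces them one at a time on demand.

```python
import sys

squares_list = [i * i for i in range(100_000)]   # all 100,000 values exist now
squares_gen = (i * i for i in range(100_000))    # computes nothing until iterated

print(sys.getsizeof(squares_list))   # large: proportional to the element count
print(sys.getsizeof(squares_gen))    # small constant: just the generator object

# Both behave the same when consumed, e.g. by sum():
assert sum(squares_gen) == sum(squares_list)
```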

In your case, however, there's a much more serious problem. You've got a Shlemiel the painter's algorithm!

When you write reduce(list.__add__, list_of_lists), you're essentially writing list_1 + ... + list_n. Python will evaluate this as follows:

  • Take list_1 and list_2. Create a new list object of the proper size, and copy the elements from lists 1 and 2 into this new list. Call it intermediate_1.

  • Take intermediate_1 and list_3. Create a new list of the proper size, and copy the elements from these two lists into the new list. Call it intermediate_2.

  • …

  • Take intermediate_(n-1) and list_n. Create a new list of the proper size, and copy the elements from these two lists into the new list. This is the final result.
Do you see the problem? This code is quadratic in both runtime and memory usage.

Here's some hard data. Consider the following function:

def fn(n):
    lists = [[0] * 10000 for i in xrange(n)]
    combined = reduce(list.__add__, lists)


I ran this for values of n of 1, 10, 100, 200, 250, 300, 400, and 500 (using IPython's %timeit fn(100), which does proper timing) and plotted the results. (The timing plot is not preserved in this copy.) The growth definitely looks quadratic, and the coefficient of determination of a quadratic fit is R² = 0.99655, which indicates an extremely strong correlation.

When we instead use

import itertools

def fast(n):
    combined = list(itertools.chain.from_iterable(
        [0] * 10000 for i in xrange(n)))


the time for fast(500) is just 65.8 ms instead of 7.42 s. (This shows that it is indeed the Shlemiel factor and not just large lists that's causing the slowdown.)

So, tl;dr, yes, use iterators—but make sure you know why!
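Applied to the factors function from the question, the same fix looks like this (a Python 3 sketch): itertools.chain.from_iterable walks each [i, n//i] pair exactly once, so the flattening is linear instead of quadratic.

```python
from itertools import chain

def factors(n):
    # Flatten the [divisor, codivisor] pairs lazily instead of
    # repeatedly concatenating lists with reduce(list.__add__, ...).
    return set(chain.from_iterable(
        [i, n // i] for i in range(1, int(n**0.5) + 1) if n % i == 0))

print(sorted(factors(28)))   # [1, 2, 4, 7, 14, 28]
```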

So when should you use reduce?

Just a review of what reduce does for binary operators:

reduce(int.__add__, [x1, x2, x3], x0) = (((x0 + x1) + x2) + x3)


This is extended to general functions as follows:

reduce(f, [x1, x2, x3], x0) = f(f(f(x0, x1), x2), x3)


You should use reduce when you want to write code that looks like either of the above examples. Basically, reduce just saves you a loop.

Conceptually, you use reduce when you want to collapse a bunch of things into one value.
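To make "reduce just saves you a loop" concrete, the two functions below are equivalent (Python 3; the names are mine, for illustration):

```python
from functools import reduce

def concat_with_loop(strings):
    acc = ""
    for s in strings:
        acc = acc + s   # fold each element into the accumulator
    return acc

def concat_with_reduce(strings):
    # Same fold, with the loop hidden inside reduce; "" plays the x0 role.
    return reduce(lambda acc, s: acc + s, strings, "")

assert concat_with_loop(["a", "b", "c"]) == concat_with_reduce(["a", "b", "c"]) == "abc"
```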

It may help to think of reduce in comparison with its other core functional operations:

  • Use map when you want to transform a bunch of values to other things (like "reverse all these strings"). You use map in list comprehensions when you write [s[::-1] for s in strings].



  • Use filter when you want to take a bunch of values and keep just some of them (like "take just the strings that are palindromes"). You use filter in list comprehensions when you write [s for s in strings if s == s[::-1]].



  • Use reduce when you want to collapse a bunch of things into one value (like "concatenate all these strings"). You can't use reduce in list comprehensions, because you're not creating a new list (just a value).
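The string examples above, spelled out (Python 3; the variable names are mine):

```python
from functools import reduce

strings = ["level", "python", "radar", "code"]

reversed_all = list(map(lambda s: s[::-1], strings))          # map: transform each value
palindromes = list(filter(lambda s: s == s[::-1], strings))   # filter: keep some values
concatenated = reduce(lambda a, b: a + b, strings, "")        # reduce: collapse to one value

assert reversed_all == ["level", "nohtyp", "radar", "edoc"]
assert palindromes == ["level", "radar"]
assert concatenated == "levelpythonradarcode"
```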



Notes:

  • The problem in your case is not reduce itself: spelling out list_1 + ... + list_n by hand would be just as bad in terms of efficiency. reduce wasn't your problem; it was what reduce expanded to.



  • With reduce, you can omit the third argument x0, and reduce will use the first element of the sequence, or raise a TypeError if the sequence is empty.
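The two-argument form in action (Python 3): with no initializer, the first element seeds the fold, and an empty sequence is an error.

```python
from functools import reduce
import operator

# No x0: the 1 seeds the fold, then + 2, then + 3.
assert reduce(operator.add, [1, 2, 3]) == 6

# An empty sequence with no initializer raises TypeError.
try:
    reduce(operator.add, [])
except TypeError:
    print("reduce() of empty sequence raises TypeError")
```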



Before noting some examples, it may be helpful to realize that some operations from __builtins__ are just reduce under the hood. For example:

  • sum(ints) = reduce(int.__add__, ints, 0);



  • all(bools) = reduce(lambda x, y: x and y, bools, True); and, similarly,



  • any(bools) = reduce(lambda x, y: x or y, bools, False);



  • min(ints) is like reduce(lambda x, y: x if x < y else y, ints);



  • max(ints) is like reduce(lambda x, y: x if x > y else y, ints).
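These equivalences are easy to check directly (Python 3; lambdas stand in for int.__add__ so the comparisons work on any values):

```python
from functools import reduce

ints = [3, 1, 4, 1, 5]
bools = [True, True, False]

assert sum(ints) == reduce(lambda x, y: x + y, ints, 0)
assert all(bools) == reduce(lambda x, y: x and y, bools, True)
assert any(bools) == reduce(lambda x, y: x or y, bools, False)
assert min(ints) == reduce(lambda x, y: x if x < y else y, ints)
assert max(ints) == reduce(lambda x, y: x if x > y else y, ints)
```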



A few quick examples:

def factorial(n):
    return reduce(int.__mul__, xrange(1, n + 1), 1)

def replay_chess_game(moves):
    return reduce(lambda board, move: board.then(move),
                  moves,
                  INITIAL_BOARD)

def perform_git_rebase(commits):
    return reduce(lambda head, commit: head.cherry_pick(commit),
                  commits,
                  git_globals.refs.HEAD)


I don't use reduce all that much in Python, but you should definitely know when to use it.
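Putting the pieces together, here is a sketch (Python 3, my own function names) of the original search with both fixes applied: the linear-time factors function and a closed-form triangle number in place of the O(n) summing loop. The threshold is a parameter so the code can be exercised on the puzzle statement's worked example; passing 500 matches the original's >= 500 check.

```python
from itertools import chain

def factors(n):
    # Linear-time flattening of the [divisor, codivisor] pairs.
    return set(chain.from_iterable(
        [i, n // i] for i in range(1, int(n**0.5) + 1) if n % i == 0))

def triangle_number(n):
    return n * (n + 1) // 2   # closed form replaces the O(n) summing loop

def first_triangle_with_divisors(threshold):
    # Find the first triangle number with at least `threshold` divisors.
    n = 1
    while len(factors(triangle_number(n))) < threshold:
        n += 1
    return n, triangle_number(n)

# The worked example from the puzzle statement: 28 is the first
# triangle number with six divisors.
print(first_triangle_with_divisors(6))   # (7, 28)
```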


Context

StackExchange Code Review Q#105202, answer score: 10
