HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonModerate

Project Euler 407: Is there any more optimal way to solve this idempotent equation (modulo n ring)?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thisoptimalanymorewayprojecteulersolveidempotentmodulo

Problem

Project Euler problem 407:


If we calculate a2 mod 6 for 0 ≤ a ≤ 5 we get: 0, 1, 4, 3, 4, 1.


The largest value of a such that a2 mod 6 = a is 4.
Let's call M(n) the largest value of a 2 mod n = a.
So M(6) = 4.


Find ΣM(n) for 1 ≤ n ≤ 107.

So far, this is what I have:

import time
start = time.time()
from math import sqrt

def primes1(n):
    """ Returns  a list of primes n:  
            break
    return True

for n in xrange(2,10000000):

    if factor1(n):

        tot+=1
    else:
        for a in xrange(int(sqrt(n))+1,n/2+1):
            if (a*a)%n==a:

                tot+=n+1-a
                break

print tot
print time.time()-start


I've tried this code for smaller cases and it works perfectly; however, it is way too slow to do 10^7 cases.

Currently, for n being less than 20000, it runs in about 8 seconds.
When n is less than 90000, it runs in about 150 seconds.

As far as I can tell, for n is less than 10^7, it will run for many hours if not days.

I'm already using the sieve to generate prime numbers so that part is as fast as it can be, is there anything I can do to speed up the rest of the code?

I've already tried using different compiler like psyco, pypy, and shedskin. Psyco provides a minimal increase, shedskin speeds it up about 7 times but creates errors when large numbers occur, pypy speeds it up the most (about 20-30x the speed). But even then, it's still not fast enough for the amount of cases it has to go through.

Solution


  1. Introduction



The feature of Project Euler that makes it interesting is that solving the problems is rarely a matter of writing down a straightforward algorithm and then optimizing it piecewise until it runs in the available time.

Instead, the problems are deliberately designed with parameters large enough that the straightforward algorithm will take many hours or days, but if you can gain some insight into the mathematical structure of the problem, then you can find a better algorithm that radically improves the runtime.

So if you're still at the "hours or days" stage then it's no good trying to speed up the code you've got. You might be able to speed it up by a factor of five or ten by piecewise optimizations, but probably no more than that. No, I'm afraid we need to use ... math.
  1. Getting started



But what if you've already applied all the math you know, and you're too impatient to read a number theory textbook? How do you figure out where to look?

First, it's worth working your way through the Project Euler problems at least approximately in order, rather than diving straight in to the latest problems. The earlier problems often introduce key concepts (for example, Euler's totient function and how to compute it simultaneously for many numbers by sieving) in straightforward contexts, so that when in later and harder problems these concepts are needed, you'll already be familiar with them.

Second, many Project Euler problems involve at their heart a sequence of numbers. You can very often get useful clues about this sequence by computing the first few numbers in the sequence and then searching for them in the On-Line Encyclopedia of Integer Sequences.

In the case of problem 407, the first few elements of the sequence are 0, 1, 1, 1, 1, 4, 1, 1, 1, 6, and this is sequence A182665, which contains a link to this answer by Robert Israel on math.stackexchange.com.
  1. Computing the sequence



(This follows Robert Israel's presentation, but with the notation changed to match the Project Euler problem.)

If \$a^2 \bmod n = a\$, then \$a(a − 1) \bmod n = 0\$.

So if \$n\$ can be factored as \$p_1^{e_1} p_2^{e_2} \ldots\$ then each \$p_i^{e_i}\$ must divide either \$a\$ or \$a − 1\$ (because each \$p_i\$ can divide into at most one of \$a\$ and \$a − 1\$).

Therefore \$n = uv\$, where \$u\$ and \$v\$ are coprime and \$u\$ divides into \$a\$ and \$v\$ divides into \$a − 1\$.

Let \$a = uw\$. Since \$a

  • For each integer \$n\$:



  •   if \$n\$ is a prime or prime power, then \$a = 1\$;



  •   otherwise \$n\$ is a product of two or more distinct primes, so:



  •     for each way to factorize \$n = uv\$ with \$u\$, \$v\$ coprime:



  •       find \$w = u^{−1} \bmod v\$.



  •     \$a\$ is the maximum of \$uw\$ for all such \$uv\$.



Which might just be doable in the time available.

Now at step 6 we have to find the multiplicative inverse of \$u\$ modulo \$v\$. There are a couple of algorithms for finding the modular multiplicative inverse: normally we would use the extended Euclidean algorithm, but here the method using Euler's theorem is attractive because we know that \$u\$ is coprime to \$v\$. So \$u^{−1} = u^{φ(v)−1} \pmod v\$ where \$φ\$ is Euler's totient function.

Since we are sieving for the factorizations, we can sieve for the totients at the same time.
  1. Python code



This runs in about 2½ minutes on my computer under pypy, so I didn't manage to squeeze it inside Project Euler's one-minute time limit. But at this point we're within sight of the goal, so it might be possible to get there by carrying out piecewise optimizations.

```
from collections import defaultdict
from functools import reduce
from itertools import combinations
from operator import mul

def factorize(n):
"""Factorize the numbers up to n, returning a pair (factorization,
totient).

factorization is a dictionary mapping each composite i below n to
a list of numbers p ** e where p is a prime and e is the exponent
of p in the factorization of i.

totient is a list whose i'th element is Euler's totient function
applied to i (the count of numbers less than i which are coprime
to i).

>>> from pprint import pprint
>>> f, t = factorize(10)
>>> pprint((dict(f), t))
({4: [4], 6: [2, 3], 8: [8], 9: [9], 10: [2, 5]},
[0, 1, 1, 2, 2, 4, 2, 6, 4, 6, 4])

"""
n = n + 1
f = defaultdict(list)
t = list(range(n))
for p in range(2, n):
if p not in f:
t[p] = p - 1
for i in range(p + p, n, p):
j, k = i, 1
while j % p == 0:
j //= p
k *= p
f[i].append(k)
t[i] = t[i] * (p - 1) // p
return f, t

def problem407(n):
"""Generate a sequence whose i'th element is the largest a < i such
that a**2 == a (mod i), where i runs from 1 to n (inclusive).

"""
f, t = factorize(n + 1)
yield 0 # i = 1
for i in range(2, n + 1):

Code Snippets

from collections import defaultdict
from functools import reduce
from itertools import combinations
from operator import mul

def factorize(n):
    """Factorize the numbers up to n, returning a pair (factorization,
    totient).

    factorization is a dictionary mapping each composite i below n to
    a list of numbers p ** e where p is a prime and e is the exponent
    of p in the factorization of i.

    totient is a list whose i'th element is Euler's totient function
    applied to i (the count of numbers less than i which are coprime
    to i).

        >>> from pprint import pprint
        >>> f, t = factorize(10)
        >>> pprint((dict(f), t))
        ({4: [4], 6: [2, 3], 8: [8], 9: [9], 10: [2, 5]},
         [0, 1, 1, 2, 2, 4, 2, 6, 4, 6, 4])

    """
    n = n + 1
    f = defaultdict(list)
    t = list(range(n))
    for p in range(2, n):
        if p not in f:
            t[p] = p - 1
            for i in range(p + p, n, p):
                j, k = i, 1
                while j % p == 0:
                    j //= p
                    k *= p
                f[i].append(k)
                t[i] = t[i] * (p - 1) // p
    return f, t

def problem407(n):
    """Generate a sequence whose i'th element is the largest a < i such
    that a**2 == a (mod i), where i runs from 1 to n (inclusive).

    """
    f, t = factorize(n + 1)
    yield 0 # i = 1
    for i in range(2, n + 1):
        if i not in f or len(f[i]) < 2:
            # prime or prime power
            yield 1
            continue
        def uw():
            for j in range(1, len(f[i])):
                for c in combinations(f[i], j):
                    u = reduce(mul, c)
                    v = i // u
                    w = pow(u, t[v] - 1, v)
                    yield u * w
        yield max(uw())
w = pow(u, -1, v)

Context

StackExchange Code Review Q#25919, answer score: 16

Revisions (0)

No revisions yet.