HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMajor

Sieve of Eratosthenes - Python

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
pythonsieveeratosthenes

Problem

I've been doing a lot of Project Euler lately and just wanted to make sure my implementation was as good as it could be. Does anyone have any suggestions to speed this up?

def sieve(upperlimit):
    # mark off all multiples of 2 so we can use 2*p as the step for the inner loop
    l = [2] + [x if x % 2 != 0 else 0 for x in range(3, upperlimit + 1)]

    for p in l:
        if p ** 2 > upperlimit:
            break
        elif p:
            for i in range(p * p, upperlimit + 1, 2 * p):
                l[i - 2] = 0
    # filter out non primes from the list, not really that important i could work with a list full of zeros as well
    return [x for x in l if x]

Solution

-
Here's my starting point for computing the performance improvement due to the various revisions below: how long does it take to sieve for the prime numbers below \$ 10^8 \$?

>>> from timeit import timeit
>>> test = lambda f: timeit(lambda:f(10**8), number=1)
>>> t1 = test(sieve)


The exact number is going to depend on how fast your computer is, so I'm going to compute performance ratios, but for the record, here it is:

>>> t1
78.9875438772142


-
Your initialization of the list l takes more than half the time, so let's try a cheaper approach. Let's also give this array a better name, and make it a Boolean array while we're about it.

def sieve2(n):
    """Return a list of the primes below n."""
    prime = [True] * n
    for p in range(3, n, 2):
        if p ** 2 > n:
            break
        if prime[p]:
            for i in range(p * p, n, 2 * p):
                prime[i] = False
    return [2] + [p for p in range(3, n, 2) if prime[p]]


When optimizing a function like this, it's always worth keeping the un-optimized version around to check the correctness of the optimized version:

>>> sieve(106) == sieve2(106)
True


This already runs in less than a third of the time:

>>> test(sieve2) / t1
0.30390444573149544


-
We could avoid the test for p 2 > n by computing a tighter limit for the loop. Note that I've used n .5 here as this is slightly faster than math.sqrt(n).

def sieve3(n):
    """Return a list of the primes below n."""
    prime = [False, False, True] + [True, False] * (n // 2)
    for p in range(3, int(n ** .5) + 1, 2):
        if prime[p]:
            for i in range(p * p, n, 2 * p):
                prime[i] = False
    return [p for p in range(2, n) if prime[p]]


This makes little difference to the overall runtime:

>>> test(sieve3) / t1
0.2971086436068156


-
We can accumulate the result as we go, instead of in a separate iteration at the end. Note that I've cached result.append in a local variable to avoid looking it up each time round the loop.

def sieve4(n):
    """Return a list of the primes below n."""
    prime = [False, False, True] + [True, False] * (n // 2)
    result = [2]
    append = result.append
    sqrt_n = (int(n ** .5) + 1) | 1    # ensure it's odd
    for p in range(3, sqrt_n, 2):
        if prime[p]:
            append(p)
            for i in range(p * p, n, 2 * p):
                prime[i] = False
    for p in range(sqrt_n, n, 2):
        if prime[p]:
            append(p)
    return result


Again, this makes very little difference:

>>> test(sieve4) / t1
0.286016401170129


-
We can use Python's slice assignment instead of a loop when setting the sieve entries to False. This looks wasteful since we create a large list and then throw it away, but this avoids an expensive for loop and the associated Python interpreter overhead.

def sieve5(n):
    """Return a list of the primes below n."""
    prime = [True] * n
    result = [2]
    append = result.append
    sqrt_n = (int(n ** .5) + 1) | 1    # ensure it's odd
    for p in range(3, sqrt_n, 2):
        if prime[p]:
            append(p)
            prime[p*p::2*p] = [False] * ((n - p*p - 1) // (2*p) + 1)
    for p in range(sqrt_n, n, 2):
        if prime[p]:
            append(p)
    return result


This gives a small but noticeable improvement:

>>> test(sieve5) / t1
0.2617646381557855


-
For big improvements to the performance of numerical code, we can use NumPy.

import numpy

def sieve6(n):
    """Return an array of the primes below n."""
    prime = numpy.ones(n, dtype=numpy.bool)
    prime[:2] = False
    prime[4::2] = False
    sqrt_n = int(n ** .5) + 1
    for p in range(3, sqrt_n, 2):
        if prime[p]:
            prime[p*p::2*p] = False
    return prime.nonzero()[0]


This is more than 6 times as fast as sieve5, and more than 25 times as fast as your original code:

>>> test(sieve6) / t1
0.03726392181902129


-
We could avoid allocating space for the even numbers, improving memory locality:

def sieve7(n):
    """Return an array of the primes below n."""
    prime = numpy.ones(n // 2, dtype=numpy.bool)
    sqrt_n = int(n ** .5) + 1
    for p in range(3, sqrt_n, 2):
        if prime[p // 2]:
            prime[p*p // 2::p] = False
    result = 2 * prime.nonzero()[0] + 1
    result[0] = 2
    return result


>>> test(sieve7) / t1
0.029220096670965198


-
And finally, an implementation that sieves separately for primes of the form \$ 6i − 1 \$ and \$ 6i + 1 \$, due to Robert William Hanks:

def sieve8(n):
    """Return an array of the primes below n."""
    prime = numpy.ones(n//3 + (n%6==2), dtype=numpy.bool)
    for i in range(3, int(n**.5) + 1, 3):
        if prime[i // 3]:
            p = (i + 1) | 1
            prime[       p*p//3     ::2*p] = False
            prime[p*(p-2*(i&1)+4)//3::2*p] = False
    result = (3 * prime.nonzero()[0] + 1) | 1
    result[0] = 3
    return numpy.r_[2,result]


This

Code Snippets

def sieve2(n):
    """Return a list of the primes below n."""
    prime = [True] * n
    for p in range(3, n, 2):
        if p ** 2 > n:
            break
        if prime[p]:
            for i in range(p * p, n, 2 * p):
                prime[i] = False
    return [2] + [p for p in range(3, n, 2) if prime[p]]
def sieve3(n):
    """Return a list of the primes below n."""
    prime = [False, False, True] + [True, False] * (n // 2)
    for p in range(3, int(n ** .5) + 1, 2):
        if prime[p]:
            for i in range(p * p, n, 2 * p):
                prime[i] = False
    return [p for p in range(2, n) if prime[p]]
def sieve4(n):
    """Return a list of the primes below n."""
    prime = [False, False, True] + [True, False] * (n // 2)
    result = [2]
    append = result.append
    sqrt_n = (int(n ** .5) + 1) | 1    # ensure it's odd
    for p in range(3, sqrt_n, 2):
        if prime[p]:
            append(p)
            for i in range(p * p, n, 2 * p):
                prime[i] = False
    for p in range(sqrt_n, n, 2):
        if prime[p]:
            append(p)
    return result
def sieve5(n):
    """Return a list of the primes below n."""
    prime = [True] * n
    result = [2]
    append = result.append
    sqrt_n = (int(n ** .5) + 1) | 1    # ensure it's odd
    for p in range(3, sqrt_n, 2):
        if prime[p]:
            append(p)
            prime[p*p::2*p] = [False] * ((n - p*p - 1) // (2*p) + 1)
    for p in range(sqrt_n, n, 2):
        if prime[p]:
            append(p)
    return result
import numpy

def sieve6(n):
    """Return an array of the primes below n."""
    prime = numpy.ones(n, dtype=numpy.bool)
    prime[:2] = False
    prime[4::2] = False
    sqrt_n = int(n ** .5) + 1
    for p in range(3, sqrt_n, 2):
        if prime[p]:
            prime[p*p::2*p] = False
    return prime.nonzero()[0]

Context

StackExchange Code Review Q#42420, answer score: 34

Revisions (0)

No revisions yet.