HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Factors sieve optimization

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
factorsoptimizationsieve

Problem

I've written a more powerful version of the Sieve of Eratosthenes, this Sieve returns the number of factors of all the numbers below the limit.Sadly the running time is \$O(n^2)\$, so it is much slower than the ordinary sieve. Can you suggest a way to improve and optimize it?

import doctest

def factors_sieve(limit):
    """
    Returns the number of factors of all the numbers below the limit.
    For better convenience use list(enumerate(factors_sieve(limit)))

    >>> list(enumerate(factors_sieve(9)))
    [(0, 0), (1, 1), (2, 2), (3, 2), (4, 3), (5, 2), (6, 4), (7, 2), (8, 4)]
    >>> factors_sieve(100000).count(2) # How many prime numbers below 100000?
    9592
    """
    sieve = [2]*limit
    sieve[0],sieve[1] = 0,1
    for number,number_of_divisors in enumerate(sieve):
        if number >= 2:
            for multiple in range(number*2, len(sieve), number):
                sieve[multiple] += 1

    return sieve

if __name__ == "__main__":
    doctest.testmod()

Solution

The running time is not \$ O(n^2) \$. It is \$ O(n \log n) \$. Why?

$$ T(n) = O \bigg( { n - 2 · 2 \over 2 } + { n - 2 · 3 \over 3 } + \dotsb + { n - 2 · { n \over 2 } \over { n \over 2 }} \bigg) = O(n \log n). $$ This is very similar to the sum of a harmonic series. So it is pretty good in terms of time complexity.

It is possible to optimize the code a little bit:

divisors_count = [2] * limit
divisors_count[0], divisors_count[1] = 0, 1
min_divisor = 2
max_divisor = len(divisors_count) // 2
for divisor in range(min_divisor, max_divisor + 1):
    for multiple in range(divisor * 2, len(divisors_count), divisor):
        divisors_count[multiple] += 1
return divisors_count


I changed the loop over the enumerated list to a loop over a range of divisors and removed the if statement inside the loop's body. In my opinion, it also makes the logic more clear because we do not need a list itself in the outer loop. It is also sufficient to iterate only up to len(sieve) // 2, because otherwise range(number * 2, len(sieve), number) is empty.

I also gave the variables more meaningful names: divisor instead of number and so on to make the code easier to understand (but the initial version is rather clear, too, in my opinion).

But either way, this code shows a good scalability: it runs for about 1.8–1.9 seconds with my optimizations, and 2.1–2.2 seconds without them, for limit = 10 ** 6.

Code Snippets

divisors_count = [2] * limit
divisors_count[0], divisors_count[1] = 0, 1
min_divisor = 2
max_divisor = len(divisors_count) // 2
for divisor in range(min_divisor, max_divisor + 1):
    for multiple in range(divisor * 2, len(divisors_count), divisor):
        divisors_count[multiple] += 1
return divisors_count

Context

StackExchange Code Review Q#82761, answer score: 5

Revisions (0)

No revisions yet.