patternpythonMinor
Factors sieve optimization
Viewed 0 times
factorsoptimizationsieve
Problem
I've written a more powerful version of the Sieve of Eratosthenes, this Sieve returns the number of factors of all the numbers below the
limit.Sadly the running time is \$O(n^2)\$, so it is much slower than the ordinary sieve. Can you suggest a way to improve and optimize it?import doctest
def factors_sieve(limit):
"""
Returns the number of factors of all the numbers below the limit.
For better convenience use list(enumerate(factors_sieve(limit)))
>>> list(enumerate(factors_sieve(9)))
[(0, 0), (1, 1), (2, 2), (3, 2), (4, 3), (5, 2), (6, 4), (7, 2), (8, 4)]
>>> factors_sieve(100000).count(2) # How many prime numbers below 100000?
9592
"""
sieve = [2]*limit
sieve[0],sieve[1] = 0,1
for number,number_of_divisors in enumerate(sieve):
if number >= 2:
for multiple in range(number*2, len(sieve), number):
sieve[multiple] += 1
return sieve
if __name__ == "__main__":
doctest.testmod()Solution
The running time is not \$ O(n^2) \$. It is \$ O(n \log n) \$. Why?
$$ T(n) = O \bigg( { n - 2 · 2 \over 2 } + { n - 2 · 3 \over 3 } + \dotsb + { n - 2 · { n \over 2 } \over { n \over 2 }} \bigg) = O(n \log n). $$ This is very similar to the sum of a harmonic series. So it is pretty good in terms of time complexity.
It is possible to optimize the code a little bit:
I changed the loop over the enumerated list to a loop over a range of divisors and removed the
I also gave the variables more meaningful names:
But either way, this code shows a good scalability: it runs for about 1.8–1.9 seconds with my optimizations, and 2.1–2.2 seconds without them, for
$$ T(n) = O \bigg( { n - 2 · 2 \over 2 } + { n - 2 · 3 \over 3 } + \dotsb + { n - 2 · { n \over 2 } \over { n \over 2 }} \bigg) = O(n \log n). $$ This is very similar to the sum of a harmonic series. So it is pretty good in terms of time complexity.
It is possible to optimize the code a little bit:
divisors_count = [2] * limit
divisors_count[0], divisors_count[1] = 0, 1
min_divisor = 2
max_divisor = len(divisors_count) // 2
for divisor in range(min_divisor, max_divisor + 1):
for multiple in range(divisor * 2, len(divisors_count), divisor):
divisors_count[multiple] += 1
return divisors_countI changed the loop over the enumerated list to a loop over a range of divisors and removed the
if statement inside the loop's body. In my opinion, it also makes the logic more clear because we do not need a list itself in the outer loop. It is also sufficient to iterate only up to len(sieve) // 2, because otherwise range(number * 2, len(sieve), number) is empty.I also gave the variables more meaningful names:
divisor instead of number and so on to make the code easier to understand (but the initial version is rather clear, too, in my opinion).But either way, this code shows a good scalability: it runs for about 1.8–1.9 seconds with my optimizations, and 2.1–2.2 seconds without them, for
limit = 10 ** 6.Code Snippets
divisors_count = [2] * limit
divisors_count[0], divisors_count[1] = 0, 1
min_divisor = 2
max_divisor = len(divisors_count) // 2
for divisor in range(min_divisor, max_divisor + 1):
for multiple in range(divisor * 2, len(divisors_count), divisor):
divisors_count[multiple] += 1
return divisors_countContext
StackExchange Code Review Q#82761, answer score: 5
Revisions (0)
No revisions yet.