HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Sieve of Eratosthenes: making it quicker

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
makingquickersieveeratosthenes

Problem

I was thinking about doing problem 23 of Project Euler. It includes the difficulty where you have to factorize primes. I had already done that in problems I solved earlier, but it was only necessary to have a prime list until a relative small number.

So I had already implemented the algorithm when I took a look at this problem. I stored numbers in a list, flagged the ones that weren't prime numbers and removed them from the list, al of this in Python. It worked well for small lists, but when I went big, the memory usage was just too much.

I still wanted to use the algorithm of Eratosthenes, but I still needed a solution to my memory issue. So I decided, I don't know if it is the best solution, to write the list to a file and to constantly rewrite this file.

I succeeded in making this algorithm, but it's not really fast. To find primes up to the number 20 000 it took my program 405 seconds, or 6'45".

Is there any way I can improve this implementation? No file rewriting, cashing data in the RAM, etc.?

SieveCash.py (Python 3)

```
import tempfile
import os
import sys

def createSieve(outputFile, maxp, logging=0):
"""Creates a sieve and stores one prime per line in file outputFile."""
# Write a list of all numbers to check to a file
myNumFile = createBeginNumberList(maxp, outputFile)

# Do the looping
rfh = open(myNumFile, "r")
try:
totalRead = 0
for i in range(maxp+1):
# log if necessary
if logging > 0 and i % logging == 0:
print("Checking number", i, "(" + str(int(i/(maxp+1)*100)) + "%)")

# do thingy
bytesRead, myNum = readNumber(rfh)
if bytesRead == 0:
break

totalRead += bytesRead

if myNum != 0:
rfh.close()
rfh = None

rewriteFileWithoutMultiples(myNumFile, myNum, totalRead)

rfh = open(myNumFile, "r")
rfh.seek(totalRead)

Solution

Writing to disk is going to be really slow. Don't do that.

If you are running out of memory, try using the standard array module or the numpy library. Both of those provide efficient arrays for holding large numbers of values. Your only running out of memory because python's lists don't store their values in a memory efficient manner.

If you are running out of memory for some reason even taking that into account, post that code here and we'll be glad to point out how to improve the memory usage.

Context

StackExchange Code Review Q#6477, answer score: 3

Revisions (0)

No revisions yet.