principleMinor

Sieve of Eratosthenes vs. Sieve of Sundaram

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

sundaramsieveeratosthenes

Problem

Relevant Information:

Sieve of Eratosthenes

Sieve of Sundaram

Suppose I want to generate all primes in [2,n], and I have both of these algorithms at my disposal to get the job done. Which is preferable under what conditions?

I read that Sundaram runs in O(n log n) time, whereas Eratosthenes runs in O(n log log n) time, so it seems that Eratosthenes is preferable. However, that is just a very superficial evaluation. Are there other factors (aside from ease of implementation) to be considered? Which is the 'better' algorithm?

Solution

Although this question is old, the considerations are still relevant and there is still some lack of proper analysis on which prime sieve to choose for various purposes. In this evaluation, it is important to note that these intense considerations are only necessary for ranges of primes in the order of a hundreds of millions or more: A simple trial division algorithm can be sufficient for a range of up to about a million and just about any properly optimized sieve can sieve to a billion in a few seconds to tens of seconds in just about any language that is not interpreted (compiled or Just-In-Time - JIT - compiled to native code). It is large ranges of a billion an up where things get interesting in that the amount of memory used becomes a consideration such that it may be a requirement to implement page segmentation so as to be able to efficiently use the available memory.

First, note that there are three considerations on what makes a particular sieve fast, as follows:

The number of operations and how these operations increase for increasing ranges for a particular sieve, which latter evaluation is what is indicated by the big-O value for asymptotic execution complexity for large ranges.

The complexity of each individual culling operation as in machine cycles per operation which usually isn't considered in given big-O functions as per point 1.

The ability to add page segmentation without impacting excessively on the above two requirements, which segmentation is necessary to reduce memory requirements for large ranges so the sieve can be run at all, to achieve better cache associativity which otherwise will have a (large) negative impact on point 2 above.

Many mistakenly think that the best big-O performance is the best algorithm, but that is an error in analysis: An algorithm may have the best big-O performance as in (say) O(N) for sieving (where N is the sieving range) but require either many more operations or such complex operations that the execution time for each operation is so high that the total time to execute to a given range may be much higher than for an algorithm with less operations/complexity such that it can never catch up to an algorithm with worse big-O performance for practical achievable sieving ranges; thus, it is usually not the best or only consideration for comparison and is more useful for predicting the performance of a given algorithm with increasing range as compared to itself.

The main three contenders are usually the following with the given big-O performance characterizations following:

The Sieve of Eratosthenes (SoE) - O(N log(log (N)))

The Sieve of Sundaram (SoS) - O(N log(N))

The Sieve of Atkin (SoA) - O(N) (i've added this one as it is usually considered...)

Let's consider these in order:

-
First, it is pointless to implement the SoE without including the very simple "odds-only" optimization that reduces the number of operations by a factor of about two and a half to about the same number of culling operations as the sieving range. However, there is no reason to stop there, and with a little added complexity in using the maximally wheel factorized "combo sieve" as per the Wikipedia article, one can reduce this to about a quarter of the sieving range for ranges of about a billion to about a third of the sieving range for sieving ranges of about a trillion. Thus, with maximum wheel factorization the number of culling operations aren't much more than the best of the other algorithms. The majority of the inner culling loop(s) in which the SoE spends most of its time can be reduced to a very simple native code single clock cycle read/modify/write instruction with maximum optimization techniques so that even when averaged across supporting code and parts of the code that can't use this optimization, the average is still only two to three CPU clock cycles per culling cycle for ranges of trillions. While it's true that operations get gradually a little slower on the average for larger ranges, they increase relatively slower than the practical application of the other algorithms.

-
The unmodified SoS algorithm can never approach the performance of other sieves due to not specifying the proper limits for the i and j variables, which can easily be limited so that the culling function i + j + 2ij never exceeds one half of the culling range value as larger values exceed the sieve buffer range. However, the SoS still has many more operations than the "odds-only" SoE as it culls by the factors of all odd values rather than only the prime factors as for the SoE, which is why it has the O(N (log N)) performance rather than O(N ((log (log N))) performance; this makes a significant difference in the number of culling operations for culling ranges of a billion and above, and while the number of operations can be reduced by such techniques as pre-culling for small values of factor, this optimization (and more) can also be applied to the SoE. Now, i

Context

StackExchange Computer Science Q#19115, answer score: 7

Revisions (0)

No revisions yet.