patterncppMinor

Sieve of Eratosthenes - segmented to increase speed and range

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

rangesieveincreasesegmentedandspeederatosthenes

Problem

A lot of the sieve code floating around here and on the 'net is sloppy or broken; in fact, I've yet to see a single one that is correct, clean (in the sense of avoiding superfluous iterations etc.) and that can actually sieve up to the end of its operating range. Some state at least that they cannot sieve the full range, some state limits and crash anyway even for ranges below that, the rest simply crash or return wrong results. Even the 'canonical' examples - Achim Flammenkamp's code, Tomás Oliveira e Silva's and Kim Walisch's @ primesieve.org - range over the whole spectrum in that regard.

That's why I thought it would be a good idea to have a simple, robust example that is correct and flexible enough to serve as a basis for people to experiment with their own ideas. The code I'm proposing here (full code in zrbj_sx_sieve64_v1.cpp) demonstrates two different uses of segmentation with the Sieve of Eratosthenes.

The first use of segmentation is to avoid unnecessary work when the start of the range to be sieved is higher than the square root of the upper end of the range, like in SPOJ problem #2 PRIME1. Sieving all the way up to the upper end of the range (10^9 for PRIME1) can easily exceed the time limit of 5 seconds; by contrast, sieving only the potential factors up to the square root and then the actual target range is several thousand times as fast. This is pretty much the only way of sieving close to 2^64 without taking a long holiday... The second use of segmentation is for speeding up the sieving of bigger ranges but that is explained below, with the code for extending the factor sieve.

The code here uses a packed, odds-only bitmap where set bits signify 'composite', and it implements a 32-bit capable version that limits sieving ranges - and index variables - to at most 2^32 numbers in one go. This limits the sieved range to 2^31 bits per call but that is already more than enough since smaller segments work better even at the upper end of the range (L2/L3

Solution

Generally speaking, I have to admit that I don't understand everything the code is doing and that I could use a little more time in order to understand it. However, I hope I can still give you some hopefully useful pieces of advice:

-
You could use a little more C++ in your code, which could be considered a bit too C-ish. What comes to my mind is BITS_PER_WORD which can actually be obtained with std::numeric_limits:

enum { BITS_PER_WORD = std::numeric_limits::digits };

The same goes for several other constants such as UINT32_MAX which could be std::numeric_limits::max(). I have to admit that it makes the code longer too, but it may be easier to change the integer type with a basic find-and-replace if you ever need to.

-
You could also use the C++ algorithms instead of the C library. For example, replace std::memset by std::fill_n, which is likely to be optimized back into an std::memset anyway while being clearer:

std::fill_n(target_segment, std::size_t((max_bit + CHAR_BIT) / CHAR_BIT), 0);

-
It is likely that it has already been optimized by the compiler but since you compute both index / BITS_PER_WORD and index % BITS_PER_WORD, you may as well make even more sure that these two values are computed together by using std::div. It also applies to bits_to_sieve % segment_size and bits_to_sieve / segment_size somewhere else in the program.

-
If you have access to a C++11 compiler, don't hesitate to declare some variables constexpr, like SIEVE_BITS which can totally be computed at compile time.

Code Snippets

enum { BITS_PER_WORD = std::numeric_limits<word_t>::digits };

std::fill_n(target_segment, std::size_t((max_bit + CHAR_BIT) / CHAR_BIT), 0);

Context

StackExchange Code Review Q#69791, answer score: 9

Revisions (0)

No revisions yet.