patterncMinor
Bitmap-based Sieve of Erastothenes
Viewed 0 times
sieveerastothenesbasedbitmap
Problem
I have written a simple implementation of Sieve of Erasthotenes (in C, MSVC flavour), which I believe is correct (source below). Is there a way to make it faster and more readable?
#include
#include
#include
#include
typedef unsigned long long bitmap_entry_t;
#define NUMBERS_TO_CALCULATE (1024*1024*1024)
#define BITMAP_ENTRY_BITS (sizeof(bitmap_entry_t)*8)
inline void bitmap_set(bitmap_entry_t *bitmap, unsigned long long index)
{
_bittestandset64(&bitmap[index / BITMAP_ENTRY_BITS], index%BITMAP_ENTRY_BITS);
}
inline void bitmap_unset(bitmap_entry_t *bitmap, unsigned long long index)
{
_bittestandreset64(&bitmap[index / BITMAP_ENTRY_BITS], index%BITMAP_ENTRY_BITS);
}
inline bitmap_entry_t bitmap_get(bitmap_entry_t *bitmap, unsigned long long index)
{
return _bittest64(&bitmap[index / BITMAP_ENTRY_BITS], index%BITMAP_ENTRY_BITS);
}
int main(void)
{
bitmap_entry_t *bitmap = malloc((1 + NUMBERS_TO_CALCULATE / BITMAP_ENTRY_BITS) * sizeof(bitmap_entry_t));
for (int i = 0; i < (1 + NUMBERS_TO_CALCULATE / BITMAP_ENTRY_BITS); i++)
{
bitmap[i] = 0xFFFFFFFFFFFFFFFFLL;
}
for (int i = 2; i < sqrt(NUMBERS_TO_CALCULATE); i++)
{
if (bitmap_get(bitmap, i - 2))
{
for (int j = 2 * i; j < NUMBERS_TO_CALCULATE; j += i)
{
bitmap_unset(bitmap, j - 2);
}
}
}
for (int i = 2; i < NUMBERS_TO_CALCULATE; i++)
{
if (bitmap_get(bitmap, i - 2))
{
printf("%d ", i);
}
}
getchar();
}Solution
I see a few things that might help you improve your code.
Declare local routines
If you declare
Eliminate unused routine
The code defines a routine
Use
You could eliminate the initialization of all of the bits to
Skip all even numbers
Your bitmap could be half the current size by simply eliminating the storage of even numbers. The only even prime is 2, so your program doesn't really need to allocate bits for even numbers.
Optimize the inner loop
If we only test odd numbers, as per the last suggestion, we can also optimize the inner loop. Once we know that
Prefer portable constants
If you elect not to use
It would be shorter, less potentially error prone (quick -- how many F's should there be?) and easier to read like this:
On my machine, using
Consider using pointers
On some compilers, depending on optimizations used, the use of pointers can be faster than using indexing. You could test this on your machine with your compiler. For example, the initialization loop could be written like this:
Optimize printing
As you may already know, it's the number conversion and printing of the results that takes most of the time for this program. If that's of interest, you could write your own special purpose output formatter. By keeping a static string array in memory and using that to increment through the numbers, you may be able to speed the entire program.
Declare local routines
staticIf you declare
bitmap_get and the other routines as static, the compiler will know that they can never be called outside this one file. With that information, much more agressive optimization can be done and the inline keyword will probably not even be needed.Eliminate unused routine
The code defines a routine
bitmap_set but it is not actually used and can be eliminated from the code. Smaller code typically runs faster, so there's a small chance that this could improve speed as well.Use
calloc and reverse the sense of the bitsYou could eliminate the initialization of all of the bits to
1 by using calloc instead of malloc and reversing the meaning of the bits in the bitmap. On my Linux box, the times were about the same, but of course the calloc version was shorter.Skip all even numbers
Your bitmap could be half the current size by simply eliminating the storage of even numbers. The only even prime is 2, so your program doesn't really need to allocate bits for even numbers.
Optimize the inner loop
If we only test odd numbers, as per the last suggestion, we can also optimize the inner loop. Once we know that
i is an odd number, it's obvious that 2*i must be an even number, so there's no reason to mark it as non-prime since we already know that all even numbers except 2 are non-prime. Using both suggestions, we can double the speed of both inner and outer loops by writing them like this:for (int i = 3; i < sqrt(NUMBERS_TO_CALCULATE); i+=2)
{
if (bitmap_get(bitmap, i - 2))
{
for (int j = 3 * i; j < NUMBERS_TO_CALCULATE; j += 2*i)
{
bitmap_unset(bitmap, j - 2);
}
}
}Prefer portable constants
If you elect not to use
calloc, you could still revise the way the constant is written. At the moment, the code uses this:bitmap[i] = 0xFFFFFFFFFFFFFFFFLL;It would be shorter, less potentially error prone (quick -- how many F's should there be?) and easier to read like this:
bitmap[i] = ~0;On my machine, using
gcc as the compiler, they compile to the same thing.Consider using pointers
On some compilers, depending on optimizations used, the use of pointers can be faster than using indexing. You could test this on your machine with your compiler. For example, the initialization loop could be written like this:
for (bitmap_entry_t *p = bitmap;
p < &bitmap[(1 + NUMBERS_TO_CALCULATE / BITMAP_ENTRY_BITS)]; ++p)
{
*p = ~0;
}Optimize printing
As you may already know, it's the number conversion and printing of the results that takes most of the time for this program. If that's of interest, you could write your own special purpose output formatter. By keeping a static string array in memory and using that to increment through the numbers, you may be able to speed the entire program.
Code Snippets
for (int i = 3; i < sqrt(NUMBERS_TO_CALCULATE); i+=2)
{
if (bitmap_get(bitmap, i - 2))
{
for (int j = 3 * i; j < NUMBERS_TO_CALCULATE; j += 2*i)
{
bitmap_unset(bitmap, j - 2);
}
}
}bitmap[i] = 0xFFFFFFFFFFFFFFFFLL;bitmap[i] = ~0;for (bitmap_entry_t *p = bitmap;
p < &bitmap[(1 + NUMBERS_TO_CALCULATE / BITMAP_ENTRY_BITS)]; ++p)
{
*p = ~0;
}Context
StackExchange Code Review Q#85942, answer score: 2
Revisions (0)
No revisions yet.