patterncsharpMinor
A reference source for primes up to 64K (for unit tests)
Viewed 0 times
reference64kprimessourcetestsforunit
Problem
In order to code the tests for my number theory library (a collection of routines that proved handy for coding challenges) I needed a reference source for primes up to 2^16.
I did not want to reference code in other modules to get those primes because I wanted the buck to stop right there instead of passing it on, and I did not want to include the 6542 primes bodily in the source code. Apart from the bloat, this would necessitate a test for those included primes (in case of an editing accident that trashes something) and the approach would be completely impractical for bigger ranges, like primes up to 2^32.
Writing a small sieve to get the primes is no problem, but then who verifies the correctness of that sieve code? Hen and egg and all that. So I decided that the reference source would be a text file that can be verified using whatever means and write-protected. If the file doesn't exist yet then it gets generated and a message is printed to that effect, so that the external verification can take place. Also, the class keeps nagging as long as the file is not read-only, since that presumably means it hasn't been verified yet.
Note: the static function
Final note:
```
public class ReferencePrimesUpTo64K
{
public static string ReferenceFile = "reference_primes_up_to_64k.txt";
public static ushort[] Primes;
static ReferencePrimesUpTo64K ()
{
if (!System
I did not want to reference code in other modules to get those primes because I wanted the buck to stop right there instead of passing it on, and I did not want to include the 6542 primes bodily in the source code. Apart from the bloat, this would necessitate a test for those included primes (in case of an editing accident that trashes something) and the approach would be completely impractical for bigger ranges, like primes up to 2^32.
Writing a small sieve to get the primes is no problem, but then who verifies the correctness of that sieve code? Hen and egg and all that. So I decided that the reference source would be a text file that can be verified using whatever means and write-protected. If the file doesn't exist yet then it gets generated and a message is printed to that effect, so that the external verification can take place. Also, the class keeps nagging as long as the file is not read-only, since that presumably means it hasn't been verified yet.
Note: the static function
GetSmallPrimesBetween() is the raison d'être for this class; Primes[] is incidental. The sieve function was ripped from my coding challenge paste library; it has a few flourishes that I would not have added if I had written it from scratch for this purpose, but I left them in because there's no reason to mess with tried and trusted code. Final note:
GetSmallPrimesBetween() takes int parameters instead of ushort, and it is quite lenient regarding their values, as long as they do not result in an invalid request. This is to avoid the hassle of parameter vetting and casting at the call site.```
public class ReferencePrimesUpTo64K
{
public static string ReferenceFile = "reference_primes_up_to_64k.txt";
public static ushort[] Primes;
static ReferencePrimesUpTo64K ()
{
if (!System
Solution
I needed a reference source for primes up to 216... I did not want to include the 6542 primes bodily in the source code. Apart from the bloat,
A list of 6542 primes is not necessarily "bloat". At 20 primes per line, that's 328 LOC (lines of code) with zero complexity, compared to what you ended up writing, which took only 62 LOC but has relatively high computational complexity and dependencies on the filesystem, heap allocation, and so on.
this would necessitate a test for those included primes (in case of an editing accident that trashes something)
Use source control, such as
and the approach would be completely impractical for bigger ranges, like primes up to 2^32.
If you want arbitrarily large lists of primes, then yes, at some point it becomes silly to hard-code them all. In that case you're going to need a reference implementation anyway. So, why bother with hard-coding the shorter list? Just use your reference implementation for both cases. This way you don't need two implementations.
But my point is that you don't need any of that filesystem logic to begin with. Computing the 6542 primes from 2 to 65521 should take you about a millisecond — quite possibly less time than it would take you to open and read your file on disk!
So let's discard all that filesystem code. That leaves us with the only interesting piece of your code: the prime-computation reference implementation.
If you mean "65535", say "65535". Don't try to confuse the reader by making him look up what
These are constants, aren't they? If you mean "127", say "127". If you mean "32767", say "32767". Don't try to make the reader do math.
This code is much more complicated than it ought to be, and wastes O(N) memory. Have you benchmarked it, compared to the naive solution? Which, for reference, would be something like
I suspect you'll find that this simple code is approximately as fast as your complicated file-handling code — by which I mean, you'll never notice any difference in runtime in practice.
A list of 6542 primes is not necessarily "bloat". At 20 primes per line, that's 328 LOC (lines of code) with zero complexity, compared to what you ended up writing, which took only 62 LOC but has relatively high computational complexity and dependencies on the filesystem, heap allocation, and so on.
this would necessitate a test for those included primes (in case of an editing accident that trashes something)
Use source control, such as
git or svn; then your test for "editing accidents" can just be "is the source code up to date". If you're worried about editing accidents in the past, take a look at the history of the source file.and the approach would be completely impractical for bigger ranges, like primes up to 2^32.
If you want arbitrarily large lists of primes, then yes, at some point it becomes silly to hard-code them all. In that case you're going to need a reference implementation anyway. So, why bother with hard-coding the shorter list? Just use your reference implementation for both cases. This way you don't need two implementations.
ReferencePrimesUpTo64K's constructor should not call Console.WriteLine; that's mixing high-level business logic ("when the object can't be constructed successfully, log a message to stdout") with low-level implementation logic ("when the file can't be opened, the object can't be constructed successfully"). What you should do if you can't construct the object successfully is throw an exception; that's how you report failure in C#. Your higher-level code can then catch that exception and log a message to stdout, or try again, or switch to a different algorithm, or pop up an alert box, or terminate, or whatever high-level business logic the high-level author wants to implement.But my point is that you don't need any of that filesystem logic to begin with. Computing the 6542 primes from 2 to 65521 should take you about a millisecond — quite possibly less time than it would take you to open and read your file on disk!
So let's discard all that filesystem code. That leaves us with the only interesting piece of your code: the prime-computation reference implementation.
internal static List small_primes_up_to_64K ()
{
const int n = ushort.MaxValue;If you mean "65535", say "65535". Don't try to confuse the reader by making him look up what
ushort.MaxValue represents. Just write 65535.int sqrt_n_halved = (int)(Math.Sqrt(n) - 1) >> 1, max_bit = (n - 1) >> 1;These are constants, aren't they? If you mean "127", say "127". If you mean "32767", say "32767". Don't try to make the reader do math.
var odd_composite = new bool[max_bit + 1];
for (int i = 5 >> 1; i > 1; j () { 2, 3 }; // mod 3 stepping on top of the mod 2 wheel
for (int i = 5 >> 1, d = 1; i <= max_bit; i += d, d ^= 3)
if (!odd_composite[i])
result.Add((ushort)((i << 1) + 1));This code is much more complicated than it ought to be, and wastes O(N) memory. Have you benchmarked it, compared to the naive solution? Which, for reference, would be something like
internal static bool is_prime(int x)
{
if (x small_primes_between(int m, int n)
{
for (int i = m; i < n; ++i) {
if (is_prime(i)) yield return i;
}
}I suspect you'll find that this simple code is approximately as fast as your complicated file-handling code — by which I mean, you'll never notice any difference in runtime in practice.
Code Snippets
internal static List<ushort> small_primes_up_to_64K ()
{
const int n = ushort.MaxValue;int sqrt_n_halved = (int)(Math.Sqrt(n) - 1) >> 1, max_bit = (n - 1) >> 1;var odd_composite = new bool[max_bit + 1];
for (int i = 5 >> 1; i <= sqrt_n_halved; ++i)
if (!odd_composite[i])
for (int p = (i << 1) + 1, j = p * p >> 1; j <= max_bit; j += p)
odd_composite[j] = true;
var result = new List<ushort>() { 2, 3 }; // mod 3 stepping on top of the mod 2 wheel
for (int i = 5 >> 1, d = 1; i <= max_bit; i += d, d ^= 3)
if (!odd_composite[i])
result.Add((ushort)((i << 1) + 1));internal static bool is_prime(int x)
{
if (x <= 2) return (x == 2); // (EDITED: see comments)
if (x % 2 == 0) return false;
const int limit = Math.Sqrt(x);
for (int i=3; i <= limit; i += 2) {
if (x % i == 0) return false;
}
return true;
}
internal static IEnumerable<int> small_primes_between(int m, int n)
{
for (int i = m; i < n; ++i) {
if (is_prime(i)) yield return i;
}
}Context
StackExchange Code Review Q#127024, answer score: 4
Revisions (0)
No revisions yet.