snippetcsharpMinor
Generate cryptographically secure random numbers in a specific range
Viewed 0 times
randomsecurerangenumbersgeneratecryptographicallyspecific
Problem
A project I'm working on requires generating a random number of \$N\$ length to a very high degree of fair distribution between digits \$[0, 9]\$.
That said, I used the
First, we have the clamping. This clamps the values to the range from \$[lower, upper)\$, allowing the user to specify what range of values they want. For me, it's \$[0, 9]\$.
Mathematically speaking, using a cryptographically secure random number generator should yield a truly random set. The problem is clamping: if we simply take
To reduce, or even eliminate, this bias I took a previously working algorithm and modified it slightly to create a bounded number of loops. The idea is simple: to generate a random value in the domain \$[0, 9]\$ take the random value from the set \$[0, 255]\$ and test that it is within the range \$[0, 249]\$. If it is within that range, take \$value \mod 10\$ as the result. If it is in the range \$[250, 255]\$ then we draw a new number. This can be expanded to define the following general-purpose formula:
$$upperLimit = 256 - (256 \mod modulo)$$
We can then define the number to take the modulo by as:
$$modulo = upper - lower$$
We set the loop up to bound to a specific value, in this case I'm creating the loop
So far, my results have yielded almost perfect distribution of the values \$[0, 9]\$, when generating \$1,000,000\$ values.
I have run this several times, and each time I get a slightly different distribution, by a very small margin. That is perfect, as it demonstrates that the RNG is not predictable by any manner.
```
Value 0: 100090 occurrences (10.009%)
Value 1: 100298 occurrences (10.0298%)
Value 2: 100034 occurrences (10.0034%)
Value 3: 999
That said, I used the
RNGCryptoServiceProvider in the .NET framework, and built my own window restriction / clamping.First, we have the clamping. This clamps the values to the range from \$[lower, upper)\$, allowing the user to specify what range of values they want. For me, it's \$[0, 9]\$.
Mathematically speaking, using a cryptographically secure random number generator should yield a truly random set. The problem is clamping: if we simply take
val % 10 to reduce it to our desired \$[0, 9]\$ range, we'll find that we have a slight favoritism / bias towards numbers at the lower end of the set. In fact, we should have the most bias towards the set \$[0, 5]\$.To reduce, or even eliminate, this bias I took a previously working algorithm and modified it slightly to create a bounded number of loops. The idea is simple: to generate a random value in the domain \$[0, 9]\$ take the random value from the set \$[0, 255]\$ and test that it is within the range \$[0, 249]\$. If it is within that range, take \$value \mod 10\$ as the result. If it is in the range \$[250, 255]\$ then we draw a new number. This can be expanded to define the following general-purpose formula:
$$upperLimit = 256 - (256 \mod modulo)$$
We can then define the number to take the modulo by as:
$$modulo = upper - lower$$
We set the loop up to bound to a specific value, in this case I'm creating the loop
So far, my results have yielded almost perfect distribution of the values \$[0, 9]\$, when generating \$1,000,000\$ values.
I have run this several times, and each time I get a slightly different distribution, by a very small margin. That is perfect, as it demonstrates that the RNG is not predictable by any manner.
```
Value 0: 100090 occurrences (10.009%)
Value 1: 100298 occurrences (10.0298%)
Value 2: 100034 occurrences (10.0034%)
Value 3: 999
Solution
Firstly, related to the mathematical aspect I see how the bias could be leaning towards \$[0, 5]\$, but I don't quite get the connection why you are using 256 afterwards. You could possibly get even better distribution by changing form bytes to the next storage level available.
Depending on your range, the distribution could be affected severly if the \$modulo\$ gets close to 256, if I've understood your algorithm correctly.
I've sadly not gotten a C# compiler available, so the following points are made untested:
-
Edge case never reached, and doesn't generate new number – Within
However, as you describe, this edge case is not met when the
-
No bailing out of
So in general your code looks good, and seems to give a better distribution than the default distribution when reducing the range significantly. But I'm a little unsure about the performance speed and cost if the range is increased, and why you've chosen to return the number as a string. All in all, interesting and well written code!
Depending on your range, the distribution could be affected severly if the \$modulo\$ gets close to 256, if I've understood your algorithm correctly.
I've sadly not gotten a C# compiler available, so the following points are made untested:
- In general your code looks goods – Nice spacing, nice variable names, and so on. Not a whole lot of comments explaining the reasoning for your methods, but in general it looks good. Personally I'm not all in favor of capitalizing both Function Names, like
ClampDigit(), so I kind of read most of your function names as class names, but this is personal preferences.
- Cost of initializing
RNGCryptoServiceProvider? – I'm wondering what the cost of constructing this all the time is. Some random generators are somewhat expensive to initialize, but a lot cheaper to use. There could be a possible gain of creating this once, and reusing it later on.
- Multiple call with same parameters -> Class? – You call the
ClampDigit()quite a few times with the same parameters. Not expensive as such, but possibly it could be better to have a class where you reuse the instantiation for new random digits with the same parameters?
- Why does
RandomNumber()return astring? – It kind of baffles me that theRandomNumber()returns a string, and not an actual number. It would make more sense that it actually returned a number.
-
Edge case never reached, and doesn't generate new number – Within
ClampDigit() your description says you should generate a new number, but the code simply reuses one of the number which has proven to be above the upperLimit (and are potentially biased).However, as you describe, this edge case is not met when the
tries is sufficiently high. Not sure if this would have any influence on the distribution with a lower number of tries, or not.-
No bailing out of
for loop within ClampDigit() – Potentially your for loop resets result 10 (or bytes.Length times to be exact) before it continues and returns a result. By adding && result == -1 to the end condition you could bail out once you find a candidate. Could potentially reduce the running speed down to 1/10th of current speed.So in general your code looks good, and seems to give a better distribution than the default distribution when reducing the range significantly. But I'm a little unsure about the performance speed and cost if the range is increased, and why you've chosen to return the number as a string. All in all, interesting and well written code!
Context
StackExchange Code Review Q#160881, answer score: 5
Revisions (0)
No revisions yet.