Why do people say there is modulo bias when using a random number generator?
Problem
Why exactly is there "modulo bias" when using a random number generator, like rand() in C++?
Solution
rand() is a pseudo-random number generator that returns an integer between 0 and RAND_MAX, a constant defined in cstdlib (see this article for a general overview of rand()). Assume that each of these values is returned with equal probability.

Now what happens if you want to generate a random number between, say, 0 and 2? For the sake of explanation, let's say RAND_MAX is 10 and I decide to generate a random number between 0 and 2 by calling rand()%3. However, rand()%3 does not produce the numbers between 0 and 2 with equal probability!

When rand() returns 0, 3, 6, or 9, rand()%3 == 0. Therefore, P(0) = 4/11.
When rand() returns 1, 4, 7, or 10, rand()%3 == 1. Therefore, P(1) = 4/11.
When rand() returns 2, 5, or 8, rand()%3 == 2. Therefore, P(2) = 3/11.

So the three results are not equally likely. Of course, when the range is small relative to RAND_MAX this might not be the biggest issue, but for a larger range this could skew the distribution, biasing the smaller numbers.
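The counting argument above can be checked mechanically. The following is a minimal sketch, not part of the original answer: `residue_counts` is a hypothetical helper name, and it assumes an idealized generator that returns every integer in [0, max_value] equally often.

```cpp
#include <vector>

// Hypothetical helper (name chosen for illustration only).
// Assumes an idealized generator whose outputs 0..max_value are all
// equally likely, and counts how many of those max_value + 1 outputs
// land on each residue 0..n-1 when reduced with "% n".
std::vector<int> residue_counts(int max_value, int n) {
    std::vector<int> counts(n, 0);
    for (int r = 0; r <= max_value; ++r) {
        ++counts[r % n];  // r is one possible generator output
    }
    return counts;
}
```

Calling `residue_counts(10, 3)` gives the counts 4, 4, 3, which are the numerators of P(0), P(1), and P(2) over the 11 possible outputs.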
So when does rand()%n return a range of numbers from 0 to n-1 with equal probability? When RAND_MAX%n == n - 1, that is, when the RAND_MAX + 1 possible values divide evenly into n groups. In this case, along with our earlier assumption that rand() returns a number between 0 and RAND_MAX with equal probability, the modulo classes of n would also be equally distributed.

So how do we solve this problem? A crude way is to keep generating random numbers until you get a number in your desired range:
int x;
do {
x = rand();
} while (x >= n);

but that's inefficient for low values of n, since you only have an n/(RAND_MAX + 1) chance of getting a value in your range, and so you'll need to perform about RAND_MAX/n calls to rand() on average.

A more efficient approach would be to take some large range with a length divisible by n, like RAND_MAX - RAND_MAX % n, keep generating random numbers until you get one that lies in the range, and then take the modulus:

int x;
do {
x = rand();
} while (x >= (RAND_MAX - RAND_MAX % n));
x %= n;

For small values of n, this will rarely require more than one call to rand().

Works cited and further reading:
- CPlusPlus Reference
- Eternally Confuzzled
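As a follow-up to the second loop in the answer, here is one way it might be packaged as a reusable function. This is a sketch under stated assumptions rather than the answer's own code: `unbiased_mod` and `rand_below` are hypothetical names, the random source is passed in as a parameter so the logic can be exercised with a deterministic stand-in, and the caller is assumed to pass 0 < n <= max_value.

```cpp
#include <cstdlib>
#include <functional>

// Hypothetical helper (not from the original answer).
// Assumes next() returns uniformly distributed ints in [0, max_value]
// and that 0 < n <= max_value; otherwise the loop would never accept.
int unbiased_mod(int n, int max_value, const std::function<int()>& next) {
    // [0, limit) is a prefix of the output range whose length is a
    // multiple of n, so every residue occurs equally often inside it.
    const int limit = max_value - max_value % n;
    int x;
    do {
        x = next();        // redraw while x falls outside the prefix
    } while (x >= limit);
    return x % n;
}

// Convenience wrapper around rand(), using RAND_MAX as the upper bound.
int rand_below(int n) {
    return unbiased_mod(n, RAND_MAX, [] { return std::rand(); });
}
```

With this shape, `rand_below(6)` would return a value from 0 to 5. Taking the source as a parameter is only a design choice for testability; the loop itself is the same as the one in the answer.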
Context
Stack Overflow Q#10984974, score: 473