patterncMinor
Generating bounded & unique random numbers in the Linux kernel
Viewed 0 times
randomuniquethenumbersgeneratinglinuxboundedkernel
Problem
I recently needed to generate a series of unique random numbers (non-repeated) within a bounded range inside of the Linux kernel. The code I came up with is below. I'd appreciate any feedback -- again, please keep in mind that it's within the Linux kernel, so the standard C library is out of the question.
```
#include
#include
#include
static int mod(int, int);
static int _is_repeated_int(int *, int, int);
/* proj2_create_randints - creates a buffer of random integers
* @dstruct_size: number of random integers
* @max_range: maximum range of values to generate
*
* Allocates a buffer that can hold @dstruct_size random integers. If
* @max_range is greater than 0, mods all values by @max_value to ensure
* they fall within the range. If @dstruct_size is greater than @max_range,
* returns NULL. The buffer returned is guaranteed to have no duplicate values.
*
* On successs, it fills the buffer and returns a pointer to the buffer;
* otherwise returns NULL.
*/
int *proj2_create_randints(int dstruct_size, int max_range)
{
int *randint;
int tmp, idx;
if ((max_range != 0) && (dstruct_size > max_range))
return NULL;
randint = kmalloc(dstruct_size sizeof(randint), GFP_KERNEL);
if (randint != NULL) {
idx = 0;
while (idx 0)
tmp = mod(tmp, max_range);
/ Check if 'tmp' is already in array /
if (!_is_repeated_int(randint, idx, tmp)) {
randint[idx] = tmp;
idx++;
}
}
}
return randint;
}
/* mod - returns x modulo m
* @x: value to reduce
* @m: modulo value
*
* Guaranteed to be positive.
*/
static int mod(int x, int m)
{
return (x%m + m)%m;
}
/* _is_repeated_int - checks array against repeated values
* @array: array of integers to check
* @size: size of @array
* @new_value: value to check array for
*/
static int _is_repeated_int(int *array, int size, int new_value)
{
int i;
if (size == 0)
```
#include
#include
#include
static int mod(int, int);
static int _is_repeated_int(int *, int, int);
/* proj2_create_randints - creates a buffer of random integers
* @dstruct_size: number of random integers
* @max_range: maximum range of values to generate
*
* Allocates a buffer that can hold @dstruct_size random integers. If
* @max_range is greater than 0, mods all values by @max_value to ensure
* they fall within the range. If @dstruct_size is greater than @max_range,
* returns NULL. The buffer returned is guaranteed to have no duplicate values.
*
* On successs, it fills the buffer and returns a pointer to the buffer;
* otherwise returns NULL.
*/
int *proj2_create_randints(int dstruct_size, int max_range)
{
int *randint;
int tmp, idx;
if ((max_range != 0) && (dstruct_size > max_range))
return NULL;
randint = kmalloc(dstruct_size sizeof(randint), GFP_KERNEL);
if (randint != NULL) {
idx = 0;
while (idx 0)
tmp = mod(tmp, max_range);
/ Check if 'tmp' is already in array /
if (!_is_repeated_int(randint, idx, tmp)) {
randint[idx] = tmp;
idx++;
}
}
}
return randint;
}
/* mod - returns x modulo m
* @x: value to reduce
* @m: modulo value
*
* Guaranteed to be positive.
*/
static int mod(int x, int m)
{
return (x%m + m)%m;
}
/* _is_repeated_int - checks array against repeated values
* @array: array of integers to check
* @size: size of @array
* @new_value: value to check array for
*/
static int _is_repeated_int(int *array, int size, int new_value)
{
int i;
if (size == 0)
Solution
What are the valid input ranges? The documentation flags two special cases and the code handles them as special cases, but
On the assumption that the intention is to guarantee that
is very inefficient. Consider the worst case:
A simple improvement which requires only
Obviously it needs minor amendment to handle the case
The big issue is that, as JS1 pointed out in comments, the loop which increments to avoid collisions needs to process the previous numbers in ascending order. There are two ways to do this: either keep a second, temporary, buffer with the numbers sorted; or sort
The Fisher-Yates option, however, does allow a fairly tricky elegant insertion.
Observe that because
I've followed your code in taking
- I don't see any checks for
dstruct_size
- If max_range
is very large, the implementation ofmodis incorrect. Consider e.g.mod(1
On the assumption that the intention is to guarantee that
max_range will be zero or positive, why int instead of uint? Using uint would encode the guarantee of non-negative return values, and would remove the need for the mod helper./* Check if 'tmp' is already in array */
if (!_is_repeated_int(randint, idx, tmp)) {
randint[idx] = tmp;
idx++;
}is very inefficient. Consider the worst case:
dstruct_size == max_range. The expected number of calls to get_random_int will be quadratic in max_range, and the overall running time of the method will be cubic.A simple improvement which requires only
dstruct_size calls to get_random_int and overall quadratic time is the following (pseudocode):for (idx = 0; idx = sorted(randint)[cmpIdx]) tmp++;
}
randint[idx] = tmp;
}Obviously it needs minor amendment to handle the case
max_range = 0, but that's not too difficult, especially if working with uint.The big issue is that, as JS1 pointed out in comments, the loop which increments to avoid collisions needs to process the previous numbers in ascending order. There are two ways to do this: either keep a second, temporary, buffer with the numbers sorted; or sort
randint as we go and then Fisher-Yates shuffle at the end. One uses twice as much memory; the other uses twice as many calls to get_random_int(). Both use insertion sort, and leave the overall running time as quadratic.The Fisher-Yates option, however, does allow a fairly tricky elegant insertion.
for (i = 0; i = randint[j]) tmp++;
else {
tmp2 = tmp;
tmp = randint[j];
randint[j] = tmp2;
}
}
randint[i] = tmp;
}
// TODO Now Fisher-Yates shuffleObserve that because
randint contains sorted distinct values, once we find the insertion point for tmp we will never again hit the first case. In fact, that means we could optimise the inner loop and assignment to randint[i] asfor (j = 0; j = randint[j]; j++, tmp++) {
// Empty loop body
}
for (; j <= i; j++) {
tmp2 = tmp;
tmp = randint[j];
randint[j] = tmp2;
}I've followed your code in taking
get_random_int() modulo a value, and that's ok if you don't need uniformity. However, in some cases the fact that get_random_int() % m is more likely to be 0 than m-1 (unless m is a power of two) is a problem. In that case you'd want an auxiliary get_uniform_random_int(m).Code Snippets
/* Check if 'tmp' is already in array */
if (!_is_repeated_int(randint, idx, tmp)) {
randint[idx] = tmp;
idx++;
}for (idx = 0; idx < dstruct_size; idx++) {
tmp = (uint)get_random_int() % (max_range - idx);
for (cmpIdx = 0; cmpIdx < idx; cmpIdx++) {
// Discussed below
if (tmp >= sorted(randint)[cmpIdx]) tmp++;
}
randint[idx] = tmp;
}for (i = 0; i < dstruct_size; i++) {
tmp = (uint)get_random_int() % (max_range - i);
for (j = 0; j < i; j++) {
if (tmp >= randint[j]) tmp++;
else {
tmp2 = tmp;
tmp = randint[j];
randint[j] = tmp2;
}
}
randint[i] = tmp;
}
// TODO Now Fisher-Yates shufflefor (j = 0; j < i && tmp >= randint[j]; j++, tmp++) {
// Empty loop body
}
for (; j <= i; j++) {
tmp2 = tmp;
tmp = randint[j];
randint[j] = tmp2;
}Context
StackExchange Code Review Q#154974, answer score: 5
Revisions (0)
No revisions yet.