HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Generating bounded & unique random numbers in the Linux kernel

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
randomuniquethenumbersgeneratinglinuxboundedkernel

Problem

I recently needed to generate a series of unique random numbers (non-repeated) within a bounded range inside of the Linux kernel. The code I came up with is below. I'd appreciate any feedback -- again, please keep in mind that it's within the Linux kernel, so the standard C library is out of the question.

```
#include
#include
#include

static int mod(int, int);
static int _is_repeated_int(int *, int, int);

/* proj2_create_randints - creates a buffer of random integers
* @dstruct_size: number of random integers
* @max_range: maximum range of values to generate
*
* Allocates a buffer that can hold @dstruct_size random integers. If
* @max_range is greater than 0, mods all values by @max_value to ensure
* they fall within the range. If @dstruct_size is greater than @max_range,
* returns NULL. The buffer returned is guaranteed to have no duplicate values.
*
* On successs, it fills the buffer and returns a pointer to the buffer;
* otherwise returns NULL.
*/
int *proj2_create_randints(int dstruct_size, int max_range)
{
int *randint;
int tmp, idx;

if ((max_range != 0) && (dstruct_size > max_range))
return NULL;

randint = kmalloc(dstruct_size sizeof(randint), GFP_KERNEL);
if (randint != NULL) {
idx = 0;
while (idx 0)
tmp = mod(tmp, max_range);
/ Check if 'tmp' is already in array /
if (!_is_repeated_int(randint, idx, tmp)) {
randint[idx] = tmp;
idx++;
}
}

}
return randint;
}

/* mod - returns x modulo m
* @x: value to reduce
* @m: modulo value
*
* Guaranteed to be positive.
*/
static int mod(int x, int m)
{
return (x%m + m)%m;
}

/* _is_repeated_int - checks array against repeated values
* @array: array of integers to check
* @size: size of @array
* @new_value: value to check array for
*/
static int _is_repeated_int(int *array, int size, int new_value)
{
int i;

if (size == 0)

Solution

What are the valid input ranges? The documentation flags two special cases and the code handles them as special cases, but

  • I don't see any checks for dstruct_size



  • If max_range is very large, the implementation of mod is incorrect. Consider e.g. mod(1



On the assumption that the intention is to guarantee that max_range will be zero or positive, why int instead of uint? Using uint would encode the guarantee of non-negative return values, and would remove the need for the mod helper.

/* Check if 'tmp' is already in array */
            if (!_is_repeated_int(randint, idx, tmp)) {
                randint[idx] = tmp;
                idx++;
            }


is very inefficient. Consider the worst case: dstruct_size == max_range. The expected number of calls to get_random_int will be quadratic in max_range, and the overall running time of the method will be cubic.

A simple improvement which requires only dstruct_size calls to get_random_int and overall quadratic time is the following (pseudocode):

for (idx = 0; idx = sorted(randint)[cmpIdx]) tmp++;
    }
    randint[idx] = tmp;
}


Obviously it needs minor amendment to handle the case max_range = 0, but that's not too difficult, especially if working with uint.

The big issue is that, as JS1 pointed out in comments, the loop which increments to avoid collisions needs to process the previous numbers in ascending order. There are two ways to do this: either keep a second, temporary, buffer with the numbers sorted; or sort randint as we go and then Fisher-Yates shuffle at the end. One uses twice as much memory; the other uses twice as many calls to get_random_int(). Both use insertion sort, and leave the overall running time as quadratic.

The Fisher-Yates option, however, does allow a fairly tricky elegant insertion.

for (i = 0; i = randint[j]) tmp++;
        else {
            tmp2 = tmp;
            tmp = randint[j];
            randint[j] = tmp2;
        }
    }
    randint[i] = tmp;
}
// TODO Now Fisher-Yates shuffle


Observe that because randint contains sorted distinct values, once we find the insertion point for tmp we will never again hit the first case. In fact, that means we could optimise the inner loop and assignment to randint[i] as

for (j = 0; j = randint[j]; j++, tmp++) {
        // Empty loop body
    }
    for (; j <= i; j++) {
        tmp2 = tmp;
        tmp = randint[j];
        randint[j] = tmp2;
    }


I've followed your code in taking get_random_int() modulo a value, and that's ok if you don't need uniformity. However, in some cases the fact that get_random_int() % m is more likely to be 0 than m-1 (unless m is a power of two) is a problem. In that case you'd want an auxiliary get_uniform_random_int(m).

Code Snippets

/* Check if 'tmp' is already in array */
            if (!_is_repeated_int(randint, idx, tmp)) {
                randint[idx] = tmp;
                idx++;
            }
for (idx = 0; idx < dstruct_size; idx++) {
    tmp = (uint)get_random_int() % (max_range - idx);
    for (cmpIdx = 0; cmpIdx < idx; cmpIdx++) {
        // Discussed below
        if (tmp >= sorted(randint)[cmpIdx]) tmp++;
    }
    randint[idx] = tmp;
}
for (i = 0; i < dstruct_size; i++) {
    tmp = (uint)get_random_int() % (max_range - i);
    for (j = 0; j < i; j++) {
        if (tmp >= randint[j]) tmp++;
        else {
            tmp2 = tmp;
            tmp = randint[j];
            randint[j] = tmp2;
        }
    }
    randint[i] = tmp;
}
// TODO Now Fisher-Yates shuffle
for (j = 0; j < i && tmp >= randint[j]; j++, tmp++) {
        // Empty loop body
    }
    for (; j <= i; j++) {
        tmp2 = tmp;
        tmp = randint[j];
        randint[j] = tmp2;
    }

Context

StackExchange Code Review Q#154974, answer score: 5

Revisions (0)

No revisions yet.