HiveBrain v1.2.0
Get Started
← Back to all entries
debugrubyMinor

Generating non-repeating random integers in a fixed range

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
randomnonrangerepeatinggeneratingfixedintegers

Problem

I was working on the "Robot Name" exercise over on exercism.io (neat site, by the way), the gist of which is: Generate random, unique strings that match /[A-Za-z]{2}[0-9]{3}/, e.g. "xY123", "Gv020", and so on. So there are 2,704,000 possible names out there.

Now, the exercise's tests do not do a full-on check for uniqueness or randomness, but making the names random and collision-free is something you're encouraged to try yourself. So I wanted to try that.

One option is of course to generate each and every name sequentially and shuffle them. But that seems pretty inelegant.

So went looking for something cooler and came across this neat method for generating random, non-repeating 32 bit integers using "quadratic prime residue" (which sounds like something you'd find on a mathematician's shoes).

That method is really neat, but it makes use of C's strict 32-bit ints, and some XOR magic that I can't use for the smaller range I'm dealing with here.

So I modified it a little bit, and came up with what's below. I did a quick and dirty test, printing all the names to a file, and running checking for uniqueness thusly:

$ sort -u names.txt | wc -l


and indeed it claims that there are 2,704,000 names after sorting and removing duplicates. So yay for that. Distribution is probably skewed all sorts of ways, but it appears to work, and at least it's not sequential.

But I'm wondering if there's a simpler way that I'm just missing here. Or, failing that, if there are improvements that could be made to the thing below.

Note that the code is very much fixed to the 0..2,704,000 range. This is on purpose (hence the hard-coded prime number etc.). I'm not really interested in tweaking this particular code to make it more flexible in that regard. But if someone has an already-flexible approach that could replace it, that'd be neat.

```
require "singleton"

class RobotNameGenerator
include Singleton

# 2x fifty-two letters (A-Z and a-z), 3 digits = 2,704,000 combin

Solution

Any simple generator whose output is equivalent to its internal state will generate a non-repeating sequence. In fact, most will generate a permutation of the range [0, P) where P is the length of the period. Or [1, P), if state == 0 is forbidden.

Pick any simple generator with period P ≥ 2,704,000 and Bob's your uncle.

The simplest is the Golden Weyl generator (the 'Golden' is my moniker for it):

x = (x + 0x9E3779B9u) mod 2^32


The net has a wide choice of suitable Linear Congruential Generators (LCGs) of the form

x = (a * x + c) mod 2^32


Julienne Walker has written a brilliant explanation of LCGs, and the Weyl generators are just a special case with multiplier 1.

However, regardless of which type of generator you choose, its period P is likely to be too long.

Extracting a shorter non-repeating sequence of values [0,M) from a longer sequence of non-repeating values [0,P) is not trivial. In particular, if you map the [0,P) values to the [0,M) range in some fashion - e.g. by scaling, or by computing the remainder modulo M - then you lose the property of non-repetition. Stuffing P values into a smaller number M of pigeon holes means that some values must end up in the same hole.

One strategy that works well is rejection: inspect the generator output and if you don't like the value for some reason, throw it away and call the generator again. This method is perfect for generating even the most outlandish output distributions, in the sense that it does not introduce any bias at all and it leaves properties like non-repetition intact. The downside is that you have to do more work if a lot of stuff gets thrown away.

The trick, then, is to pick a non-repeating generator whose period length is as close to M = 2,704,000 as possible, and then use rejection with that.

LCGs modulo powers of 2 have the special property that any lower k bits cycle with period 2^k. I.e., the lowest bit is strictly alternating, the low 2 bits cycle with period 4, and so on. This means that these generators can be reduced modulo any power of 2 without losing the property of non-repetition. In other words, any LCG modulo 2^x is also a generator modulo 2^k for all k that are less than x. These subsequences can be extracted by throwing away (masking off) the unwanted higher bits.

The first power of two that is not less than the modulus M = 2,704,000 is 2^22. Masking with 2^22 - 1 = 0x3FFFFF we only have to reject the 35% of outputs that are in the range [M,2^22). This means the generator will have to be cycled 1.55 times on average for each good value in the range [0,M). That's less than two MULs for a non-repeating random value with an off-beat modulus.

Simple and effective.

An example for a suitable generator of higher quality (where the lower bits do not cycle with their own lower periods) is a 22-bit Tausworthe. Think of it as a single barrel of the four-barrelled lfsr113 that powers games like Dragon Age, whose long-awaited newest chapter is due to be released in a few hours.

x = (((x ^ (x > (22 - 17)) ^ (x << 17);

return (x & 0x3FFFFF) - 1;


This generator has period 2^22 - 1 because 0 is a forbidden (sticky) state, hence the decrement after masking the output. The rejection method is then essentially the same as above.

I wrote (22 - 17) because the 17 corresponds to parameter s in the usual descriptions of the algorithm (and 22 equals k). Other possible choices for s are here: 1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 19 and 20. The algorithm is usually described in a way that makes the 'live' k bits sit at the upper end of the larger x:

x = ((x ^ (x > (k - s)) ^ ((x & m) << s);


I took the liberty of shifting the whole shebang down to the lower end, so that the live bits can be extracted with simple masking. The output sequence is the same.

Code Snippets

x = (x + 0x9E3779B9u) mod 2^32
x = (a * x + c) mod 2^32
x = (((x ^ (x << 1)) & 0x3FFFFF) >> (22 - 17)) ^ (x << 17);

return (x & 0x3FFFFF) - 1;
x = ((x ^ (x << q)) >> (k - s)) ^ ((x & m) << s);

Context

StackExchange Code Review Q#69881, answer score: 4

Revisions (0)

No revisions yet.