HiveBrain v1.2.0
Get Started
← Back to all entries
snippetcsharpCritical

Generate random numbers without repetitions

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
randomwithoutnumbersrepetitionsgenerate

Problem

I want to generate a list of N different random numbers:

public static List GetRandomNumbers(int count)
{
    List randomNumbers = new List(); 

    for (int i=0; i<count; i++) 
    {    
        int number;

        do number = random.Next();
        while (randomNumbers.Contains(number));

        randomNumbers.Add(number);
    }

    return randomNumbers;
}


But I feel there is a better way. This do...while loop especially makes this ugly. Any suggestions on how to improve this?

Solution

Updated answer in response to bounty: See Is that your final answer? at the end, and other changes - basically answer is significantly rewritten.

To break your problem down in to requirements:

  • you need a set of random numbers



  • the numbers need to be unique



  • the order of the returned numbers needs to be random



Your current code indicates that the range of random numbers is specified by Random.Next(), which returns values in the [0 .. Int32.MaxValue) range (note, it excludes Int32.MaxValue). This is significant for the purpose of this question, because other answers have assumed that the range is configurable, and 'small'.

If the range should be configurable, then the recommended algorithm would potentially be much larger.

Based on those assumptions, let's do a code review...

Code Style

do ... while

The most glaring problems here are the un-braced do-while loop. You already knew it, but this code is ugly:

do number = random.Next();
    while (randomNumbers.Contains(number));


It really should be braced:

do
    {
        number = random.Next();
    } while (randomNumbers.Contains(number));


This makes the statement clear, and significantly reduces confusion. Always use braces for 1-liners.

List Construction

The List class allows an initial capacity to be used. Since the capacity needs to just be count, it makes sense to initialize the list with this capacity:

List randomNumbers = new List(count);


Current Algorithm

This is where the most interesting observations can be made. Let's analyze your current algorithm:

  • Create a container for the results



  • repeat until you have selected N values:



  • Select a random value



  • check if it has been previously selected



  • if it is 'new', then add it to the container



This algorithm will produce random values, in a random order, and with good random characteristics (no skews, biases, gaps, etc.).

In other words, your results are good.

The problem is with performance....

There are two performance concerns here, one small, the other large:

  • the do-while loop to avoid collisions



  • the List container



do-while performance

The do-while has a very low impact on performance... almost negligible. This is hotly debated, but, the reality is that you would need a very, very large count before this becomes a problem. The reasoning is as follows:

Collisions happen when the random value was previously selected. For the specified range of [0 .. Int32.MaxValue), you would need a very large count before collisions actually happened. For example, count would have to be about 65,000 before there was better than a 50% chance that there was even a single collision.

In a general sense, given a Range of \$N\$, select \$M\$ numbers. If \$M

  • Data will need to be in both the HashSet and the List at some point, so the space usage is doubled.



  • The shuffle at the end is needed to keep the data in a random order (HashSet does not keep the data in any specific order, and the hashing algorithm will cause the order to become biased, and skewed).



These are only performance issues when the count is very large. Note that only the collisions at large count will impact the scalability of the solution (at large count it is no longer quite \$O(n)\$, and it will be come progressively worse when count approaches Int32.MaxValue. Note that in real life this will not likely ever happen.... and memory will become a problem before performance does.

@JerryCoffin pointed to an alternate algorithm from Bob Floyd, where a trick is played to ensure that collisions never happen. This solves the problem of scalability at very large counts. It does not solve the need for both a HashSet and a List, and it does not solve the need for the shuffle.

The algorithm as presented:

initialize set S to empty
for J := N-M + 1 to N do
    T := RandInt(1, J)
    if T is not in S then
        insert T in S
    else
        insert J in S


assumes that RandInt(1, J) returns values inclusive of J.

To understand the above algorithm, you need to realize that you choose a random value from a range that is smaller than the full range, and then after each value, you extend that to include one more. In the event of a collision, you can safely insert the max because it was never possible to include it before.

The chances of a collision increase at the same rate that the number of values decreases, so the probability of any one number being in the result is not skewed, or biased.

Is this almost a final answer? No

So, using the above solution, in C#, would look like (in Ideone) (note, code is now different to Ideone):

```
public static List GenerateRandom(int count)
{
// generate count random values.
HashSet candidates = new HashSet();
for (Int32 top = Int32.MaxValue - count; top result = candidates.ToList();

// shuffle the results:
int i = result.Count;
while (i > 1)
{
i--;
int k = random.Next(i + 1);

Code Snippets

do number = random.Next();
    while (randomNumbers.Contains(number));
do
    {
        number = random.Next();
    } while (randomNumbers.Contains(number));
List<int> randomNumbers = new List<int>(count);
using System;
using System.Collections.Generic;

public class Test
{
    static Random random = new Random();

    public static List<int> GenerateRandom(int count)
    {
        // generate count random values.
        HashSet<int> candidates = new HashSet<int>();
        while (candidates.Count < count)
        {
            // May strike a duplicate.
            candidates.Add(random.Next());
        }

        // load them in to a list.
        List<int> result = new List<int>();
        result.AddRange(candidates);

        // shuffle the results:
        int i = result.Count;  
        while (i > 1)
        {  
            i--;  
            int k = random.Next(i + 1);  
            int value = result[k];  
            result[k] = result[i];  
            result[i] = value;  
        }  
        return result;
    }
    public static void Main()
    {
        List<int> vals = GenerateRandom(10);
        Console.WriteLine("Result: " + vals.Count);
        vals.ForEach(Console.WriteLine);
    }
}
initialize set S to empty
for J := N-M + 1 to N do
    T := RandInt(1, J)
    if T is not in S then
        insert T in S
    else
        insert J in S

Context

StackExchange Code Review Q#61338, answer score: 106

Revisions (0)

No revisions yet.