HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Random bits with a specified density (Bernoulli Trials)

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
randomdensitywithbitstrialsbernoullispecified

Problem

I'm sure this is a fully 'solved' problem and expecting cross references but I can't find this on Google. I'm probably missing a keyword.

Here is a naive function for populating a bool array (b[i]) of length N in a Bernoulli distribution.
P(b[i]==true)==p and P(b[i]==false)==(1-p) for each independent random variable b[i].

#include 
#include 
#include 

void randomize(bool*const bits,const size_t len,double p){

    for(size_t i=0;i=1.0||rand()=1.0 as 0 and RAND_MAX possible.
            bits[i]=true;
        }else{
            bits[i]=false;
        }
    }
}

int main(void) {

    const size_t len=100;//N in the text.
    bool* bits=malloc(len*sizeof(*bits));//b[i] in the text.
    if(bits==NULL){
        return EXIT_FAILURE;
    }

    srand(1234);//Fixed for Reproducibility.

    randomize(bits,len,0.3);

    int count=0;
    for(size_t i=0;i<len;++i){
        printf("%c" , (bits[i]?'1':'0'));
        if(bits[i]){
            ++count;
        }
    }
    printf("\ncount=%f\n",((double)count)/((double)len));

    free(bits);

    return EXIT_SUCCESS;
}


Obviously it's linear \$\mathcal{O}(N)\$ for 'seeks' in the array.

However I could start with an array full of (0) and calculate a random variable (\$B\$) in the binomial distribution to get the number of bits to set then select them at random.

If \$p\$ is small that will involve far fewer seeks. It will still be (expected) asymptotically linear but with a far reduced constant.

If I naively pick the \$B\$ elements to set I might pick the same one twice (Birthday Paradox anyone?) and will need to pick again. If I just 'scan' forwards to the first free slot I will introduce a 'clumping' bias and violate the independence of the random variables P(b[i+1]==1|b[i]==1)>p. Clumping is a no-no for my application.

Therein lies the rub. If \$p\$ is near to 1.0 it will take an increasingly massive amount of time to fill the array. When it's nearly full randomly picking a 0 will typically take a lot

Solution

For what I can see, the code fills an array with a given amounts of true values (the rest being false) in random order. I'd rather initialize first p*N elements to true, and then random_shuffle the array.

That said I don't understand why you need this array at all. Can we see how your application uses it?

Context

StackExchange Code Review Q#75651, answer score: 2

Revisions (0)

No revisions yet.