HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Simulate a fair die with a biased die

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
biasedfairdiewithsimulate

Problem

Given a biased $N$-sided die, how can a random number in the range $[1,N]$ be generated uniformly? The probability distribution of the die faces is not known, all that is known is that each face has a nonzero probability and that the probability distribution is the same on all throws (in particular, the throws are independent). This is the obvious generalization of Fair results with unfair die.

Putting this in computer science terms, we have an oracle representing the die rolls: $D : \mathbb{N} \to [1,N]$ such that $p_i = P(D(k)=i)$ is nonzero and independent of $k$. We're looking for a deterministic algorithm $A$ which is parametrized by $D$ (i.e. $A$ may make calls to $D$) such that $P(A()=i) = 1/N$. The algorithm must terminate with probability 1, i.e. the probability that $A$ makes more than $n$ calls to $D$ must converge to $0$ as $n\to\infty$.

For $N=2$ (simulate a fair coin from coin flips with a biased coin), there is a well-known algorithm:

  • Repeat “flip twice” until the two throws come up with distinct outcomes ((heads, tails) or (tails, heads)). In other words, loop for $k = 0..\infty$ until $D(2k+1) \ne D(2k)$



  • Return 0 if the last pair of flips was (heads, tails) and 1 if it was (tails, heads). In other words, return $D(2k)$ where $k$ is the index at which the loop was terminated.



A simplistic way to make an unbiased die from a biased one is to use the coin flip unbiasing method to build a fair coin, and build a fair die with rejection sampling, as in Unbiasing of sequences. But is this optimal (for generic values of the probability distribution)?

Specifically, my question is: what is an algorithm that requires the smallest expected number of calls to the oracle? If the set of reachable expected values is open, what is the lower bound and what is a class of algorithms that converges towards this lower bound?

In case different families of algorithms are optimal for different probability distributions, let's focus on almost-fair dice: I'm looking

Solution

The following paper answers a close variant of this question: The Efficient Construction of an Unbiased Random Sequence, Elias 1972.

The question there seems to be this: Given access to this biased independent source, output a sequence of random numbers in $[1,N]$ (note the difference from your question in which only one output symbol is requested). As the length of the desired output goes to infinity, the "efficiency" of the scheme in the paper (which seems like a natural generalization of von Neumann) goes to $1$, meaning, I believe, that an input with entropy $h$ is converted to an output of entropy approaching $h$.

The question seems much better behaved when phrased this way, rather than requesting a single output digit, because, for instance, if we draw $N$ samples and end up with an output with lots of information (for instance, all $N$ input symbols are distinct), then we can use all of that information to produce many output symbols, whereas with the question as phrased here, any information beyond that used to produce one output symbol goes to waste.

I believe that the scheme repeatedly takes $N$ draws, looks at the sequence, and maps it some outputs or the empty string. Perhaps there is a way to improve the scheme for your question by looking at prefixes and stopping if we have "enough" information to output a symbol? I don't know.

Context

StackExchange Computer Science Q#29381, answer score: 3

Revisions (0)

No revisions yet.