Chernoff-like Concentration Bounds on Permutations
Problem
Suppose I have $n$ balls: $m \leq n$ of them are black and the other $n - m$ are white. Draw a uniformly random permutation $\pi$ of these balls and denote by $Y_i$ the number of black balls in the first $i$ positions of $\pi$. I would like to show that $Y_i$ is sharply concentrated around its mean.
The expected value
It is not hard to compute the expected value of $Y_i$: the probability that a black ball appears somewhere along the first $i$ positions is $i/n$; moreover, there are $m$ black balls, hence $$\mathbb{E}[Y_i] = \frac{m\cdot i}{n}.$$
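As a quick sanity check on this formula, here is a small simulation sketch (the parameter values $n = 50$, $m = 20$, $i = 10$ are my own choice for illustration):

```python
import random

def empirical_mean_Yi(n, m, i, trials=20000):
    """Estimate E[Y_i], the expected number of black balls among the
    first i positions of a uniformly random permutation of m black
    and n - m white balls."""
    balls = [1] * m + [0] * (n - m)  # 1 = black, 0 = white
    total = 0
    for _ in range(trials):
        random.shuffle(balls)
        total += sum(balls[:i])
    return total / trials

# With n = 50, m = 20, i = 10 the formula gives E[Y_i] = 20 * 10 / 50 = 4.
print(empirical_mean_Yi(50, 20, 10))  # should be close to 4.0
```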
But can we show that $Y_i$ is actually very close to its mean?
A failed attempt
For every black ball $b$, one can define an indicator variable $X_b$ which equals 1 if $b$ appears in the first $i$ positions of $\pi$ and 0 otherwise. Denoting the set of all black balls by $B$, we have
$$ Y_i = \sum_{b \in B} X_b.$$
I was hoping to use a Chernoff-type concentration bound, but unfortunately, for two black balls $b$ and $b'$, the variables $X_b$ and $X_{b'}$ are not independent. If $X_b = 1$, then only $i-1$ of the remaining $n-1$ positions lie among the first $i$, so $\Pr[X_{b'} = 1 \mid X_b = 1] = \frac{i-1}{n-1} < \frac{i}{n}$ for $i < n$: the indicators are negatively correlated.
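The negative correlation can also be seen numerically. A minimal sketch (parameter values $n = 10$, $i = 4$ are my own choice) that estimates $\mathrm{Cov}(X_b, X_{b'})$ for two fixed balls:

```python
import random

def indicator_covariance(n, i, trials=200000):
    """Estimate Cov(X_b, X_b') for two fixed balls b, b', where
    X = 1 if the ball lands in the first i positions of a uniformly
    random permutation of n balls.
    Exact value: i*(i-1)/(n*(n-1)) - (i/n)**2, which is negative."""
    sx = sy = sxy = 0
    positions = list(range(n))
    for _ in range(trials):
        pb, pb2 = random.sample(positions, 2)  # distinct positions of b, b'
        x = pb < i
        y = pb2 < i
        sx += x
        sy += y
        sxy += int(x and y)
    ex, ey, exy = sx / trials, sy / trials, sxy / trials
    return exy - ex * ey

cov = indicator_covariance(n=10, i=4)
print(cov)  # negative: the indicators are negatively correlated
```

For $n = 10$, $i = 4$ the exact covariance is $\frac{4 \cdot 3}{10 \cdot 9} - \left(\frac{4}{10}\right)^2 \approx -0.027$.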
Solution
Chernoff's bound also applies to negatively correlated random variables, such as your hypergeometric distribution. You can find a full treatment in Dubhashi and Panconesi's very useful monograph Concentration of measure for the analysis of algorithms, as well as in many lecture notes, such as this one by Hariharan Ramesh.
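To illustrate, here is a simulation sketch (parameter values are my own choice) comparing the empirical upper tail of $Y_i$ with the Hoeffding-style bound $\exp(-2t^2/i)$, which remains valid when sampling without replacement:

```python
import math
import random

def tail_vs_bound(n, m, i, t, trials=20000):
    """Empirical P(Y_i >= E[Y_i] + t) for the hypergeometric Y_i
    (black balls among i positions drawn without replacement),
    next to the bound exp(-2 t^2 / i)."""
    mu = m * i / n
    balls = [1] * m + [0] * (n - m)  # 1 = black, 0 = white
    hits = 0
    for _ in range(trials):
        # Sampling i balls without replacement is equivalent to
        # looking at the first i positions of a random permutation.
        if sum(random.sample(balls, i)) >= mu + t:
            hits += 1
    return hits / trials, math.exp(-2 * t * t / i)

emp, bound = tail_vs_bound(n=50, m=20, i=10, t=3)
print(emp, bound)  # empirical tail should not exceed the bound
```

With these parameters the bound is $e^{-1.8} \approx 0.165$, while the true tail probability is well below it.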
Context
StackExchange Computer Science Q#96172, answer score: 4