HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Knuth's proof of O(1) for linear probing

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
probinglinearforknuthproof

Problem

I'm currently reading Knuth's proof about O(1) number of probes in
linear probing. I have a small question on the page 536 (Volume 3, 2nd Edition). Knuth says


Let $f(M, N)$ be the number of hash sequences such that position 0 of the table will be empty after the keys have been inserted using linear probing.
The circular symmetry of linear probing implies that position 0 is empty just as often as any other position, so it is empty with probability $1 - \frac{N}{M}$; in other words $f(M, N) = (1 - \frac{N}{M})M^N$.

I explain to myself as follows.
Each hash sequence has $M-N$ empty positions, then the total number of
empty positions is $(M-N)M^N$ and -- due to the symmetry -- each position is empty for $(M-N)M^N / M = (1 - \frac{N}{M})M^N$ hash sequences.
Thus, the probability of zero position to be empty is $(1 - \frac{N}{M})$.

The problem is that Knuth first states that the probability that 0 position is empty is $(1 - \frac{N}{M})$ and after says that $f(M, N) = (1 - \frac{N}{M})M^N$. I'm able to prove this only in other way around.
Could please anybody explain how to prove the probability without showing
that the total amount of empty positions is $(M-N)M^N$.

Thank you.

Solution

Let $p_i$ be the probability that position $i$ is empty. A simple coupling (detailed below) shows that $p_i = p_j$ for all $i,j$, and so $Mp_0 = p_0 + \cdots + p_{M-1}$. Now let $X_i$ be the indicator for the event "$i$ is empty". Then $$p_0 + \cdots + p_{M-1} = \mathbb{E}[X_0] + \cdots + \mathbb{E}[X_{M-1}] = \mathbb{E}[X_0 + \cdots + X_{M-1}], $$
using linearity of expectation. But $X_0 + \cdots + X_{M-1} = M-N$, since exactly $N$ positions are full and $M-N$ are empty. Altogether, we get
$$
p_0 = \frac{p_0 + \cdots + p_{M-1}}{M} = \frac{M-N}{M} = 1 - \frac{N}{M}.
$$

Now for the coupling argument. Let $A = a_0,\ldots,a_{N-1} \in \{0,\ldots,M-1\}^N$ be a uniformly random insertion sequence, and let $B = b_0,\ldots,b_{N-1}$, where $b_t = a_t + j-i \pmod{M}$. Denote the final resulting hashtables by $H_A,H_B$. You can prove by induction that $H_A(p) = H_B(p+j-i\bmod{m})$ for every $p \in \{0,\ldots,m-1\}$ (the induction is on the number of insertions), and so $H_B(j) = H_A(i)$. In particular, cell $i$ is empty in $H_A$ iff cell $j$ is empty in $H_B$. Now $H_A$ and $H_B$, when considered individually, are generated by a uniformly random insertion sequence, and so the probability that $H_A(i)$ is empty is $p_i$, and the probability that $H_B(j)$ is empty is $p_j$. This shows that $p_i = p_j$.

Context

StackExchange Computer Science Q#81030, answer score: 4

Revisions (0)

No revisions yet.