patternMinor
The expectation of the total number of pairs of keys in a hash table that collide using universal hashing
Viewed 0 times
totalnumbertheuniversalcollidehashexpectationkeysthatusing
Problem
I am reading CLRS relating to perfect hashing. When computing the
$$
\mathbb{E}[\sum_{j=0}^{m-1}{n_j\choose{2}}]
$$
where $m$ is the number of slots in the hash table, and $n_j$ is the number of keys in position $j$. I don't understand why we can directly conclude that
$$
\mathbb{E}[\sum_{j=0}^{m-1}{n_j\choose{2}}]\leq{n\choose{2}}\frac{1}{m}
$$
I understand that since $h$ is randomly chosen from a universal hash function family, $\Pr{(h(x_i)=h(x_j))}\leq{\frac{1}{m}},\forall{i\neq{j}}$. I don't understand why we can use the total number of pairs (the combination part) directly because if $h(x_i)=h(x_j)$ and $h(x_j)=h(x_k)$, then we have $h(x_i)=h(x_k)$ immediately instead of a probability of $\frac{1}{m}$.
Someone can help me out? Thanks!
$$
\mathbb{E}[\sum_{j=0}^{m-1}{n_j\choose{2}}]
$$
where $m$ is the number of slots in the hash table, and $n_j$ is the number of keys in position $j$. I don't understand why we can directly conclude that
$$
\mathbb{E}[\sum_{j=0}^{m-1}{n_j\choose{2}}]\leq{n\choose{2}}\frac{1}{m}
$$
I understand that since $h$ is randomly chosen from a universal hash function family, $\Pr{(h(x_i)=h(x_j))}\leq{\frac{1}{m}},\forall{i\neq{j}}$. I don't understand why we can use the total number of pairs (the combination part) directly because if $h(x_i)=h(x_j)$ and $h(x_j)=h(x_k)$, then we have $h(x_i)=h(x_k)$ immediately instead of a probability of $\frac{1}{m}$.
Someone can help me out? Thanks!
Solution
I'm assuming that there is a (non-uniformly) random function $h\colon [n] \to [m]$, and $n_j$ is the number of preimages of $j$.
Notice that $\binom{n_j}{2} = \sum_{1 \leq a < b \leq n} 1_{h(a) = h(b) = j}$. We have
$$
\mathbb{E}\left[\binom{n_j}{2}\right] = \frac{1}{2} \sum_{a \neq b} \Pr[h(a) = h(b) = j] = \frac{1}{2} \sum_{a=1}^n \sum_{b \neq a} \Pr[h(a) = h(b) = j] \leq \frac{n-1}{2m} \sum_{a=1}^n \Pr[h(a) = j].
$$
Summing this over $j$, we get
$$
\sum_{j=1}^m \mathbb{E}\left[\binom{n_j}{2}\right] \leq \frac{n-1}{2m} \sum_{a=1}^n \sum_{j=1}^m \Pr[h(a) = j] = \frac{n(n-1)}{2m} = \frac{1}{m} \binom{n}{2}.
$$
Notice that $\binom{n_j}{2} = \sum_{1 \leq a < b \leq n} 1_{h(a) = h(b) = j}$. We have
$$
\mathbb{E}\left[\binom{n_j}{2}\right] = \frac{1}{2} \sum_{a \neq b} \Pr[h(a) = h(b) = j] = \frac{1}{2} \sum_{a=1}^n \sum_{b \neq a} \Pr[h(a) = h(b) = j] \leq \frac{n-1}{2m} \sum_{a=1}^n \Pr[h(a) = j].
$$
Summing this over $j$, we get
$$
\sum_{j=1}^m \mathbb{E}\left[\binom{n_j}{2}\right] \leq \frac{n-1}{2m} \sum_{a=1}^n \sum_{j=1}^m \Pr[h(a) = j] = \frac{n(n-1)}{2m} = \frac{1}{m} \binom{n}{2}.
$$
Context
StackExchange Computer Science Q#107314, answer score: 2
Revisions (0)
No revisions yet.