Chernoff-Hoeffding bounds for the number of nonzeros in a submatrix
Problem
Consider an $n \times n$ matrix $A$ with $k$ nonzero entries. Assume every row and every column of $A$ has at most $\sqrt{k}$ nonzeros. Permute the rows and the columns of $A$ uniformly at random, and divide $A$ into $k$ submatrices of size $n/\sqrt{k} \times n/\sqrt{k}$ (i.e., $\sqrt{k}$ meta-rows and $\sqrt{k}$ meta-columns). Enumerate the $k$ nonzeros and define the following indicator random variable:
\begin{equation}
X_{\ell,z} =
\begin{cases}
1 & \text{if the $z$-th nonzero entry is in $A^\ell$} \\
0 & \text{otherwise}
\end{cases}
\end{equation}
The expected number of nonzero entries in a generic submatrix $A^\ell$ is one. Is it possible to prove Chernoff-Hoeffding bounds on the sum $X_\ell = \sum_{z=1}^k X_{\ell,z}$?
My first guess was to prove negative association, following the analysis of Dubhashi and Panconesi. Unfortunately, $X_{\ell,z}$ and $X_{\ell,z^\prime}$ are not negatively associated (in the book's notation, if $z$ and $z^\prime$ lie in the same row or column, then $\mathbf{E}[f(X_{\ell,z})g(X_{\ell,z^\prime})] > \mathbf{E}[f(X_{\ell,z})] \, \mathbf{E}[g(X_{\ell,z^\prime})]$).
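For intuition, the setup is easy to simulate. The sketch below is my own (the parameter choices and variable names are not from the question): it builds a matrix obeying the degree constraint, permutes its rows and columns uniformly at random, and tallies the nonzeros in each submatrix.

```python
import numpy as np

rng = np.random.default_rng(0)

n, s = 36, 6              # s = sqrt(k), so k = s*s = 36 nonzeros
k = s * s
A = np.zeros((n, n), dtype=int)
A[:s, :s] = 1             # every row/column holds at most sqrt(k) nonzeros

# Permute the rows and the columns uniformly at random
B = A[rng.permutation(n)][:, rng.permutation(n)]

# counts[i, j] = nonzeros in the (i, j)-th (n/sqrt(k)) x (n/sqrt(k)) submatrix
m = n // s                # submatrix side length n / sqrt(k)
counts = B.reshape(s, m, s, m).sum(axis=(1, 3))

print(counts.sum(), counts.mean())  # k nonzeros in total, mean exactly 1
```

Since the permutations only move the $k$ nonzeros around, the average over the $k$ submatrices is exactly $1$ in every trial; the question is about the fluctuations of a single entry of `counts`.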
Solution
Okay, here is a full answer.
We will use the fact that any bipartite graph of maximum degree $d$ can be decomposed into at most $d$ matchings (König's edge-coloring theorem).
In our case, this means that we can split the nonzeros of $A$ into at most $\sqrt{k}$ disjoint sets $S_i\subseteq\{(i,j)\in[n]\times[n] \mid A_{i,j}=1\}$, each of size at most $n$, such that no two elements of the same set share a row or a column.
(This uses the fact that a $1$ in $A$ can be seen as an edge of the $n\times n$ bipartite graph on rows and columns defined by the matrix.)
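The decomposition is constructive. Below is a sketch of my own (the function name and structure are not from the answer) that properly edge-colors the row/column bipartite graph with $d$ colors using the alternating-path swap from König's proof; each color class is one of the sets $S_i$.

```python
def matching_decomposition(edges, d):
    """Split the edges (row, col) of a bipartite graph with maximum degree
    <= d into at most d classes in which no two edges share a row or a
    column -- a proper edge colouring, via Koenig's alternating-path proof."""
    at = {}  # at[v][col] = neighbour joined to vertex v by the edge coloured col

    def free_colour(v):
        return next(col for col in range(d) if col not in at.setdefault(v, {}))

    colour = {}
    for r, c in edges:
        u, v = ('R', r), ('C', c)
        a, b = free_colour(u), free_colour(v)
        if a in at[v]:
            # Colour a is taken at v: flip the a/b alternating path from v.
            # It cannot reach u (u has no a-coloured edge), so afterwards
            # colour a is free at both endpoints.
            path, x, col = [], v, a
            while col in at[x]:
                y = at[x][col]
                path.append((x, y, col))
                x, col = y, (b if col == a else a)
            for x, y, col in path:
                del at[x][col], at[y][col]
            for x, y, col in path:
                swapped = b if col == a else a
                at[x][swapped], at[y][swapped] = y, x
                colour[(x[1], y[1]) if x[0] == 'R' else (y[1], x[1])] = swapped
        at[u][a], at[v][a] = v, u
        colour[(r, c)] = a

    classes = [[] for _ in range(d)]
    for edge, col in colour.items():
        classes[col].append(edge)
    return [cl for cl in classes if cl]


# Example: K_{4,4} has maximum degree 4 and splits into 4 perfect matchings
edges = [(i, j) for i in range(4) for j in range(4)]
classes = matching_decomposition(edges, 4)
```

Each returned class uses every row and every column at most once, which is exactly the property the argument below exploits for $S_1$.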
Now let $X$ be the total number of $1$s in your $n/\sqrt k\times n/\sqrt k$ submatrix $A^\ell$, and let $X_i$ be the number of those coming from $S_i$.
Then $X\ge t\implies \exists_i\, X_i\ge t/\sqrt k$ and so
$$\Pr[X\ge t]\le \sqrt k \Pr[X_1 \ge t/\sqrt k]$$
by the union bound.
Now, because elements in $S_1$ don't share any rows/columns, it is easy to analyze using your original approach.
In particular, reorder the rows and columns using the matching, so that the first element of $S_1$ is $(1,1)$, the next is $(2,2)$, and so on.
Let $r_i$ be the indicator variable that is $1$ if row $i$ lands in the meta-row of $A^\ell$, and let $c_i$ be the analogous indicator for column $i$. Then we want to upper bound $\Pr[\sum_i r_ic_i \ge t]$.
Notice that $E[r_ic_ir_jc_j] = E[r_ir_j]\,E[c_ic_j] \le E[r_i]^2E[c_i]^2$, since the row and column permutations are independent and the indicators within each are negatively correlated (the rows of the meta-row are sampled without replacement; the inequality also follows from Maclaurin's inequality). The same argument bounds all higher mixed moments, so
$$E\left[\exp\Big(t\sum_i r_ic_i\Big)\right] = \sum_{j\ge 0} \frac{t^j}{j!}\,E\left[\Big(\sum_i r_ic_i\Big)^j\right] \le \sum_{j\ge 0} \frac{t^j}{j!}\,E\left[\Big(\sum_i b_i\Big)^j\right] = E\left[\exp\Big(t\sum_i b_i\Big)\right]$$
where the $b_i$ are independent Bernoulli variables with $p=\left(\frac{n/\sqrt{k}}{n}\right)^2=1/k$, so that $\sum_i b_i$ is binomial.
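As a numerical sanity check (the simulation and its parameters are my own, not part of the argument), one can simulate $\sum_i r_ic_i$, with the rows and columns of the block sampled without replacement, and compare its mean against $n \cdot p = n/k$:

```python
import numpy as np

rng = np.random.default_rng(1)

n, k = 36, 9              # sqrt(k) = 3, meta-row/column size m = n / sqrt(k)
m = n // 3
trials = 20_000

sums = np.empty(trials)
for t in range(trials):
    r = rng.permutation(n) < m   # r_i: row i falls in the chosen meta-row
    c = rng.permutation(n) < m   # c_i: column i falls in the chosen meta-column
    sums[t] = np.count_nonzero(r & c)

# The empirical mean matches n * (1/k) = 4; the upper tail of `sums` is
# what the binomial comparison above controls.
print(sums.mean(), n / k)
```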
Finally, we then get
$$\Pr[X\ge t]\le \sqrt k \Pr[X_1 \ge t/\sqrt k] \le \sqrt k \exp\left(-2\left(\frac{t-p n \sqrt{k}}{n\sqrt k}\right)^2 n\right)$$
or more simply
$$\Pr[X\ge \lambda\sqrt{nk}+n/\sqrt k]\le \sqrt k \exp(-2\lambda^2)$$
or if we use the Hoeffding bound for small values of $p$ (large $k$):
$$\Pr[X\ge \lambda\sqrt{nk}+n/\sqrt k]\le \sqrt k \exp\left(\frac{-\lambda^2}{2/k+\lambda/\sqrt n}\right)$$
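To get a feel for the magnitudes (the concrete values of $n$, $k$ and $\lambda$ below are my own illustrative choices), one can evaluate the simplified bound directly:

```python
import math

n, k, lam = 10_000, 100, 3.0   # illustrative values only

t = lam * math.sqrt(n * k) + n / math.sqrt(k)
bound = math.sqrt(k) * math.exp(-2 * lam ** 2)
print(f"Pr[X >= {t:.0f}] <= {bound:.2e}")
```

So already at $\lambda = 3$ the tail probability is tiny, despite the $\sqrt k$ factor paid for the union bound.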
Note that if we had simply assumed all elements were independent, we would have obtained the bound $\Pr[X\ge \lambda\sqrt{n\sqrt{k}}+n/\sqrt k]\le \exp(-2\lambda^2)$. This is equal to ours except that our deviation term is larger by a factor of $k^{1/4}$ (and we pay the leading factor $\sqrt k$).
Context
StackExchange Computer Science Q#96381, answer score: 3