patternMinor
Reducing randomness needed by turing machine
Viewed 0 times
neededreducingrandomnessmachineturing
Problem
I am reading an article related to streaming algorithms named "Turnstile streaming algorithms might as well be linear sketched" by Yi Li, Huy Nguyen and David Woodruff,
At some point they have a random algorithm (uses a tape of random bits) that solved a problem over $\mathbb Z_{|m|}^{n} = \{-m,..,m\}^n$ that succeeds with probability $1-\delta$
They want to reduce the number of bits needed for randomness used by that algorithm using via the following statement:
Theorem 9: Let A be a randomized automaton solving problem P on
$\mathbb Z_{|m|}^{n}$ with failure probability at most $\delta$. There
is a ranomized automaton B that only needs to pick uniformly at random
one of $O(\delta ^{-2}nlogm)$ deterministic instances of A and solves
P on $\mathbb Z_{|m|}^{n}$ with failure probability at most $2\delta$
Proof: Let $A_1 ,A_2 ,.., A_{O(n\delta ^{-2}logm)}$ be independent
draws of deterministic automata picked by B.
Fix an input $x\in \mathbb Z_{|m|}^{n}$.
Let $p_A(x)$ be the fraction of automata among $A_1 ,A_2 ,..,
> A_{O(n\delta ^{-2}logm)}$ that solve problem P correctly on x and
$p_B(x)$ be the probability that B solves P on x correctly.
By a Chernoff bound, we have that $Pr\{|p_A(x)-p_B(x)|\geq \delta\}
\leq exp(-O(nlogm)) 0$.
Therefore, there exists a set of $A_1 ,A_2 ,.., A_{O(n\delta
> ^{-2}logm)}$ such that $|p_A(x)-p_B(x)|\leq \delta$ for all $x\in
> \mathbb Z_{|m|}^{n}$. The automaton B simply samples uniformly at
random from this set of deterministic automata
I am having trouble understanding some parts of the proof:
the random variable $p_A(x)$ should be the amount of $A_i$'s that succeeds divided by the amount of them that B has sampled?
if so than by the low of total probability its easy to see that:
$p_B(x) = \sum_{i}P[B\ choose\ A_i] P[B\ succeeds\ on\ x\ |\ B\ choose' A_i] = \sum_{i}\frac{1}{O(\delta^{-2}nlogm)} P[A_i\ succeeds\
on\ x] = p_A(x)$
where the last move comes from the fact
At some point they have a random algorithm (uses a tape of random bits) that solved a problem over $\mathbb Z_{|m|}^{n} = \{-m,..,m\}^n$ that succeeds with probability $1-\delta$
They want to reduce the number of bits needed for randomness used by that algorithm using via the following statement:
Theorem 9: Let A be a randomized automaton solving problem P on
$\mathbb Z_{|m|}^{n}$ with failure probability at most $\delta$. There
is a ranomized automaton B that only needs to pick uniformly at random
one of $O(\delta ^{-2}nlogm)$ deterministic instances of A and solves
P on $\mathbb Z_{|m|}^{n}$ with failure probability at most $2\delta$
Proof: Let $A_1 ,A_2 ,.., A_{O(n\delta ^{-2}logm)}$ be independent
draws of deterministic automata picked by B.
Fix an input $x\in \mathbb Z_{|m|}^{n}$.
Let $p_A(x)$ be the fraction of automata among $A_1 ,A_2 ,..,
> A_{O(n\delta ^{-2}logm)}$ that solve problem P correctly on x and
$p_B(x)$ be the probability that B solves P on x correctly.
By a Chernoff bound, we have that $Pr\{|p_A(x)-p_B(x)|\geq \delta\}
\leq exp(-O(nlogm)) 0$.
Therefore, there exists a set of $A_1 ,A_2 ,.., A_{O(n\delta
> ^{-2}logm)}$ such that $|p_A(x)-p_B(x)|\leq \delta$ for all $x\in
> \mathbb Z_{|m|}^{n}$. The automaton B simply samples uniformly at
random from this set of deterministic automata
I am having trouble understanding some parts of the proof:
the random variable $p_A(x)$ should be the amount of $A_i$'s that succeeds divided by the amount of them that B has sampled?
if so than by the low of total probability its easy to see that:
$p_B(x) = \sum_{i}P[B\ choose\ A_i] P[B\ succeeds\ on\ x\ |\ B\ choose' A_i] = \sum_{i}\frac{1}{O(\delta^{-2}nlogm)} P[A_i\ succeeds\
on\ x] = p_A(x)$
where the last move comes from the fact
Solution
Here is another way of looking at $p_A$ and $p_B$. We can think of $B$ as a deterministic algorithm which accepts an additional input $r$ representing the randomness. This input is drawn from some distribution $R$. Also, denote by $P(x)$ the correct answer. Then
$$
p_B(x) = \Pr_{r \sim R} [B(x,r) = P(x)].
$$
Let $M = O(n\delta^{-2}\log m)$, and sample $r_1,\ldots,r_M \sim R$. The algorithm $A_i$ simply runs $B$ with randomness $r_i$, that is $A_i(x) = B(x,r_i)$. We can define another algorithm $A$ which first choose $i \in \{1,\ldots,M\}$ uniformly at random, and then runs $A_i$. The success probability of $A$ is
$$
\begin{align*}
p_A(x) &= \Pr_{i \in \{1,\ldots,M\}} [B(x,r_i) = P(x)] \\ &= \Pr_{r \in \{r_1,\ldots,r_M\}} [B(x,r) = P(x)].
\end{align*}
$$
Thus in $B$, the randomness is chosen according to $R$, whereas in $A$, the randomness is chosen uniformly from a set $\{r_1,\ldots,r_M\}$. If we choose $r_1,\ldots,r_M \sim R$, then we expect the performance of $A$ to be very similar to that of $B$, and this can be quantified using Chernoff's inequality.
The probability $p_A(x)$ itself depends on the choice of $r_1,\ldots,r_M$. What may confuse you is how we conceive of this two-step random process. We first choose $r_1,\ldots,r_M$, and this gives us the function $p_A(x)$, which is a random function depending on the choice of $r_1,\ldots,r_M$. Note that we don't take probability over the choice of $r_1,\ldots,r_M$ – if we did, we would just get $p_B(x)$.
$$
p_B(x) = \Pr_{r \sim R} [B(x,r) = P(x)].
$$
Let $M = O(n\delta^{-2}\log m)$, and sample $r_1,\ldots,r_M \sim R$. The algorithm $A_i$ simply runs $B$ with randomness $r_i$, that is $A_i(x) = B(x,r_i)$. We can define another algorithm $A$ which first choose $i \in \{1,\ldots,M\}$ uniformly at random, and then runs $A_i$. The success probability of $A$ is
$$
\begin{align*}
p_A(x) &= \Pr_{i \in \{1,\ldots,M\}} [B(x,r_i) = P(x)] \\ &= \Pr_{r \in \{r_1,\ldots,r_M\}} [B(x,r) = P(x)].
\end{align*}
$$
Thus in $B$, the randomness is chosen according to $R$, whereas in $A$, the randomness is chosen uniformly from a set $\{r_1,\ldots,r_M\}$. If we choose $r_1,\ldots,r_M \sim R$, then we expect the performance of $A$ to be very similar to that of $B$, and this can be quantified using Chernoff's inequality.
The probability $p_A(x)$ itself depends on the choice of $r_1,\ldots,r_M$. What may confuse you is how we conceive of this two-step random process. We first choose $r_1,\ldots,r_M$, and this gives us the function $p_A(x)$, which is a random function depending on the choice of $r_1,\ldots,r_M$. Note that we don't take probability over the choice of $r_1,\ldots,r_M$ – if we did, we would just get $p_B(x)$.
Context
StackExchange Computer Science Q#75674, answer score: 3
Revisions (0)
No revisions yet.