patternMinor
Las Vegas algorithm for finding 00000 in bit string
Viewed 0 times
bit00000vegasalgorithmforfindinglasstring
Problem
Problem 1: Consider the following problem: given a binary string $w=a_1a_2\cdots a_n \in\{0,1\}^*$, decide whether $w$ contains 00000 as a substring (i.e., where $w$ contains five consecutive 0's). There is an obvious $O(n)$-time algorithm. You will investigate the hidden constant factor, concerning the number of bits that need to be examined (i.e., the number of $a_i$'s that need to be evaluated) to solve this problem.
Above is a question some friends and I have looked at to prepare for a qualifying exam and is itself from a past qualifying exam. Parts 1 and 2 are fine but what we are having troubles with is part 3. I think the idea behind this problem is to imagine that evaluating some bit $s_i$ is really expensive, so we would like to get the right answer while minimizing how many of these bits we evaluate. That said, there is some ambiguity to us about whether we (1) care about the number of distinct bits we evaluate or (2) the total number of times we evaluate any bit. Clearly if we were worried about the latter, we could just store the result of the bit evaluation and avoid doing an expensive evaluation again, but we are not sure. For now, I assume the latter case and specifically that if we want the value for the same bit twice, we assume we need to evaluate it each time and incur $2$ units to th
- (10/100 points) First argue that every deterministic algorithm solving this problem needs to evaluate $\Omega(n)$ bits in the worst case.
- (30/100 points) Prove that for a sufficiently large constant $c$, for any string $a_1 \cdots a_c$ of length $c$ that does not contain 00000, there must exist two indices $i,j \in \{1,\ldots,c\}$ with $a_i=a_j=1$ and $2 \leq |i-j| \leq 5$.
- (60/100 points) Design a Las Vegas randomized algorithm solving this problem (with zero error probability) that evaluates an expected $\alpha n + o(n)$ number of bits for any input, for some constant $\alpha$ strictly smaller than $1$. (You do not need to optimize the constant $\alpha$.)
Above is a question some friends and I have looked at to prepare for a qualifying exam and is itself from a past qualifying exam. Parts 1 and 2 are fine but what we are having troubles with is part 3. I think the idea behind this problem is to imagine that evaluating some bit $s_i$ is really expensive, so we would like to get the right answer while minimizing how many of these bits we evaluate. That said, there is some ambiguity to us about whether we (1) care about the number of distinct bits we evaluate or (2) the total number of times we evaluate any bit. Clearly if we were worried about the latter, we could just store the result of the bit evaluation and avoid doing an expensive evaluation again, but we are not sure. For now, I assume the latter case and specifically that if we want the value for the same bit twice, we assume we need to evaluate it each time and incur $2$ units to th
Solution
Let $c$ be the constant from part 2, and assume for simplicity that $n$ is divisible by $c$ (otherwise, query the first $n \bmod c$ bits).
Partition the string into intervals of length $c$, and go over them one by one in an arbitrary order. The order doesn't need to be random – you can go over them from left to right.
For each interval $I$, choose a pair $(i,j)$ out of all pairs $i,j \in \{1,\ldots,c\}$ at distance $j-i \in \{2,\ldots,5\}$, and query the points $I_1,\ldots,I_i,I_j,\ldots,I_c$. If $I_i=I_j=1$, then there is no need to query the points $I_{i+1},\ldots,I_{j-1}$, otherwise do query them. If the interval contains 00000, immediately halt, answering Yes.
After going through all intervals, check whether the string contains 00000 (the unqueried points make no difference here), and answer accordingly.
According to part 2, if an interval $I$ doesn't contain 00000, then with probability $p \geq 1/4c$ the chosen pair $(i,j)$ will satisfy $I_i = I_j = 1$, and in this case we query at most $c-1$ out of the $c$ points. If it does contain 00000, then we query all points, and immediately halt. This shows that on average, we query at most $(1-1/4c^2)n + 2c$ points (taking into account the case in which $n$ is not divisible by $c$).
Partition the string into intervals of length $c$, and go over them one by one in an arbitrary order. The order doesn't need to be random – you can go over them from left to right.
For each interval $I$, choose a pair $(i,j)$ out of all pairs $i,j \in \{1,\ldots,c\}$ at distance $j-i \in \{2,\ldots,5\}$, and query the points $I_1,\ldots,I_i,I_j,\ldots,I_c$. If $I_i=I_j=1$, then there is no need to query the points $I_{i+1},\ldots,I_{j-1}$, otherwise do query them. If the interval contains 00000, immediately halt, answering Yes.
After going through all intervals, check whether the string contains 00000 (the unqueried points make no difference here), and answer accordingly.
According to part 2, if an interval $I$ doesn't contain 00000, then with probability $p \geq 1/4c$ the chosen pair $(i,j)$ will satisfy $I_i = I_j = 1$, and in this case we query at most $c-1$ out of the $c$ points. If it does contain 00000, then we query all points, and immediately halt. This shows that on average, we query at most $(1-1/4c^2)n + 2c$ points (taking into account the case in which $n$ is not divisible by $c$).
Context
StackExchange Computer Science Q#135811, answer score: 2
Revisions (0)
No revisions yet.