debugMinor
Fastest search algorithm in a sorted list with certain error rate-limiting constraints
Viewed 0 times
errorconstraintssearchwithratelimitingalgorithmfastestsortedlist
Problem
This problem came up during the Google CTF 2017. For background information about the challenge you can search for
Problem description:
I had a short discussion with the challenge creator about the solution and he solved it like the broken egg problem. His algorithm basically runs in constant time. As max number is
I chose a skewed binary search. Instead of splitting the search field into 50:50, I skew to one side
Any thoughts and com
GoogleCTF A7 ~ Gee cue elle.Problem description:
- A random number
Nbetween0and39402006196394479212279040100143613805079739270465446667948293404245721771497210611414266254884915640806627990306815(~3.9*10^115or64^64) is selected
- You have to find the correct number
N
- You can take a guess
X, and get a response whether the numberXis greater or lower thanN
- The twist:
- you can only perform 13 guesses in 30s without hitting a 90s timeout -
13/30
- if your guess was lower, then you get an error and you can only have 2 errors in 30s without a 90s timeout -
2/30
- you have only
2240stime to find the correct value
I had a short discussion with the challenge creator about the solution and he solved it like the broken egg problem. His algorithm basically runs in constant time. As max number is
64^64, he divides the problem in 64 buildings with 64 floors. And a broken egg is an error. This means it takes overall pretty constant 1920s. I think the problem is not equivalent to the egg problem, but somehow the algorithm performs here really really well.I chose a skewed binary search. Instead of splitting the search field into 50:50, I skew to one side
2/13=0.15 15:85. This way the probability of hitting the punishing error condition is pretty unlikely. My intuition tells me, that this should be the most efficient algorithm, but apparently it's not.- I thought the
0.15skew would be the best ratio, but after analysing the time it takes with different ratios, the most efficient value seems to be around~0.22. My calculation is obviously wrong, but what would have been the correct calculation to find~0.22?
- Why does the egg problem algorithm perform better (as in faster on average)?
- What is the on average fastest algorithm here? And why is it not based on a skewed binary search?
Any thoughts and com
Solution
For simplicity of analysis I will say that 'too high' is the error condition, which is just the problem inverted, $64^{64}-n$. I also assume that requests are instant.
It takes some thinking, but waiting is never an optimal strategy. So we view the problem as just a series of guesses, done in blocks of 13 (unless we hit the 90 second timeout). Instead of timeouts our penalty will be losing guesses, and instead of minimizing time we will minimize guesses used. I write $V_2$ for the value of a guess (in bits) when we are in the state where we guessed too high twice in this block. Similarly I will write $V_1, V_0$. I write $g \in \{0\dots13\}$ for the number of guesses left in the block, including the current one.
I will evaluate a strategy where we pick three numbers $p_0$, $p_1$, $p_2$, where each $p$ encodes how skewed the binary search is for when we've guessed $0$, $1$ or $2$ wrong times in this block.
The expected value (in bits) of a guess $p \in [0, 1]$ (where $0$ is the lowest guess and $1$ is the highest) is the binary entropy function $H(p) = -p \log_2 p - (1 - p)\log_2(1-p)$.
However, for $V_2(g)$, if we guess wrong we lose the rest of this block and the next two blocks, $B$ is the value of a block in bits.
$$V_2(g) = H(p_2) - 2p_2B - p_2\sum_{k=1}^{g-1}V_2(k)$$
If we write $E_2(g)$ as the value in bits of being in state $2$ with $g$ guesses left:
$$E_2(g) = \sum_{k=1}^{g} V_2(k) = \sum_{k=1}^g \big( H(p_2) - 2p_2B - p_2E_2(k-1)\big ) $$
$$E_2(g) = g(H(p_2) - 2p_2B) - p_2 \sum_{k=1}^g E_2(k-1)$$
This recurrence can be solved:
$$E_2(g) = (H(p_2)/p_2 - 2B)(1-(1 - p_2)^g)$$
Similarly, if we guess wrong for $V_1$, we lose our advantage from state $1$:
$$V_1(g) = H(p_1) - p_1(E_1(g-1) - E_2(g-1))$$
$$E_1(g) = \sum_{k=1}^{g} V_1(k) = \sum_{k=1}^{g} \big( H(p_1) - p_1(E_1(k-1) - E_2(k-1))\big )$$
$$E_1(g) = g(H(p_1) - p_1E_2(g-1)) - p_1\sum_{k=1}^{g}E_1(k-1)$$
$$E_1(g) = (H(p_1)/p_1 - E_2(g-1))(1-(1 - p_1)^g)$$
And $E_0$ is directly analogous. Now, since $B = E_0(13)$ we have:
$$E_0(g) = (H(p_0)/p_0 - E_1(g-1))(1-(1 - p_0)^g)$$
$$E_1(g) = (H(p_1)/p_1 - E_2(g-1))(1-(1 - p_1)^g)$$
$$E_2(g) = (H(p_2)/p_2 - 2E_0(13))(1-(1 - p_2)^g)$$
Oh boy... With a CAS I found:
$$E_0(13) =\frac{1}{2P_0P_1P_2 + 1} \left(\frac{P_0H(p_0)}{p_0} - \frac{P_0P_1H(p_1)}{p1} +\frac{P_0P_1P_2H(p_2)}{p_2}\right)$$
where $P_0 = 1-(1-p_0)^{13}$, $P_1 = 1-(1-p_1)^{12}$ and $P_2 = 1-(1-p_2)^{11}$. This is the expression we want to maximize.
Numerically maximizing this expression gives $E_0(13) \approx 6.72$ bits with:
$$p_0 \approx 0.1854365,\quad p_1 \approx 0.1422816,\quad p_2 \approx 0.0000899$$
This means that in order to get $\log_2(64^{64}) = 384$ bits we expect to need 57.1 blocks, or 1714.3 seconds.
This isn't even optimal yet, though. $p_0, p_1, p_2$ should actually be functions of $g$ (consider that if you are on your last guess in state $0$, then you can be really greedy and not get punished). I will see if I can improve my answer.
It takes some thinking, but waiting is never an optimal strategy. So we view the problem as just a series of guesses, done in blocks of 13 (unless we hit the 90 second timeout). Instead of timeouts our penalty will be losing guesses, and instead of minimizing time we will minimize guesses used. I write $V_2$ for the value of a guess (in bits) when we are in the state where we guessed too high twice in this block. Similarly I will write $V_1, V_0$. I write $g \in \{0\dots13\}$ for the number of guesses left in the block, including the current one.
I will evaluate a strategy where we pick three numbers $p_0$, $p_1$, $p_2$, where each $p$ encodes how skewed the binary search is for when we've guessed $0$, $1$ or $2$ wrong times in this block.
The expected value (in bits) of a guess $p \in [0, 1]$ (where $0$ is the lowest guess and $1$ is the highest) is the binary entropy function $H(p) = -p \log_2 p - (1 - p)\log_2(1-p)$.
However, for $V_2(g)$, if we guess wrong we lose the rest of this block and the next two blocks, $B$ is the value of a block in bits.
$$V_2(g) = H(p_2) - 2p_2B - p_2\sum_{k=1}^{g-1}V_2(k)$$
If we write $E_2(g)$ as the value in bits of being in state $2$ with $g$ guesses left:
$$E_2(g) = \sum_{k=1}^{g} V_2(k) = \sum_{k=1}^g \big( H(p_2) - 2p_2B - p_2E_2(k-1)\big ) $$
$$E_2(g) = g(H(p_2) - 2p_2B) - p_2 \sum_{k=1}^g E_2(k-1)$$
This recurrence can be solved:
$$E_2(g) = (H(p_2)/p_2 - 2B)(1-(1 - p_2)^g)$$
Similarly, if we guess wrong for $V_1$, we lose our advantage from state $1$:
$$V_1(g) = H(p_1) - p_1(E_1(g-1) - E_2(g-1))$$
$$E_1(g) = \sum_{k=1}^{g} V_1(k) = \sum_{k=1}^{g} \big( H(p_1) - p_1(E_1(k-1) - E_2(k-1))\big )$$
$$E_1(g) = g(H(p_1) - p_1E_2(g-1)) - p_1\sum_{k=1}^{g}E_1(k-1)$$
$$E_1(g) = (H(p_1)/p_1 - E_2(g-1))(1-(1 - p_1)^g)$$
And $E_0$ is directly analogous. Now, since $B = E_0(13)$ we have:
$$E_0(g) = (H(p_0)/p_0 - E_1(g-1))(1-(1 - p_0)^g)$$
$$E_1(g) = (H(p_1)/p_1 - E_2(g-1))(1-(1 - p_1)^g)$$
$$E_2(g) = (H(p_2)/p_2 - 2E_0(13))(1-(1 - p_2)^g)$$
Oh boy... With a CAS I found:
$$E_0(13) =\frac{1}{2P_0P_1P_2 + 1} \left(\frac{P_0H(p_0)}{p_0} - \frac{P_0P_1H(p_1)}{p1} +\frac{P_0P_1P_2H(p_2)}{p_2}\right)$$
where $P_0 = 1-(1-p_0)^{13}$, $P_1 = 1-(1-p_1)^{12}$ and $P_2 = 1-(1-p_2)^{11}$. This is the expression we want to maximize.
Numerically maximizing this expression gives $E_0(13) \approx 6.72$ bits with:
$$p_0 \approx 0.1854365,\quad p_1 \approx 0.1422816,\quad p_2 \approx 0.0000899$$
This means that in order to get $\log_2(64^{64}) = 384$ bits we expect to need 57.1 blocks, or 1714.3 seconds.
This isn't even optimal yet, though. $p_0, p_1, p_2$ should actually be functions of $g$ (consider that if you are on your last guess in state $0$, then you can be really greedy and not get punished). I will see if I can improve my answer.
Context
StackExchange Computer Science Q#77459, answer score: 2
Revisions (0)
No revisions yet.