HiveBrain v1.2.0
Get Started
← Back to all entries
debugMinor

Fastest search algorithm in a sorted list with certain error rate-limiting constraints

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
errorconstraintssearchwithratelimitingalgorithmfastestsortedlist

Problem

This problem came up during the Google CTF 2017. For background information about the challenge you can search for GoogleCTF A7 ~ Gee cue elle.

Problem description:

  • A random number N between 0 and 39402006196394479212279040100143613805079739270465446667948293404245721771497210611414266254884915640806627990306815 (~3.9*10^115 or 64^64) is selected



  • You have to find the correct number N



  • You can take a guess X, and get a response whether the number X is greater or lower than N



  • The twist:



  • you can only perform 13 guesses in 30s without hitting a 90s timeout - 13/30



  • if your guess was lower, then you get an error and you can only have 2 errors in 30s without a 90s timeout - 2/30



  • you have only 2240s time to find the correct value



I had a short discussion with the challenge creator about the solution and he solved it like the broken egg problem. His algorithm basically runs in constant time. As max number is 64^64, he divides the problem in 64 buildings with 64 floors. And a broken egg is an error. This means it takes overall pretty constant 1920s. I think the problem is not equivalent to the egg problem, but somehow the algorithm performs here really really well.

I chose a skewed binary search. Instead of splitting the search field into 50:50, I skew to one side 2/13=0.15 15:85. This way the probability of hitting the punishing error condition is pretty unlikely. My intuition tells me, that this should be the most efficient algorithm, but apparently it's not.

  • I thought the 0.15 skew would be the best ratio, but after analysing the time it takes with different ratios, the most efficient value seems to be around ~0.22. My calculation is obviously wrong, but what would have been the correct calculation to find ~0.22?



  • Why does the egg problem algorithm perform better (as in faster on average)?



  • What is the on average fastest algorithm here? And why is it not based on a skewed binary search?



Any thoughts and com

Solution

For simplicity of analysis I will say that 'too high' is the error condition, which is just the problem inverted, $64^{64}-n$. I also assume that requests are instant.

It takes some thinking, but waiting is never an optimal strategy. So we view the problem as just a series of guesses, done in blocks of 13 (unless we hit the 90 second timeout). Instead of timeouts our penalty will be losing guesses, and instead of minimizing time we will minimize guesses used. I write $V_2$ for the value of a guess (in bits) when we are in the state where we guessed too high twice in this block. Similarly I will write $V_1, V_0$. I write $g \in \{0\dots13\}$ for the number of guesses left in the block, including the current one.

I will evaluate a strategy where we pick three numbers $p_0$, $p_1$, $p_2$, where each $p$ encodes how skewed the binary search is for when we've guessed $0$, $1$ or $2$ wrong times in this block.

The expected value (in bits) of a guess $p \in [0, 1]$ (where $0$ is the lowest guess and $1$ is the highest) is the binary entropy function $H(p) = -p \log_2 p - (1 - p)\log_2(1-p)$.

However, for $V_2(g)$, if we guess wrong we lose the rest of this block and the next two blocks, $B$ is the value of a block in bits.

$$V_2(g) = H(p_2) - 2p_2B - p_2\sum_{k=1}^{g-1}V_2(k)$$

If we write $E_2(g)$ as the value in bits of being in state $2$ with $g$ guesses left:

$$E_2(g) = \sum_{k=1}^{g} V_2(k) = \sum_{k=1}^g \big( H(p_2) - 2p_2B - p_2E_2(k-1)\big ) $$
$$E_2(g) = g(H(p_2) - 2p_2B) - p_2 \sum_{k=1}^g E_2(k-1)$$

This recurrence can be solved:

$$E_2(g) = (H(p_2)/p_2 - 2B)(1-(1 - p_2)^g)$$

Similarly, if we guess wrong for $V_1$, we lose our advantage from state $1$:

$$V_1(g) = H(p_1) - p_1(E_1(g-1) - E_2(g-1))$$
$$E_1(g) = \sum_{k=1}^{g} V_1(k) = \sum_{k=1}^{g} \big( H(p_1) - p_1(E_1(k-1) - E_2(k-1))\big )$$

$$E_1(g) = g(H(p_1) - p_1E_2(g-1)) - p_1\sum_{k=1}^{g}E_1(k-1)$$

$$E_1(g) = (H(p_1)/p_1 - E_2(g-1))(1-(1 - p_1)^g)$$

And $E_0$ is directly analogous. Now, since $B = E_0(13)$ we have:

$$E_0(g) = (H(p_0)/p_0 - E_1(g-1))(1-(1 - p_0)^g)$$
$$E_1(g) = (H(p_1)/p_1 - E_2(g-1))(1-(1 - p_1)^g)$$
$$E_2(g) = (H(p_2)/p_2 - 2E_0(13))(1-(1 - p_2)^g)$$

Oh boy... With a CAS I found:

$$E_0(13) =\frac{1}{2P_0P_1P_2 + 1} \left(\frac{P_0H(p_0)}{p_0} - \frac{P_0P_1H(p_1)}{p1} +\frac{P_0P_1P_2H(p_2)}{p_2}\right)$$

where $P_0 = 1-(1-p_0)^{13}$, $P_1 = 1-(1-p_1)^{12}$ and $P_2 = 1-(1-p_2)^{11}$. This is the expression we want to maximize.

Numerically maximizing this expression gives $E_0(13) \approx 6.72$ bits with:

$$p_0 \approx 0.1854365,\quad p_1 \approx 0.1422816,\quad p_2 \approx 0.0000899$$

This means that in order to get $\log_2(64^{64}) = 384$ bits we expect to need 57.1 blocks, or 1714.3 seconds.

This isn't even optimal yet, though. $p_0, p_1, p_2$ should actually be functions of $g$ (consider that if you are on your last guess in state $0$, then you can be really greedy and not get punished). I will see if I can improve my answer.

Context

StackExchange Computer Science Q#77459, answer score: 2

Revisions (0)

No revisions yet.