HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Predicting the next pseudorandom value

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thenextvaluepredictingpseudorandom

Problem

The archived material for the Stanford University course on cryptography at coursera.org includes a problem where you have to predict the next output of a weak PRG. It can be briefly restated as follows:


$$\begin{align*}
P =&\ 295075153 \\
z_n =&\ x_n \oplus y_n,\qquad \textrm{where}\ 0 \le x_0, y_0 \lt P\ \textrm{(unknown random integers)} \\
x_{n+1} =& (2x_n + 5) \mod P \\
y_{n+1} =& (3y_n + 7) \mod P \\
(z_0, z_1, ..., z_8) =& (210205973, 22795300, 58776750, \\
&\ \ 121262470, 264731963, 140842553, \\
&\ \ 242590528, 195244728, 86752752) \\
z_9 =& ???
\end{align*}$$

I wrote a small program in C to solve this by brute force, and it finds a solution in just over 2 seconds:



#include
#define P 295075153L

int main() {
long seq[] = { 210205973, 22795300, 58776750, 121262470, 264731963,
140842553, 242590528, 195244728, 86752752 };
long i, x, y, x0, y0;

for (x0=0; x0



I also tried to solve this problem in Python, but ran into two problems:

-
Although it does find the same solution, it takes about 3½ minutes to do so. This means it is about a hundred times slower than the C program. Surely this can't be right?

-
I'm also not happy with the way I'm checking for a match with the sequence values. In the C code,
i is incremented at the end of each for() loop iteration, so when i==9 I can be sure that every value was matched successfully. But this incrementing happens somewhere else in Python, so instead I have to check for i==8 and will report a successful match even if the last comparison fails. How can I do this properly?

from itertools import count

seq = [210205973,22795300,58776750,121262470,264731963,
140842553,242590528,195244728,86752752]
P = 295075153L
for x0 in count(): # better than range(0,P), which is extremely slow
if x0==P:
break
y0 = seq[0] ^ x0
x, y

Solution

There are a few code issues that tools like flake will pick up on, but I'll just focus on performance.

First up, switching to pypy reduces the runtime on my machine from ~3 minutes to 16 seconds.

⚡ time pypy orig.py
Solution found: x0=89059908, y0=164204369
Next value: 231886864
pypy orig.py  15.79s user 0.07s system 99% cpu 15.915 total


Removing the L gives us another drastic speedup:

⚡ diff orig.py orig2.py
5c5
 P = 295075153

⚡ time pypy orig2.py
Solution found: x0=89059908, y0=164204369
Next value: 231886864
pypy orig2.py  3.93s user 0.04s system 99% cpu 3.988 total


Changing the control flow gives us a smaller boost

def solve():
    seq = [210205973, 22795300, 58776750,
           121262470, 264731963, 140842553,
           242590528, 195244728, 86752752]
    P = 295075153
    for x0 in xrange(0, P):
        y0 = seq[0] ^ x0
        x, y = x0, y0
        for i in xrange(1, 9):
            x = (2 * x + 5) % P
            y = (3 * y + 7) % P
            if (x ^ y) != seq[i]:
                break
            if i == 8:
                print "Solution found: x0=%d, y0=%d" % (x0, y0)
                x, y = x0, y0
                for i in xrange(1, 10):
                    x = (2 * x + 5) % P
                    y = (3 * y + 7) % P
                print "Next value: %d" % (x ^ y)
                return
    print "No solution found"

solve()


⚡ time pypy orig3.py
Solution found: x0=89059908, y0=164204369
Next value: 231886864
pypy solve.py  2.60s user 0.03s system 99% cpu 2.633 total


Which gets us very close to the C solution

⚡ time ./a.out
Solution found: x0=89059908, y0=164204369
Next value: 231886864
./a.out  2.36s user 0.01s system 99% cpu 2.369 total


Forgot to compile with optimisations, which puts the Python version ~6x slower than C:

⚡ time ./a.out
Solution found: x0=89059908, y0=164204369
Next value: 231886864
./a.out  0.44s user 0.00s system 99% cpu 0.448 total

Code Snippets

⚡ time pypy orig.py
Solution found: x0=89059908, y0=164204369
Next value: 231886864
pypy orig.py  15.79s user 0.07s system 99% cpu 15.915 total
⚡ diff orig.py orig2.py
5c5
< P = 295075153L
---
> P = 295075153

⚡ time pypy orig2.py
Solution found: x0=89059908, y0=164204369
Next value: 231886864
pypy orig2.py  3.93s user 0.04s system 99% cpu 3.988 total
def solve():
    seq = [210205973, 22795300, 58776750,
           121262470, 264731963, 140842553,
           242590528, 195244728, 86752752]
    P = 295075153
    for x0 in xrange(0, P):
        y0 = seq[0] ^ x0
        x, y = x0, y0
        for i in xrange(1, 9):
            x = (2 * x + 5) % P
            y = (3 * y + 7) % P
            if (x ^ y) != seq[i]:
                break
            if i == 8:
                print "Solution found: x0=%d, y0=%d" % (x0, y0)
                x, y = x0, y0
                for i in xrange(1, 10):
                    x = (2 * x + 5) % P
                    y = (3 * y + 7) % P
                print "Next value: %d" % (x ^ y)
                return
    print "No solution found"

solve()
⚡ time pypy orig3.py
Solution found: x0=89059908, y0=164204369
Next value: 231886864
pypy solve.py  2.60s user 0.03s system 99% cpu 2.633 total
⚡ time ./a.out
Solution found: x0=89059908, y0=164204369
Next value: 231886864
./a.out  2.36s user 0.01s system 99% cpu 2.369 total

Context

StackExchange Code Review Q#72137, answer score: 6

Revisions (0)

No revisions yet.