patternpythonMinor
Predicting the next pseudorandom value
Viewed 0 times
thenextvaluepredictingpseudorandom
Problem
The archived material for the Stanford University course on cryptography at coursera.org includes a problem where you have to predict the next output of a weak PRG. It can be briefly restated as follows:
$$\begin{align*}
P =&\ 295075153 \\
z_n =&\ x_n \oplus y_n,\qquad \textrm{where}\ 0 \le x_0, y_0 \lt P\ \textrm{(unknown random integers)} \\
x_{n+1} =& (2x_n + 5) \mod P \\
y_{n+1} =& (3y_n + 7) \mod P \\
(z_0, z_1, ..., z_8) =& (210205973, 22795300, 58776750, \\
&\ \ 121262470, 264731963, 140842553, \\
&\ \ 242590528, 195244728, 86752752) \\
z_9 =& ???
\end{align*}$$
I wrote a small program in C to solve this by brute force, and it finds a solution in just over 2 seconds:
seq = [210205973,22795300,58776750,121262470,264731963,
140842553,242590528,195244728,86752752]
P = 295075153L
for x0 in count(): # better than range(0,P), which is extremely slow
if x0==P:
break
y0 = seq[0] ^ x0
x, y
$$\begin{align*}
P =&\ 295075153 \\
z_n =&\ x_n \oplus y_n,\qquad \textrm{where}\ 0 \le x_0, y_0 \lt P\ \textrm{(unknown random integers)} \\
x_{n+1} =& (2x_n + 5) \mod P \\
y_{n+1} =& (3y_n + 7) \mod P \\
(z_0, z_1, ..., z_8) =& (210205973, 22795300, 58776750, \\
&\ \ 121262470, 264731963, 140842553, \\
&\ \ 242590528, 195244728, 86752752) \\
z_9 =& ???
\end{align*}$$
I wrote a small program in C to solve this by brute force, and it finds a solution in just over 2 seconds:
#include
#define P 295075153L
int main() {
long seq[] = { 210205973, 22795300, 58776750, 121262470, 264731963,
140842553, 242590528, 195244728, 86752752 };
long i, x, y, x0, y0;
for (x0=0; x0
I also tried to solve this problem in Python, but ran into two problems:
-
Although it does find the same solution, it takes about 3½ minutes to do so. This means it is about a hundred times slower than the C program. Surely this can't be right?
-
I'm also not happy with the way I'm checking for a match with the sequence values. In the C code, i is incremented at the end of each for() loop iteration, so when i==9 I can be sure that every value was matched successfully. But this incrementing happens somewhere else in Python, so instead I have to check for i==8 and will report a successful match even if the last comparison fails. How can I do this properly?
from itertools import countseq = [210205973,22795300,58776750,121262470,264731963,
140842553,242590528,195244728,86752752]
P = 295075153L
for x0 in count(): # better than range(0,P), which is extremely slow
if x0==P:
break
y0 = seq[0] ^ x0
x, y
Solution
There are a few code issues that tools like flake will pick up on, but I'll just focus on performance.
First up, switching to
Removing the
Changing the control flow gives us a smaller boost
Which gets us very close to the C solution
Forgot to compile with optimisations, which puts the Python version ~6x slower than C:
First up, switching to
pypy reduces the runtime on my machine from ~3 minutes to 16 seconds.⚡ time pypy orig.py
Solution found: x0=89059908, y0=164204369
Next value: 231886864
pypy orig.py 15.79s user 0.07s system 99% cpu 15.915 totalRemoving the
L gives us another drastic speedup:⚡ diff orig.py orig2.py
5c5
P = 295075153
⚡ time pypy orig2.py
Solution found: x0=89059908, y0=164204369
Next value: 231886864
pypy orig2.py 3.93s user 0.04s system 99% cpu 3.988 totalChanging the control flow gives us a smaller boost
def solve():
seq = [210205973, 22795300, 58776750,
121262470, 264731963, 140842553,
242590528, 195244728, 86752752]
P = 295075153
for x0 in xrange(0, P):
y0 = seq[0] ^ x0
x, y = x0, y0
for i in xrange(1, 9):
x = (2 * x + 5) % P
y = (3 * y + 7) % P
if (x ^ y) != seq[i]:
break
if i == 8:
print "Solution found: x0=%d, y0=%d" % (x0, y0)
x, y = x0, y0
for i in xrange(1, 10):
x = (2 * x + 5) % P
y = (3 * y + 7) % P
print "Next value: %d" % (x ^ y)
return
print "No solution found"
solve()⚡ time pypy orig3.py
Solution found: x0=89059908, y0=164204369
Next value: 231886864
pypy solve.py 2.60s user 0.03s system 99% cpu 2.633 totalWhich gets us very close to the C solution
⚡ time ./a.out
Solution found: x0=89059908, y0=164204369
Next value: 231886864
./a.out 2.36s user 0.01s system 99% cpu 2.369 totalForgot to compile with optimisations, which puts the Python version ~6x slower than C:
⚡ time ./a.out
Solution found: x0=89059908, y0=164204369
Next value: 231886864
./a.out 0.44s user 0.00s system 99% cpu 0.448 totalCode Snippets
⚡ time pypy orig.py
Solution found: x0=89059908, y0=164204369
Next value: 231886864
pypy orig.py 15.79s user 0.07s system 99% cpu 15.915 total⚡ diff orig.py orig2.py
5c5
< P = 295075153L
---
> P = 295075153
⚡ time pypy orig2.py
Solution found: x0=89059908, y0=164204369
Next value: 231886864
pypy orig2.py 3.93s user 0.04s system 99% cpu 3.988 totaldef solve():
seq = [210205973, 22795300, 58776750,
121262470, 264731963, 140842553,
242590528, 195244728, 86752752]
P = 295075153
for x0 in xrange(0, P):
y0 = seq[0] ^ x0
x, y = x0, y0
for i in xrange(1, 9):
x = (2 * x + 5) % P
y = (3 * y + 7) % P
if (x ^ y) != seq[i]:
break
if i == 8:
print "Solution found: x0=%d, y0=%d" % (x0, y0)
x, y = x0, y0
for i in xrange(1, 10):
x = (2 * x + 5) % P
y = (3 * y + 7) % P
print "Next value: %d" % (x ^ y)
return
print "No solution found"
solve()⚡ time pypy orig3.py
Solution found: x0=89059908, y0=164204369
Next value: 231886864
pypy solve.py 2.60s user 0.03s system 99% cpu 2.633 total⚡ time ./a.out
Solution found: x0=89059908, y0=164204369
Next value: 231886864
./a.out 2.36s user 0.01s system 99% cpu 2.369 totalContext
StackExchange Code Review Q#72137, answer score: 6
Revisions (0)
No revisions yet.