patternpythonMinor
Finding a string whose MD5 starts like the digits of π
Viewed 0 times
startsthewhosedigitsmd5likefindingstring
Problem
I tried to make a program to find a string whose MD5-hash starts like the digits of pi, where dot is omitted. Is there a faster way to do it than this:
import hashlib
import random
import string
n = 1
while True:
word = ''.join(random.choice(string.ascii_lowercase) for _ in range(20))
if (str(hashlib.md5(word.encode('utf-8')).hexdigest()))[:n] == '31415926535897932384626433832795'[0:n]:
print('n='+str(n)+", "+word)
n = n + 1Solution
The algorithm you're using is a brute-force algorithm. It will require around (24)n hash checks for a match of n digits. Each additional digit will require an average of 16-1 more time to calculate than the previous one.
The first, and hopefully obvious, step would be to get a better algorithm for a huge speedup - unfortunately, this is non-trivial in this case.
An additional step could be to use a language that is, typically, faster than Python. You probably want a compiled language like C/C++, or even Java.
Lastly, this is a perfect parallel computing problem. You have lots of small pieces (i.e. pick some text and hash it) and each is distinct, so they're easy to do in parallel.
Here I've given some improvements for a Python program, brute-force algorithm, singly-threaded.
The most important aspect of the code is how quickly we can create the word and calculate the md5 hash. I've rewritten your code to not use an infinite loop, but instead loop for a specific number of tries. This lets us time things using
Some other comments:
The code above calls
At this point the profiler is telling me that the built-in method
Below I've shown the
Results:
The first, and hopefully obvious, step would be to get a better algorithm for a huge speedup - unfortunately, this is non-trivial in this case.
An additional step could be to use a language that is, typically, faster than Python. You probably want a compiled language like C/C++, or even Java.
Lastly, this is a perfect parallel computing problem. You have lots of small pieces (i.e. pick some text and hash it) and each is distinct, so they're easy to do in parallel.
Here I've given some improvements for a Python program, brute-force algorithm, singly-threaded.
The most important aspect of the code is how quickly we can create the word and calculate the md5 hash. I've rewritten your code to not use an infinite loop, but instead loop for a specific number of tries. This lets us time things using
timeit.from hashlib import md5
from random import choice, shuffle
from string import ascii_lowercase
from timeit import timeit
from itertools import product, islice
from os.path import commonprefix
def find_hash_pi_external(tries):
n = 1
for _ in range(tries):
word = ''.join(choice(ascii_lowercase) for _ in range(20))
if (str(md5(word.encode('utf-8')).hexdigest()))[:n] == '31415926535897932384626433832795'[0:n]:
print('n=' + str(n) + ", " + word)
n = n + 1Some other comments:
- The method
hexdigest()returns a string, so you don't need to callstr()there.
- You can exclude the
0ina[0:n].
- You don't always have to increment
nby one - sometimes you've found one that will match more thanncharacters.
def find_hash_pi_1(tries):
pi = '31415926535897932384626433832795'
n = 1
for _ in range(tries):
word = ''.join(choice(ascii_lowercase) for _ in range(20))
hash = md5(word.encode('utf-8')).hexdigest()
if hash[:n] == pi[:n]:
n = len(commonprefix([pi, hash])) # note: commonprefix is slow so we only use it if there is a success
print('n=' + str(n) + ", word=" + word + ", hash=" + hash)
n = n + 1The code above calls
choice many times to make each word random. We don't need to do that - instead, we need only different word value in the each step of the loop. It also creates the word as a string, but the md5 module requires bytes. It is expensive to convert, so we can try to create the word as bytes without the string intermediate.def find_hash_pi_2(tries):
pi = '31415926535897932384626433832795'
n = 1
letters = bytes(ascii_lowercase, 'utf-8')
for word in islice(product(letters, repeat=20), tries):
word = bytes(word)
hash = md5(word).hexdigest()
if hash[:n] == pi[:n]:
n = len(commonprefix([pi, hash])) # note: commonprefix is slow so we only use it if we have a hit
print('n=' + str(n) + ", word=" + word.decode('utf-8') + ", hash=" + hash)
n += 1At this point the profiler is telling me that the built-in method
_hashlib.openssl_md5 takes up about 10% of our time. There is still space to improve, but our brute-force algorithm that uses separate md5() calls will be difficult to improve by much more.Below I've shown the
timeit output for all three solutionsif __name__ == "__main__":
print(timeit('find_hash_pi_external(2 ** 16)', setup="from __main__ import find_hash_pi_external", number=1))
print(timeit('find_hash_pi_1(2 ** 16)', setup="from __main__ import find_hash_pi_1", number=1))
print(timeit('find_hash_pi_2(2 ** 16)', setup="from __main__ import find_hash_pi_2", number=1))Results:
n=1, iaathglhsnzablgilhim
n=2, vjvcpilnwymbckhubdxl
n=3, gteyvjexqwahhbgasvco
n=4, rhjccvcubaeqfurjbfgi
1.227025089520841
n=1, word=wanforjhvqrpgnbbvlap, hash=3feca550be69a201c6147a0a20dbf338
n=2, word=eulvqycsfrgqhotpqsah, hash=3177a30c82a3b12856def16568d8b155
n=3, word=knmwruvaahghowwsyqyx, hash=3146373e28453b9dc7d1bb9961917dec
n=4, word=qrsgbfrcwrzwlqjgmryp, hash=31414f71ee6ba9516c9d0acb16e67662
1.239360037012343
n=1, word=aaaaaaaaaaaaaaaaaabp, hash=3f6056bb268f4176685134ae56a838da
n=2, word=aaaaaaaaaaaaaaaaaaqt, hash=311468ef9220551699d61e9b70e43aa7
n=4, word=aaaaaaaaaaaaaaaaafml, hash=314125241241d57b8085afe08b50bdf0
0.10168091144473257Code Snippets
from hashlib import md5
from random import choice, shuffle
from string import ascii_lowercase
from timeit import timeit
from itertools import product, islice
from os.path import commonprefix
def find_hash_pi_external(tries):
n = 1
for _ in range(tries):
word = ''.join(choice(ascii_lowercase) for _ in range(20))
if (str(md5(word.encode('utf-8')).hexdigest()))[:n] == '31415926535897932384626433832795'[0:n]:
print('n=' + str(n) + ", " + word)
n = n + 1def find_hash_pi_1(tries):
pi = '31415926535897932384626433832795'
n = 1
for _ in range(tries):
word = ''.join(choice(ascii_lowercase) for _ in range(20))
hash = md5(word.encode('utf-8')).hexdigest()
if hash[:n] == pi[:n]:
n = len(commonprefix([pi, hash])) # note: commonprefix is slow so we only use it if there is a success
print('n=' + str(n) + ", word=" + word + ", hash=" + hash)
n = n + 1def find_hash_pi_2(tries):
pi = '31415926535897932384626433832795'
n = 1
letters = bytes(ascii_lowercase, 'utf-8')
for word in islice(product(letters, repeat=20), tries):
word = bytes(word)
hash = md5(word).hexdigest()
if hash[:n] == pi[:n]:
n = len(commonprefix([pi, hash])) # note: commonprefix is slow so we only use it if we have a hit
print('n=' + str(n) + ", word=" + word.decode('utf-8') + ", hash=" + hash)
n += 1if __name__ == "__main__":
print(timeit('find_hash_pi_external(2 ** 16)', setup="from __main__ import find_hash_pi_external", number=1))
print(timeit('find_hash_pi_1(2 ** 16)', setup="from __main__ import find_hash_pi_1", number=1))
print(timeit('find_hash_pi_2(2 ** 16)', setup="from __main__ import find_hash_pi_2", number=1))n=1, iaathglhsnzablgilhim
n=2, vjvcpilnwymbckhubdxl
n=3, gteyvjexqwahhbgasvco
n=4, rhjccvcubaeqfurjbfgi
1.227025089520841
n=1, word=wanforjhvqrpgnbbvlap, hash=3feca550be69a201c6147a0a20dbf338
n=2, word=eulvqycsfrgqhotpqsah, hash=3177a30c82a3b12856def16568d8b155
n=3, word=knmwruvaahghowwsyqyx, hash=3146373e28453b9dc7d1bb9961917dec
n=4, word=qrsgbfrcwrzwlqjgmryp, hash=31414f71ee6ba9516c9d0acb16e67662
1.239360037012343
n=1, word=aaaaaaaaaaaaaaaaaabp, hash=3f6056bb268f4176685134ae56a838da
n=2, word=aaaaaaaaaaaaaaaaaaqt, hash=311468ef9220551699d61e9b70e43aa7
n=4, word=aaaaaaaaaaaaaaaaafml, hash=314125241241d57b8085afe08b50bdf0
0.10168091144473257Context
StackExchange Code Review Q#127505, answer score: 3
Revisions (0)
No revisions yet.