HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Finding a string whose MD5 starts like the digits of π

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
startsthewhosedigitsmd5likefindingstring

Problem

I tried to make a program to find a string whose MD5-hash starts like the digits of pi, where dot is omitted. Is there a faster way to do it than this:

import hashlib
import random
import string
n = 1
while True:
    word = ''.join(random.choice(string.ascii_lowercase) for _ in range(20))
    if (str(hashlib.md5(word.encode('utf-8')).hexdigest()))[:n] == '31415926535897932384626433832795'[0:n]:
        print('n='+str(n)+", "+word)
        n = n + 1

Solution

The algorithm you're using is a brute-force algorithm. It will require around (24)n hash checks for a match of n digits. Each additional digit will require an average of 16-1 more time to calculate than the previous one.

The first, and hopefully obvious, step would be to get a better algorithm for a huge speedup - unfortunately, this is non-trivial in this case.

An additional step could be to use a language that is, typically, faster than Python. You probably want a compiled language like C/C++, or even Java.

Lastly, this is a perfect parallel computing problem. You have lots of small pieces (i.e. pick some text and hash it) and each is distinct, so they're easy to do in parallel.

Here I've given some improvements for a Python program, brute-force algorithm, singly-threaded.

The most important aspect of the code is how quickly we can create the word and calculate the md5 hash. I've rewritten your code to not use an infinite loop, but instead loop for a specific number of tries. This lets us time things using timeit.

from hashlib import md5
from random import choice, shuffle
from string import ascii_lowercase
from timeit import timeit
from itertools import product, islice
from os.path import commonprefix

def find_hash_pi_external(tries):
    n = 1
    for _ in range(tries):
        word = ''.join(choice(ascii_lowercase) for _ in range(20))
        if (str(md5(word.encode('utf-8')).hexdigest()))[:n] == '31415926535897932384626433832795'[0:n]:
            print('n=' + str(n) + ", " + word)
            n = n + 1


Some other comments:

  • The method hexdigest() returns a string, so you don't need to call str() there.



  • You can exclude the 0 in a[0:n].



  • You don't always have to increment n by one - sometimes you've found one that will match more than n characters.



def find_hash_pi_1(tries):
    pi = '31415926535897932384626433832795'
    n = 1
    for _ in range(tries):
        word = ''.join(choice(ascii_lowercase) for _ in range(20))
        hash = md5(word.encode('utf-8')).hexdigest()
        if hash[:n] == pi[:n]:
            n = len(commonprefix([pi, hash])) # note: commonprefix is slow so we only use it if there is a success
            print('n=' + str(n) + ", word=" + word + ", hash=" + hash)
            n = n + 1


The code above calls choice many times to make each word random. We don't need to do that - instead, we need only different word value in the each step of the loop. It also creates the word as a string, but the md5 module requires bytes. It is expensive to convert, so we can try to create the word as bytes without the string intermediate.

def find_hash_pi_2(tries):
    pi = '31415926535897932384626433832795'
    n = 1
    letters = bytes(ascii_lowercase, 'utf-8')
    for word in islice(product(letters, repeat=20), tries):
        word = bytes(word)
        hash = md5(word).hexdigest()
        if hash[:n] == pi[:n]:
            n = len(commonprefix([pi, hash])) # note: commonprefix is slow so we only use it if we have a hit
            print('n=' + str(n) + ", word=" + word.decode('utf-8') + ", hash=" + hash)
            n += 1


At this point the profiler is telling me that the built-in method _hashlib.openssl_md5 takes up about 10% of our time. There is still space to improve, but our brute-force algorithm that uses separate md5() calls will be difficult to improve by much more.

Below I've shown the timeit output for all three solutions

if __name__ == "__main__":
    print(timeit('find_hash_pi_external(2 ** 16)', setup="from __main__ import find_hash_pi_external", number=1))
    print(timeit('find_hash_pi_1(2 ** 16)', setup="from __main__ import find_hash_pi_1", number=1))
    print(timeit('find_hash_pi_2(2 ** 16)', setup="from __main__ import find_hash_pi_2", number=1))


Results:

n=1, iaathglhsnzablgilhim
n=2, vjvcpilnwymbckhubdxl
n=3, gteyvjexqwahhbgasvco
n=4, rhjccvcubaeqfurjbfgi
1.227025089520841
n=1, word=wanforjhvqrpgnbbvlap, hash=3feca550be69a201c6147a0a20dbf338
n=2, word=eulvqycsfrgqhotpqsah, hash=3177a30c82a3b12856def16568d8b155
n=3, word=knmwruvaahghowwsyqyx, hash=3146373e28453b9dc7d1bb9961917dec
n=4, word=qrsgbfrcwrzwlqjgmryp, hash=31414f71ee6ba9516c9d0acb16e67662
1.239360037012343
n=1, word=aaaaaaaaaaaaaaaaaabp, hash=3f6056bb268f4176685134ae56a838da
n=2, word=aaaaaaaaaaaaaaaaaaqt, hash=311468ef9220551699d61e9b70e43aa7
n=4, word=aaaaaaaaaaaaaaaaafml, hash=314125241241d57b8085afe08b50bdf0
0.10168091144473257

Code Snippets

from hashlib import md5
from random import choice, shuffle
from string import ascii_lowercase
from timeit import timeit
from itertools import product, islice
from os.path import commonprefix

def find_hash_pi_external(tries):
    n = 1
    for _ in range(tries):
        word = ''.join(choice(ascii_lowercase) for _ in range(20))
        if (str(md5(word.encode('utf-8')).hexdigest()))[:n] == '31415926535897932384626433832795'[0:n]:
            print('n=' + str(n) + ", " + word)
            n = n + 1
def find_hash_pi_1(tries):
    pi = '31415926535897932384626433832795'
    n = 1
    for _ in range(tries):
        word = ''.join(choice(ascii_lowercase) for _ in range(20))
        hash = md5(word.encode('utf-8')).hexdigest()
        if hash[:n] == pi[:n]:
            n = len(commonprefix([pi, hash])) # note: commonprefix is slow so we only use it if there is a success
            print('n=' + str(n) + ", word=" + word + ", hash=" + hash)
            n = n + 1
def find_hash_pi_2(tries):
    pi = '31415926535897932384626433832795'
    n = 1
    letters = bytes(ascii_lowercase, 'utf-8')
    for word in islice(product(letters, repeat=20), tries):
        word = bytes(word)
        hash = md5(word).hexdigest()
        if hash[:n] == pi[:n]:
            n = len(commonprefix([pi, hash])) # note: commonprefix is slow so we only use it if we have a hit
            print('n=' + str(n) + ", word=" + word.decode('utf-8') + ", hash=" + hash)
            n += 1
if __name__ == "__main__":
    print(timeit('find_hash_pi_external(2 ** 16)', setup="from __main__ import find_hash_pi_external", number=1))
    print(timeit('find_hash_pi_1(2 ** 16)', setup="from __main__ import find_hash_pi_1", number=1))
    print(timeit('find_hash_pi_2(2 ** 16)', setup="from __main__ import find_hash_pi_2", number=1))
n=1, iaathglhsnzablgilhim
n=2, vjvcpilnwymbckhubdxl
n=3, gteyvjexqwahhbgasvco
n=4, rhjccvcubaeqfurjbfgi
1.227025089520841
n=1, word=wanforjhvqrpgnbbvlap, hash=3feca550be69a201c6147a0a20dbf338
n=2, word=eulvqycsfrgqhotpqsah, hash=3177a30c82a3b12856def16568d8b155
n=3, word=knmwruvaahghowwsyqyx, hash=3146373e28453b9dc7d1bb9961917dec
n=4, word=qrsgbfrcwrzwlqjgmryp, hash=31414f71ee6ba9516c9d0acb16e67662
1.239360037012343
n=1, word=aaaaaaaaaaaaaaaaaabp, hash=3f6056bb268f4176685134ae56a838da
n=2, word=aaaaaaaaaaaaaaaaaaqt, hash=311468ef9220551699d61e9b70e43aa7
n=4, word=aaaaaaaaaaaaaaaaafml, hash=314125241241d57b8085afe08b50bdf0
0.10168091144473257

Context

StackExchange Code Review Q#127505, answer score: 3

Revisions (0)

No revisions yet.