HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Python mint hashcash token

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
tokenhashcashpythonmint

Problem

This is a Python program to mint a hashcash token, but my code is a lot slower than using a library. What is slowing my program down? It takes over 10 seconds to mint a 20-bit stamp, but using a library, it takes less than a second.

This is my program:

from os import urandom
from hashlib import sha1
import time

def mint(name, bits):
    t = time.strftime("%y%m%d")
    out = " "
    count = 0
    while out[:bits] != bits*"0":
        RandomData = urandom(8).encode("base64").replace("\n", "")
        data = "1:%d:%s:%s::%s:%s"%(bits, t, name, RandomData, format(count, 'x'))
        out = (''.join(format(ord(x), '08b') for x in sha1(data).digest()))
        count += 1
    return data

print mint("email_address", 20)


This is the library I am using for comparison.

Solution

The first thing to do is to profile your code. Hashcash is really designed to make you perform 2^20 hashes: if you do more work than those hashes, something's wrong!

Profiling

python -m cProfile -s cumtime original.py

Let's look at all the places where the algorithms spends more than 0.001 seconds (I've asked for 15 bits since 20 bits is too slow on my computer) :

Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1414287    4.087    0.000    7.665    0.000 original.py:12()
  1414287    2.433    0.000    2.433    0.000 {format}
   134695    1.737    0.000    9.402    0.000 {method 'join' of 'str' objects}
  1346940    1.295    0.000    1.295    0.000 {ord}
        1    0.886    0.886   12.709   12.709 original.py:5(mint)
    67347    0.536    0.000    0.969    0.000 base64.py:310(encodestring)
    67347    0.432    0.000    0.432    0.000 {posix.urandom}
    67347    0.258    0.000    1.292    0.000 base64_codec.py:13(base64_encode)
    67347    0.231    0.000    1.525    0.000 {method 'encode' of 'str' objects}
    67348    0.141    0.000    0.141    0.000 {_hashlib.openssl_sha1}
    67347    0.131    0.000    0.131    0.000 {method 'digest' of '_hashlib.HASH' objects}
   134694    0.131    0.000    0.131    0.000 {len}
    67347    0.126    0.000    0.126    0.000 {binascii.b2a_base64}
    67348    0.110    0.000    0.110    0.000 {range}
    67347    0.109    0.000    0.109    0.000 {method 'replace' of 'str' objects}
    67347    0.066    0.000    0.066    0.000 {method 'append' of 'list' objects}
        1    0.001    0.001    0.001    0.001 hashlib.py:55()
        1    0.001    0.001    0.001    0.001 base64.py:3()


This specific run took 12.7 seconds (original.py:5). It spent 4 seconds in the for x in ... generator, 2.4 seconds in format(ord(x), '08b'), 1.7 seconds in ''.join(...), and 1.3 seconds in ord(x).

That's a total of... 9.5 seconds out of 12.7 seconds spent in the out = (''.join(format(ord(x), '08b') for x in sha1(data).digest())) line. This would have been impossible to figure out without profiling (as you can see in kyle k's comment). It's the number one rule of optimizations: since you'll spent 80% of your time in 20% of the code, only optimize what you know to be slow!

Removing the 80%

So, how can we replace this line? The idea is to use hexdigest instead of digest. The issue with hexdigest is that we only see hexadecimal digits, not bits. Since 20 bits is 5 hexadecimal numbers, that's not an issue here. If you needed 21 bits, then you would ask for 6 hexadecimal numbers, which is more hashes than necessary, but still worth the speed up.

Here's the new code:

from os import urandom
from hashlib import sha1
import time
from math import ceil

def mint(name, bits):
    t = time.strftime("%y%m%d")
    out = " "
    count = 0
    digits = int(ceil(bits/4.0))

    while out[:digits] != digits*"0":
        RandomData = urandom(8).encode("base64").replace("\n", "")
        RandomData = 'AHZlx9LYMyo='
        data = "1:%d:%s:%s::%s:%s"%(bits, t, name, RandomData, format(count, 'x'))
        out = sha1(data).hexdigest()
        count += 1
    return data

print mint("email_address", 15)


And the new profiling:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.750    0.750    3.212    3.212 improved.py:6(mint)
    67347    0.529    0.000    0.959    0.000 base64.py:310(encodestring)
    67347    0.427    0.000    0.427    0.000 {posix.urandom}
    67347    0.251    0.000    1.274    0.000 base64_codec.py:13(base64_encode)
    67347    0.227    0.000    1.503    0.000 {method 'encode' of 'str' objects}
    67347    0.155    0.000    0.155    0.000 {format}
    67348    0.138    0.000    0.138    0.000 {_hashlib.openssl_sha1}
    67347    0.134    0.000    0.134    0.000 {method 'hexdigest' of '_hashlib.HASH' objects}
   134694    0.131    0.000    0.131    0.000 {len}
    67347    0.122    0.000    0.122    0.000 {binascii.b2a_base64}
    67348    0.109    0.000    0.109    0.000 {range}
    67347    0.105    0.000    0.105    0.000 {method 'replace' of 'str' objects}
    67348    0.066    0.000    0.066    0.000 {method 'join' of 'str' objects}
    67347    0.066    0.000    0.066    0.000 {method 'append' of 'list' objects}
        1    0.001    0.001    0.001    0.001 hashlib.py:55()
        1    0.001    0.001    3.213    3.213 improved.py:1()
        1    0.001    0.001    0.001    0.001 base64.py:3()


Yay! Only 3.2 seconds in mint().

Removing another 80%

But hashing is still not the most expensive part. It's now the computing of RandomData (0.5 + 0.2 seconds in base64 encoding, 0.4 seconds in urandom, and so on). @cHao is right, you only need to compute that RandomData once and find the count that provides the correct sha1 hash. Let's move it out of the while loop:

```
from os import urandom
from hashlib import sha1
import time
from math import ceil

def m

Code Snippets

Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1414287    4.087    0.000    7.665    0.000 original.py:12(<genexpr>)
  1414287    2.433    0.000    2.433    0.000 {format}
   134695    1.737    0.000    9.402    0.000 {method 'join' of 'str' objects}
  1346940    1.295    0.000    1.295    0.000 {ord}
        1    0.886    0.886   12.709   12.709 original.py:5(mint)
    67347    0.536    0.000    0.969    0.000 base64.py:310(encodestring)
    67347    0.432    0.000    0.432    0.000 {posix.urandom}
    67347    0.258    0.000    1.292    0.000 base64_codec.py:13(base64_encode)
    67347    0.231    0.000    1.525    0.000 {method 'encode' of 'str' objects}
    67348    0.141    0.000    0.141    0.000 {_hashlib.openssl_sha1}
    67347    0.131    0.000    0.131    0.000 {method 'digest' of '_hashlib.HASH' objects}
   134694    0.131    0.000    0.131    0.000 {len}
    67347    0.126    0.000    0.126    0.000 {binascii.b2a_base64}
    67348    0.110    0.000    0.110    0.000 {range}
    67347    0.109    0.000    0.109    0.000 {method 'replace' of 'str' objects}
    67347    0.066    0.000    0.066    0.000 {method 'append' of 'list' objects}
        1    0.001    0.001    0.001    0.001 hashlib.py:55(<module>)
        1    0.001    0.001    0.001    0.001 base64.py:3(<module>)
from os import urandom
from hashlib import sha1
import time
from math import ceil

def mint(name, bits):
    t = time.strftime("%y%m%d")
    out = " "
    count = 0
    digits = int(ceil(bits/4.0))

    while out[:digits] != digits*"0":
        RandomData = urandom(8).encode("base64").replace("\n", "")
        RandomData = 'AHZlx9LYMyo='
        data = "1:%d:%s:%s::%s:%s"%(bits, t, name, RandomData, format(count, 'x'))
        out = sha1(data).hexdigest()
        count += 1
    return data

print mint("email_address", 15)
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.750    0.750    3.212    3.212 improved.py:6(mint)
    67347    0.529    0.000    0.959    0.000 base64.py:310(encodestring)
    67347    0.427    0.000    0.427    0.000 {posix.urandom}
    67347    0.251    0.000    1.274    0.000 base64_codec.py:13(base64_encode)
    67347    0.227    0.000    1.503    0.000 {method 'encode' of 'str' objects}
    67347    0.155    0.000    0.155    0.000 {format}
    67348    0.138    0.000    0.138    0.000 {_hashlib.openssl_sha1}
    67347    0.134    0.000    0.134    0.000 {method 'hexdigest' of '_hashlib.HASH' objects}
   134694    0.131    0.000    0.131    0.000 {len}
    67347    0.122    0.000    0.122    0.000 {binascii.b2a_base64}
    67348    0.109    0.000    0.109    0.000 {range}
    67347    0.105    0.000    0.105    0.000 {method 'replace' of 'str' objects}
    67348    0.066    0.000    0.066    0.000 {method 'join' of 'str' objects}
    67347    0.066    0.000    0.066    0.000 {method 'append' of 'list' objects}
        1    0.001    0.001    0.001    0.001 hashlib.py:55(<module>)
        1    0.001    0.001    3.213    3.213 improved.py:1(<module>)
        1    0.001    0.001    0.001    0.001 base64.py:3(<module>)
from os import urandom
from hashlib import sha1
import time
from math import ceil

def mint(name, bits):
    t = time.strftime("%y%m%d")
    out = " "
    count = 0
    digits = int(ceil(bits/4.0))
    RandomData = urandom(8).encode("base64").replace("\n", "") 
    RandomData = 'AHZlx9LYMyo='

    while out[:digits] != digits*"0":
        data = "1:%d:%s:%s::%s:%s"%(bits, t, name, RandomData, format(count, 'x'))
        out = sha1(data).hexdigest()
        count += 1
    return data

print mint("email_address", 15)
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.414    0.414    0.795    0.795 improved.py:6(mint)
    67347    0.134    0.000    0.134    0.000 {format}
    67348    0.124    0.000    0.124    0.000 {_hashlib.openssl_sha1}
    67347    0.121    0.000    0.121    0.000 {method 'hexdigest' of '_hashlib.HASH' objects}
        1    0.001    0.001    0.001    0.001 hashlib.py:55(<module>)
        1    0.001    0.001    0.796    0.796 improved.py:1(<module>)
        1    0.001    0.001    0.001    0.001 base64.py:3(<module>)

Context

StackExchange Code Review Q#39674, answer score: 7

Revisions (0)

No revisions yet.