Python PBKDF2 using core modules

Submitted by: @import:stackexchange-codereview

Problem

I'm trying to implement pbkdf2 in Python 2.7/3.4 without needing the pbkdf2 module. (I'm also trying to avoid using classes.)

Any insight would be appreciated:

from binascii import hexlify, unhexlify
import hmac, struct, hashlib, sys

is_python2 = True if sys.version_info.major == 2 else False

def pbkdf_two(passwd, salt, iters=2048, keylen=64, digestmod=hashlib.sha512):
    """
    >>> hexlify(pbkdf_two(b'All n-entities must communicate with other n-entities via n-1 entiteeheehees', unhexlify('1234567878563412'), 500, 16, hashlib.sha1))
    '6a8970bf68c92caea84a8df285108586'
    """
    dgsz = digestmod().digest_size if callable(digestmod) else digestmod.digest_size
    if keylen is None: keylen = dgsz
    # Helper function which copies each iteration for h, where h is an hmac seeded with password
    def pbhelper(h, salt, itercount, blocksize):
        def prf(h, data):
            hm = h.copy()
            hm.update(data)
            return hm.digest()
        U = prf(h, salt + struct.pack('>i', blocksize))
        T = U
        for j in range(2, itercount+1):
            U = prf(h, U)
            T = "".join([chr( ord(x) ^ ord(y) ) for (x, y) in zip( T, U )]) \
                  if is_python2 else bytes([x ^ y for (x, y) in zip(T, U)])    # XORing
        return T
    L = int(keylen/dgsz) # L - number of output blocks to produce
    if keylen % dgsz != 0: L += 1
    h = hmac.new(key=passwd, msg=None, digestmod=digestmod )
    T = b""
    for i in range(1, L+1):
        T += pbhelper(h, salt, iters, i)
    return T[:keylen]

Solution


  1. Review



-
The docstring doesn't explain what the function does, or how to call it, or what it returns. There's just a doctest.

-
If the code were reformatted to fit in 79 columns, as recommended by the Python style guide (PEP8), then we wouldn't have to scroll it horizontally to read it here at Code Review.

-
hexlify and unhexlify are only used by the doctest, so they could be included there.

-
sys is used only for the is_python2 version test; once that test is removed (see the bytearray point below), the import can be omitted.

-
The key derivation function is called PBKDF2, so I think the function would be better named pbkdf2, not pbkdf_two.

-
I don't think that having default values for arguments to this function is a good idea. The caller needs to think about these values, not rely on the defaults. Note that RFC 2898 doesn't specify any particular values.

-
We're not running out of letters, so why not password instead of passwd and digest_size instead of dgsz?

-
The helper function pbhelper is called from only one place in the code, so there's nothing to be gained by making it into a local function.

-
The fourth argument to pbhelper is named blocksize, but this is very misleading: it's actually the (1-based) index of the block.

-
The iteration variable j is not used in the body of the loop. It's conventional to name such a variable _.

-
Since j is not used, it doesn't matter what values it takes, and so it's simpler to use range(iters-1) instead of range(2, iters+1).

-
Since h is always the same, there's no need for it to be a parameter to the prf function.

-
The msg argument to hmac.new defaults to None so there's no need to specify it.

-
If the code used bytearray instead of bytes then it would be portable between Python 2 and 3 without needing a version test.

-
Instead of:

L = int(keylen/dgsz) # L - number of output blocks to produce
if keylen % dgsz != 0: L += 1


the number of blocks can be computed like this, using the floor division operator //:

L = (keylen + dgsz - 1) // dgsz


But even more simply, why not just iterate until the result is long enough? That way you wouldn't have to compute digest_size, and it would have the advantage that in Python 3.4 or later, the caller could pass in the name of the digest algorithm, just as for hmac.new.
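Two of the points above, the bytearray suggestion and the floor-division block count, can be tried in isolation. The following is a minimal sketch; the helper names xor_bytes and block_count are hypothetical and do not appear in the original code.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings.

    Iterating over a bytearray yields ints on both Python 2 and 3,
    so no version test is needed.
    """
    return bytearray(x ^ y for x, y in zip(bytearray(a), bytearray(b)))


def block_count(keylen, digest_size):
    """Ceiling of keylen / digest_size using only integer arithmetic."""
    return (keylen + digest_size - 1) // digest_size
```

For example, xor_bytes(b'\x0f\xf0', b'\xff\xff') gives bytearray(b'\xf0\x0f'), and block_count(64, 20) is 4, the same result as the original two-line computation.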



  2. Revised code



import hmac
import struct

def pbkdf2(password, salt, iters, keylen, digestmod):
    """Run the PBKDF2 (Password-Based Key Derivation Function 2) algorithm
    and return the derived key. The arguments are:

    password (bytes or bytearray) -- the input password
    salt (bytes or bytearray) -- a cryptographic salt
    iters (int) -- number of iterations
    keylen (int) -- length of key to derive
    digestmod -- a cryptographic hash function: either a module
        supporting PEP 247, a hashlib constructor, or (in Python 3.4
        or later) the name of a hash function.

    For example:

    >>> import hashlib
    >>> from binascii import hexlify, unhexlify
    >>> password = b'Squeamish Ossifrage'
    >>> salt = unhexlify(b'1234567878563412')
    >>> hexlify(pbkdf2(password, salt, 500, 16, hashlib.sha1))
    b'9e8f1072bdf5ef042bd988c7da83e43b'

    """
    h = hmac.new(password, digestmod=digestmod)
    def prf(data):
        hm = h.copy()
        hm.update(data)
        return bytearray(hm.digest())

    key = bytearray()
    i = 1
    while len(key) < keylen:
        T = U = prf(salt + struct.pack('>i', i))
        for _ in range(iters - 1):
            U = prf(U)
            T = bytearray(x ^ y for x, y in zip(T, U))
        key += T
        i += 1

    return key[:keylen]
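
One way to gain confidence in the revised function is to compare it with hashlib.pbkdf2_hmac, which the standard library provides in Python 3.4 and later. The sketch below assumes Python 3 and repeats the function (with a shortened docstring) so that it runs on its own; matches_stdlib is a hypothetical helper, not part of the original answer.

```python
import hashlib
import hmac
import struct


def pbkdf2(password, salt, iters, keylen, digestmod):
    """PBKDF2 with HMAC as the pseudorandom function (revised code)."""
    h = hmac.new(password, digestmod=digestmod)

    def prf(data):
        hm = h.copy()  # reuse the keyed HMAC state for every call
        hm.update(data)
        return bytearray(hm.digest())

    key = bytearray()
    i = 1  # 1-based block index
    while len(key) < keylen:
        T = U = prf(salt + struct.pack('>i', i))
        for _ in range(iters - 1):
            U = prf(U)
            T = bytearray(x ^ y for x, y in zip(T, U))
        key += T
        i += 1
    return key[:keylen]


def matches_stdlib(password, salt, iters, keylen, name):
    """Hypothetical helper: True if pbkdf2() agrees with hashlib.pbkdf2_hmac()."""
    expected = hashlib.pbkdf2_hmac(name, password, salt, iters, keylen)
    # Passing the algorithm name as digestmod relies on Python 3.4 or later.
    actual = pbkdf2(password, salt, iters, keylen, name)
    return bytes(actual) == expected
```

Calling matches_stdlib(b'password', b'salt', 2, 100, 'sha512') exercises a key that spans more than one output block, which is the case the while loop exists to handle.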

Context

StackExchange Code Review Q#87538, answer score: 5
