patternpythonModerate
Python implementation of SHA1
Viewed 0 times
implementationsha1python
Problem
Here is a implementation of the cryptographic hash function SHA1 written in Python. It does not use any external libraries, only built-in functions. I know that it would be faster to use an external library, but my code is for learning purposes.
I want to know if I am properly implementing the algorithm. Am I taking any unnecessary steps? What changes can I make to speed up the code?
The code properly works. I've compared the result to a trusted implementation.
```
def sha1(data):
bytes = ""
h0 = 0x67452301
h1 = 0xEFCDAB89
h2 = 0x98BADCFE
h3 = 0x10325476
h4 = 0xC3D2E1F0
for n in range(len(data)):
bytes+='{0:08b}'.format(ord(data[n]))
bits = bytes+"1"
pBits = bits
#pad until length equals 448 mod 512
while len(pBits)%512 != 448:
pBits+="0"
#append the original length
pBits+='{0:064b}'.format(len(bits)-1)
def chunks(l, n):
return [l[i:i+n] for i in range(0, len(l), n)]
def rol(n, b):
return ((n > (32 - b))) & 0xffffffff
for c in chunks(pBits, 512):
words = chunks(c, 32)
w = [0]*80
for n in range(0, 16):
w[n] = int(words[n], 2)
for i in range(16, 80):
w[i] = rol((w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16]), 1)
a = h0
b = h1
c = h2
d = h3
e = h4
#Main loop
for i in range(0, 80):
if 0 <= i <= 19:
f = (b & c) | ((~b) & d)
k = 0x5A827999
elif 20 <= i <= 39:
f = b ^ c ^ d
k = 0x6ED9EBA1
elif 40 <= i <= 59:
f = (b & c) | (b & d) | (c & d)
k = 0x8F1BBCDC
elif 60 <= i <= 79:
f = b ^ c ^ d
k = 0xCA62C1D6
temp = rol(a, 5) + f + e + k + w[i] & 0xffffffff
e = d
d = c
c = rol(b, 30)
b = a
a = temp
h0 = h0 + a & 0xf
I want to know if I am properly implementing the algorithm. Am I taking any unnecessary steps? What changes can I make to speed up the code?
The code properly works. I've compared the result to a trusted implementation.
```
def sha1(data):
bytes = ""
h0 = 0x67452301
h1 = 0xEFCDAB89
h2 = 0x98BADCFE
h3 = 0x10325476
h4 = 0xC3D2E1F0
for n in range(len(data)):
bytes+='{0:08b}'.format(ord(data[n]))
bits = bytes+"1"
pBits = bits
#pad until length equals 448 mod 512
while len(pBits)%512 != 448:
pBits+="0"
#append the original length
pBits+='{0:064b}'.format(len(bits)-1)
def chunks(l, n):
return [l[i:i+n] for i in range(0, len(l), n)]
def rol(n, b):
return ((n > (32 - b))) & 0xffffffff
for c in chunks(pBits, 512):
words = chunks(c, 32)
w = [0]*80
for n in range(0, 16):
w[n] = int(words[n], 2)
for i in range(16, 80):
w[i] = rol((w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16]), 1)
a = h0
b = h1
c = h2
d = h3
e = h4
#Main loop
for i in range(0, 80):
if 0 <= i <= 19:
f = (b & c) | ((~b) & d)
k = 0x5A827999
elif 20 <= i <= 39:
f = b ^ c ^ d
k = 0x6ED9EBA1
elif 40 <= i <= 59:
f = (b & c) | (b & d) | (c & d)
k = 0x8F1BBCDC
elif 60 <= i <= 79:
f = b ^ c ^ d
k = 0xCA62C1D6
temp = rol(a, 5) + f + e + k + w[i] & 0xffffffff
e = d
d = c
c = rol(b, 30)
b = a
a = temp
h0 = h0 + a & 0xf
Solution
I'm no expert on SHA1, and I see mostly small things, but one big thing that gave me pause. As an overall observation, you seem to value readability over memory usage. Since hashes are often used for large files, this seems to be a risky tradeoff, and python does provide good ways to handle this as well. More of this in the third observation below. On to the observations:
-
This one's important:
Note that if the pBits generation was here, checking
-
I'm not a big fan of the `for i in range(0, 80): if 0
pBitsbeing represented as a series of characters for each bit seems unusual for such low level work. I would typically expect to see this as a hexadecimal string, or even a number. Yet except for possible difficulties comparing it to other implementations that favor less space usage, this seems to make things easy to read.
- Filling out
pBitsto lengths congruent to 448 mod 512 could be done in a single step by figuring out how many bits of padding are necessary and doing a singlepBits += "0" * padding; there's no need for a while loop.
-
This one's important:
chunks() should probably be a generator so that you don't have three copies of your data around (one original, one expanded a bit to a byte, and one in chunked strings); correspondingly this could serve out the trailing zero and length bits. Let's examine what this would look like, as 3 copies of large data is really large. Let's remove the sliced copies, creating only one chunk at a time:def iterchunks(bits, chunkbits=512):
for i in range(0, len(bits), chunkbits):
yield bits[i:i + chunkbits]Note that if the pBits generation was here, checking
i + chunkbits against len(bits) and the bytes to bits conversion, you could process a file-like stream instead of a string, taking you further down, from the original 3 copies (with two of them 8 times larger than usual) to not even a full copy of your data in memory at one time.- Back to small ideas. Your bit string could be more literally a bit string, storing
\0and\1characters. This would let you simplify setting upwintow = map(ord, words). (Of course with the appropriate helper, you could do that with the0and1characters of your current string.)
-
I'm not a big fan of the `for i in range(0, 80): if 0
Code Snippets
def iterchunks(bits, chunkbits=512):
for i in range(0, len(bits), chunkbits):
yield bits[i:i + chunkbits]for i in range(0, 19):
f = (b & c) | ((~b) & d)
k = 0x5A827999
a, b, c, d, e = update(a, b, c, d, e, f, k, w[i])
for i in range(20, 39):
f = b ^ c ^ d
k = 0x6ED9EBA1
a, b, c, d, e = update(a, b, c, d, e, f, k, w[i])
...Context
StackExchange Code Review Q#37648, answer score: 11
Revisions (0)
No revisions yet.