Conversion functions between binary, hexadecimal and ASCII
Problem
I've written a small suite to easily convert my data between binary, hexadecimal and ASCII:
```python
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
import sys

def to_list(chain, offset):
    return [chain[i:i+offset] for i in range(0, len(chain), offset)]

def bin2str(chain):
    return ''.join((chr(int(chain[i:i+8], 2)) for i in range(0, len(chain), 8)))

def bin2hex(chain):
    return ''.join((hex(int(chain[i:i+8], 2))[2:] for i in range(0, len(chain), 8)))

def str2bin(chain):
    return ''.join((bin(ord(c))[2:].zfill(8) for c in chain))

def str2hex(chain):
    return ''.join((hex(ord(c))[2:] for c in chain))

def hex2bin(chain):
    return ''.join((bin(int(chain[i:i+2], 16))[2:].zfill(8) for i in range(0, len(chain), 2)))

def hex2str(chain):
    return ''.join((chr(int(chain[i:i+2], 16)) for i in range(0, len(chain), 2)))

if __name__ == '__main__':
    word = sys.argv[1]
    word_bin = str2bin(word)
    word_hex = str2hex(word)

    # tests
    assert bin2str(word_bin) == word
    assert bin2hex(word_bin) == word_hex
    assert hex2str(word_hex) == word
    assert hex2bin(word_hex) == word_bin

    # output
    print "bin:", to_list(word_bin, 8)
    print "hex:", to_list(word_hex, 2)
```
Example:

```
$ ./utils.py test
bin: ['01110100', '01100101', '01110011', '01110100']
hex: ['74', '65', '73', '74']
```

How would you optimize those methods?

Also, I realize I'm actually only working with `str` representations of the different formats. What would you do to get real binary or hexadecimal results from those functions?

Solution
Step 1: minor cleanup
You've defined a `to_list()` function, but you aren't really taking advantage of it. Instead, you're still writing things like `… for i in range(0, len(chain), 8))` in functions like `bin2str`. You should turn `to_list()` into a generator. Since it would no longer return a list, you should also rename it. You'll end up with the same thing as the `chunks()` function in the Stack Overflow answer cited in the code comment below.

`chain` is an odd parameter name. It strikes me as a botched English translation from a Romance language ("chaîne", "catena", etc.). If we want to help programmers who aren't English speakers, I suggest replacing your `…2…` names with `_to_`, because the "2"-for-"to" pun only works in English.

Instead of slicing off the `0b` and `0x` prefixes generated by the `bin()` and `hex()` functions, and zero-padding with `zfill()`, use `str.format()` to take care of both problems.

```python
# http://stackoverflow.com/a/312464/1157100
def _chunks(str, chunk_size):
    # Yield successive chunk_size-character slices of str.
    for i in xrange(0, len(str), chunk_size):
        yield str[i:i+chunk_size]

def hex_to_bin(hex):
    return ''.join('{:08b}'.format(int(x, 16)) for x in _chunks(hex, 2))

def str_to_bin(str):
    return ''.join('{:08b}'.format(ord(c)) for c in str)

def bin_to_hex(bin):
    return ''.join('{:02x}'.format(int(b, 2)) for b in _chunks(bin, 8))

def str_to_hex(str):
    return ''.join('{:02x}'.format(ord(c)) for c in str)

def bin_to_str(bin):
    return ''.join(chr(int(b, 2)) for b in _chunks(bin, 8))

def hex_to_str(hex):
    return ''.join(chr(int(x, 16)) for x in _chunks(hex, 2))
```

I've rearranged the order of appearance, grouping by the target type rather than the source type, as a segue to the next step…
Step 2: decluttering the interface
Looking at the code above, you'll notice a pattern: each function just mixes and matches code. There is always a decoding portion (either `… for something in _chunks(input, chunk_size)` or `… for c in str`) and an encoding portion (either `''.join('{:format}'.format(something) …)` or `''.join(chr(int(something, base)) …)`).

A more disturbing observation is that the number of functions grows quadratically: with n formats, you need n(n - 1) converters, one for each ordered pair. If you wanted to add octal support, for example, you would need to add `str_to_oct()`, `oct_to_str()`, `bin_to_oct()`, `oct_to_bin()`, `hex_to_oct()`, and `oct_to_hex()`. That doubles the number of functions from six to twelve.

A better design would be to convert everything to and from a common interface. I've chosen an iterable of ASCII values as that hub.
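A short tally makes the growth visible. This minimal sketch compares a direct design, with one converter per ordered pair of formats, against a hub design, with one `from_X()` and one `to_X()` per format. The function names are hypothetical, and five formats is just an arbitrary further step.

```python
def direct_converters(n):
    # One conversion function for every ordered pair of distinct formats.
    return n * (n - 1)


def hub_converters(n):
    # One from_X() and one to_X() per format, meeting at a shared representation.
    return 2 * n


for n in (3, 4, 5):
    print(n, direct_converters(n), hub_converters(n))
# 3 6 6
# 4 12 8
# 5 20 10
```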
```python
def _chunks(str, chunk_size):
    for i in xrange(0, len(str), chunk_size):
        yield str[i:i+chunk_size]

def from_str(str):
    for c in str:
        yield ord(c)

def to_str(ascii):
    return ''.join(chr(a) for a in ascii)

def from_bin(bin):
    for chunk in _chunks(bin, 8):
        yield int(chunk, 2)

def to_bin(ascii):
    return ''.join('{:08b}'.format(a) for a in ascii)

def from_hex(hex):
    for chunk in _chunks(hex, 2):
        yield int(chunk, 16)

def to_hex(ascii):
    return ''.join('{:02x}'.format(a) for a in ascii)
```

Then, instead of writing `str2bin(word)`, you would write `to_bin(from_str(word))`.

You happen to be at the break-even point: either way, there are six functions. The architecture proposed in Step 2 simplifies your library, at a slight burden to the caller. You would see a payoff, though, if you were to add octal support: you would be adding two functions instead of six.
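To illustrate that payoff, here is a Python 3 sketch of the hub design with octal added. The existing functions are ported from the Python 2 version above and use generator expressions. Only `from_oct()` and `to_oct()` are new; they are hypothetical additions. The three-digit width is an assumption that works because every value from 0 to 255 fits in three octal digits (255 is `0o377`). The sketch also assumes the input text is ASCII or Latin-1, so that each `ord()` fits in one byte.

```python
def _chunks(s, chunk_size):
    """Yield successive chunk_size-length slices of s."""
    for i in range(0, len(s), chunk_size):
        yield s[i:i + chunk_size]


def from_str(s):
    return (ord(c) for c in s)


def to_str(values):
    return ''.join(chr(v) for v in values)


def from_bin(b):
    return (int(chunk, 2) for chunk in _chunks(b, 8))


def to_bin(values):
    return ''.join('{:08b}'.format(v) for v in values)


def from_hex(h):
    return (int(chunk, 16) for chunk in _chunks(h, 2))


def to_hex(values):
    return ''.join('{:02x}'.format(v) for v in values)


# Hypothetical octal support: two new functions, nothing else changes.
def from_oct(o):
    return (int(chunk, 8) for chunk in _chunks(o, 3))


def to_oct(values):
    return ''.join('{:03o}'.format(v) for v in values)


word = 'test'
octal = to_oct(from_str(word))
print(octal)                            # 164145163164
print(to_hex(from_oct(octal)))          # 74657374
print(to_str(from_oct(octal)) == word)  # True
```

Each `from_X()` returns a one-shot generator. Pass it straight into a `to_Y()` call rather than storing it and iterating it twice.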
Context
StackExchange Code Review Q#85079, answer score: 4