ASN.1 BER Encoding and Decoding
Problem
I have a project (ldaplib) I am working on that needs to do ASN.1 BER decoding and encoding. The decoding function is slightly more complex than the encoding one, but neither is all that complicated. I would like to get some feedback on the overall approach as well as on the code, exceptions and inline comments.
The decoding specifically uses a named tuple, defined much higher in the code, as its return type; that tuple is then used in many places throughout the code:
```
from collections import namedtuple

BER = namedtuple('BER', 'cls pc tag value')
```
The encoding portion:
```
def ber_encode(ber_cls, ber_pc, ber_tag, message):
    """ Encode a message into ASN1-BER (TLV) bytes.
    The ber_cls, ber_pc, and ber_tag should be numbers, the message can be
    bytes, str, or int. """
    if type(message) not in (str, int, bytes):
        raise TypeError("Requires str, int, or bytes object.")
    encoded = (ber_cls + ber_pc + ber_tag).to_bytes(1, byteorder='big')
    if type(message) is int:
        bytelength = (message.bit_length() // 8) + 1
        message = message.to_bytes(bytelength, byteorder='big')
    elif type(message) is str:
        message = message.encode(encoding='ascii')
    if len(message) == 0:
        return encoded + int(0).to_bytes(1, byteorder='big')
    # Short form or Long form?
    length = len(message)
    if length < 0x80:
        encoded += length.to_bytes(1, byteorder='big')
    else:
        bytelength = (length.bit_length() // 8) + 1
        encoded += (0x80 + bytelength).to_bytes(1, byteorder='big')
        encoded += length.to_bytes(bytelength, byteorder='big')
    # Add the message
    encoded += message
    return encoded
```
The decoding portion:
```
def ber_decode(message, result=None):
    """ Decode a message from ASN1-BER (TLV) bytes into a list of byte values.
    Values that contain embedded types will need to call this function to
    break the objects into their individual type/value components. """
    if type(message) is not bytes:
        raise TypeError("Requ
```
Solution
Some ideas:
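String concatenation
The method you're using to build the output (successively applying the `+=` operator) is one of the slowest ways to construct a byte string in Python: `bytes` objects are immutable, so every `+=` creates a new object and copies everything accumulated so far. Consider instead collecting the pieces in a list and concatenating them once at the end with `return b''.join(parts)` (`b''` rather than `''`, since the pieces are `bytes`). Because your BER encoder is likely to be used in a protocol implementation, speed may be important.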
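A minimal sketch of the list-and-join approach, assuming the same pre-shifted `cls`/`pc`/`tag` arguments as `ber_encode`. The name `ber_encode_joined` is hypothetical, and for brevity it accepts only `bytes`; the length-octet count deliberately follows the original formula so that the only change is how the output is assembled.

```python
def ber_encode_joined(ber_cls: int, ber_pc: int, ber_tag: int, message: bytes) -> bytes:
    """Hypothetical variant of ber_encode that collects the TLV pieces in a
    list and joins them once, instead of growing a bytes object with +=."""
    if not isinstance(message, bytes):
        raise TypeError("Requires a bytes object.")
    parts = [bytes([ber_cls + ber_pc + ber_tag])]   # identifier octet, combined as in ber_encode
    length = len(message)
    if length < 0x80:
        parts.append(bytes([length]))               # short form: a single length octet
    else:
        # Long form: 0x80 + number of length octets, then the length itself.
        # The octet count uses the same formula as the original ber_encode.
        bytelength = (length.bit_length() // 8) + 1
        parts.append(bytes([0x80 + bytelength]))
        parts.append(length.to_bytes(bytelength, byteorder='big'))
    parts.append(message)
    return b''.join(parts)                          # b'' because every part is bytes
```

For example, `ber_encode_joined(0x80, 0x00, 0x06, b'abc')` returns `b'\x86\x03abc'`, the same bytes the `+=` version produces.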
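Checking of inputs
Your `cls`, `pc` and `tag` arguments are not checked on input to `ber_encode`, which allows calls such as `ber_encode(0xc3, 7, 3, "bongo")`; because the three values are simply added together, that gets encoded with an identifier octet of 0xcd. That isn't necessarily wrong, but it's not what I might have expected.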
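One way to validate those arguments, sketched under the assumption that callers keep passing pre-shifted values as `ber_encode` expects: class in bits 8-7, primitive/constructed flag in bit 6, tag number in bits 5-1. `identifier_octet` is a hypothetical helper; it only accepts tag numbers 0-30, because a value of 31 in bits 5-1 introduces the multi-octet high-tag-number form, which a single identifier octet cannot express.

```python
# Valid pre-shifted values for the class (bits 8-7) and the
# primitive/constructed flag (bit 6) of a single BER identifier octet.
VALID_CLASSES = (0x00, 0x40, 0x80, 0xC0)  # universal, application, context-specific, private
VALID_PC = (0x00, 0x20)                   # primitive, constructed


def identifier_octet(ber_cls: int, ber_pc: int, ber_tag: int) -> bytes:
    """Hypothetical helper: validate the three fields before combining them,
    so ber_encode(0xc3, 7, 3, ...) is rejected instead of silently giving 0xcd."""
    if ber_cls not in VALID_CLASSES:
        raise ValueError(f"invalid class {ber_cls:#04x}")
    if ber_pc not in VALID_PC:
        raise ValueError(f"invalid primitive/constructed flag {ber_pc:#04x}")
    if not 0 <= ber_tag <= 30:
        # 31 (0x1f) would signal the high-tag-number form
        raise ValueError(f"tag number {ber_tag} needs the high-tag-number form")
    # The fields occupy disjoint bits, so OR gives the same result as +
    return bytes([ber_cls | ber_pc | ber_tag])
```

Inside `ber_encode`, a call like `encoded = identifier_octet(ber_cls, ber_pc, ber_tag)` could then replace the unchecked `to_bytes` line.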
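Orthogonality of `ber_encode` and `ber_decode`
It's not unreasonable for a user of your functions to assume that the two complementary functions can each be fed the output of the other, but that's not the case for these functions. While `ber_decode` will happily digest the output of `ber_encode`, what `ber_decode` produces is a named tuple rather than output that's compatible with `ber_encode`'s input requirements. Again, this isn't necessarily wrong, but it is a potential impediment to users of your functions.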
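A minimal sketch of a symmetric pair, assuming the `BER` named tuple holds the same pre-shifted `cls`, `pc` and `tag` values that `ber_encode` takes, plus the raw value bytes. `encode_ber` and `decode_one` are hypothetical names, not the original functions: the decoder handles only a single definite-length TLV with a one-octet identifier and ignores anything after it. The point is the property `decode_one(encode_ber(x)) == x`.

```python
from collections import namedtuple

BER = namedtuple('BER', 'cls pc tag value')


def encode_ber(ber: BER) -> bytes:
    """Hypothetical encoder that accepts the same BER tuple the decoder returns."""
    parts = [bytes([ber.cls | ber.pc | ber.tag])]
    length = len(ber.value)
    if length < 0x80:
        parts.append(bytes([length]))                      # short form
    else:
        nbytes = (length.bit_length() + 7) // 8
        parts.append(bytes([0x80 | nbytes]))               # long form header
        parts.append(length.to_bytes(nbytes, byteorder='big'))
    parts.append(ber.value)
    return b''.join(parts)


def decode_one(data: bytes) -> BER:
    """Hypothetical decoder for a single definite-length TLV; the inverse of encode_ber."""
    if len(data) < 2:
        raise ValueError("need at least an identifier and a length octet")
    ident, first = data[0], data[1]
    if first < 0x80:                                       # short form
        length, offset = first, 2
    else:                                                  # long form
        nbytes = first & 0x7F
        if nbytes in (0, 0x7F) or len(data) < 2 + nbytes:
            raise ValueError("unsupported or truncated length octets")
        length = int.from_bytes(data[2:2 + nbytes], byteorder='big')
        offset = 2 + nbytes
    value = data[offset:offset + length]
    if len(value) != length:
        raise ValueError("truncated value")
    # Split the identifier back into the pre-shifted fields encode_ber expects
    return BER(ident & 0xC0, ident & 0x20, ident & 0x1F, value)
```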
Error checking
Generally, the code does a pretty good job of validating inputs and raising appropriate errors, but there is at least one case which is accepted that should probably raise an error. Specifically,
`ber_decode(b'\x86\xff' + ('a'*256).encode(encoding='ascii'))`
results in a BER value which is 129 bytes long. It's probably worth double-checking ITU-T X.690 section 8.1.3.5, which says that an initial length octet of `11111111b` shall not be used.
Similarly,
`ber_decode(b'\x86\x80' + ('a'*256).encode(encoding='ascii'))`
is decoded as three BER encodings. The first is 128 bytes long, which is suspect: a length octet of 0x80 denotes the indefinite form, which X.690 permits only for constructed encodings, and the identifier 0x86 is primitive. The second is 97 bytes long, which is OK, and the third is 27 bytes long, which may or may not be OK; but shouldn't a truncated message raise an exception?
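A sketch of stricter length parsing along those lines: reject an initial length octet of 0xFF (X.690 8.1.3.5), reject the indefinite form 0x80 for primitive encodings (X.690 allows it only for constructed ones), and raise instead of returning a short value when the input is truncated. `read_length` and its `(length, value_offset)` return convention are hypothetical; in a decoder, `constructed` would come from bit 6 of the identifier octet (`ident & 0x20`).

```python
def read_length(data: bytes, offset: int, constructed: bool) -> tuple:
    """Hypothetical helper: parse the length octets starting at data[offset].

    Returns (length, value_offset); length is None for the indefinite form.
    Raises ValueError for encodings X.690 forbids and for truncated input.
    """
    if offset >= len(data):
        raise ValueError("missing length octet")
    first = data[offset]
    if first < 0x80:                        # short form: the octet is the length
        length, start = first, offset + 1
    elif first == 0x80:                     # indefinite form
        if not constructed:
            raise ValueError("indefinite length used with a primitive encoding")
        return None, offset + 1
    elif first == 0xFF:                     # reserved value, see X.690 8.1.3.5
        raise ValueError("length octet 0xFF shall not be used")
    else:                                   # long form: low 7 bits = octet count
        count = first & 0x7F
        length_octets = data[offset + 1:offset + 1 + count]
        if len(length_octets) != count:
            raise ValueError("truncated length octets")
        length = int.from_bytes(length_octets, byteorder='big')
        start = offset + 1 + count
    if start + length > len(data):
        raise ValueError(f"value needs {length} bytes, only {len(data) - start} available")
    return length, start
```

With this helper, both messages above are rejected (0x86 is primitive, so neither 0xFF nor 0x80 is accepted), and a message whose value is shorter than its declared length raises instead of yielding a partial value.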
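Test vectors
You may simply have omitted them from the posted code, but I'd highly recommend including test vectors with the code. Python has a number of ways to do testing, including `doctest` and `unittest`. Not only do they help you make sure you've considered many different types of input (both good and bad), but they also serve as documentation for users of the functions.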
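As an illustration of table-driven test vectors, here is a small `unittest` sketch. It tests a hypothetical minimal-length helper `encode_length` (defined inline so the example is self-contained, and not part of the original code); the vectors cover the short/long-form boundary at 127/128 plus one bad input.

```python
import unittest


def encode_length(length: int) -> bytes:
    """Hypothetical helper: BER length octets using the minimum number of octets."""
    if length < 0:
        raise ValueError("length cannot be negative")
    if length < 0x80:
        return bytes([length])                              # short form
    nbytes = (length.bit_length() + 7) // 8
    return bytes([0x80 | nbytes]) + length.to_bytes(nbytes, byteorder='big')


class EncodeLengthTests(unittest.TestCase):
    # (input length, expected length octets)
    VECTORS = [
        (0, b'\x00'),
        (38, b'\x26'),
        (127, b'\x7f'),          # largest short-form length
        (128, b'\x81\x80'),      # smallest long-form length
        (201, b'\x81\xc9'),
        (256, b'\x82\x01\x00'),
    ]

    def test_known_vectors(self):
        for length, expected in self.VECTORS:
            with self.subTest(length=length):
                self.assertEqual(encode_length(length), expected)

    def test_negative_length_rejected(self):
        with self.assertRaises(ValueError):
            encode_length(-1)
```

Saved in a test module, this runs under `python -m unittest`; `doctest` is the lighter option when the same examples live in the functions' docstrings.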
StackExchange Code Review Q#47626, answer score: 4