ASN.1 BER Encoding and Decoding

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Problem

I am working on a project (ldaplib) that needs to do ASN.1 BER decoding and encoding. The decoding function is slightly more complex than the encoding one, but neither is all that complicated. I would like some feedback on the overall approach as well as on the code, exceptions, and inline comments.

The decoding specifically uses a named tuple defined much higher in the code as a return type which is then used in many places throughout the code:

BER = namedtuple('BER', 'cls pc tag value')

The encoding portion:

def ber_encode(ber_cls, ber_pc, ber_tag, message):
    """ Encode a message into ASN1-BER (TLV) bytes.
    The ber_cls, ber_pc, and ber_tag should be numbers, the message can be
    bytes, str, or int. """
    if type(message) not in (str, int, bytes):
        raise TypeError("Requires str, int, or bytes object.")

    encoded = (ber_cls + ber_pc + ber_tag).to_bytes(1, byteorder='big')

    if type(message) is int:
        bytelength = (message.bit_length() // 8) + 1
        message = message.to_bytes(bytelength, byteorder='big')
    elif type(message) is str:
        message = message.encode(encoding='ascii')

    if len(message) == 0:
        return encoded + int(0).to_bytes(1, byteorder='big')

    # Short form or Long form?
    length = len(message)
    if length < 0x80:
        encoded += length.to_bytes(1, byteorder='big')
    else:
        bytelength = (length.bit_length() // 8) + 1
        encoded += (0x80 + bytelength).to_bytes(1, byteorder='big')
        encoded += length.to_bytes(bytelength, byteorder='big')

    # Add the message
    encoded += message

    return encoded

The decoding portion:

```
def ber_decode(message, result=None):
    """ Decode a message from ASN1-BER (TLV) bytes into a list of byte values.
    Values that contain embedded types will need to call this function to
    break the objects into their individual type/value components. """
    if type(message) is not bytes:
        raise TypeError("Requ
```

Solution

Some ideas:

String concatenation

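Building the result with successive += operations creates a new bytes object at every step, so the cost grows with the number of pieces. It hardly matters for the three or four appends in ber_encode, but if you later assemble many encodings into one message, consider collecting the pieces in a list and joining them once at the end using return b''.join(pieces). Note the b'' separator: the encoder produces bytes, not str, so ''.join would raise a TypeError. Because your BER encoder is likely to be used with a protocol, speed may be important.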
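A minimal sketch of the join approach. The helper name `build_tlv` and its three parameters are hypothetical and only stand in for the pieces that ber_encode already computes (identifier octet, length octets, contents).

```python
def build_tlv(identifier: bytes, length_octets: bytes, contents: bytes) -> bytes:
    """Assemble one TLV from already-encoded pieces (hypothetical helper)."""
    pieces = [identifier, length_octets, contents]
    # A single join builds the result once instead of once per +=.
    return b''.join(pieces)


def build_message(tlvs) -> bytes:
    """Concatenate any number of encoded TLVs in one pass."""
    return b''.join(tlvs)
```

For a handful of pieces the difference is negligible; the pattern pays off mainly in `build_message`, where the number of pieces is not fixed.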

Checking of inputs

Your cls, pc and tag elements are not checked on input to the ber_encode routine, which allows constructs such as ber_encode(0xc3,7,3,"bongo"). The three values are simply added, so this is encoded with an identifier octet of 0xcd (0xc3 + 7 + 3). That isn't necessarily wrong but it's not what I might have expected.
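One possible way to check the three fields before combining them is sketched below. It assumes, as the addition in ber_encode suggests, that callers pass the class and primitive/constructed values already shifted into their bit positions, and that only the single-octet (low-tag-number) identifier form is needed. The function name `identifier_octet` and the constants are illustrative, not part of the posted code.

```python
VALID_CLASSES = (0x00, 0x40, 0x80, 0xC0)  # universal, application, context-specific, private
VALID_PC = (0x00, 0x20)                   # primitive, constructed
MAX_LOW_TAG = 30                          # 31 (0x1F) signals the high-tag-number form


def identifier_octet(ber_cls: int, ber_pc: int, ber_tag: int) -> bytes:
    """Validate the three fields and combine them into one identifier octet."""
    if ber_cls not in VALID_CLASSES:
        raise ValueError("ber_cls must be one of 0x00, 0x40, 0x80, 0xC0")
    if ber_pc not in VALID_PC:
        raise ValueError("ber_pc must be 0x00 (primitive) or 0x20 (constructed)")
    if not 0 <= ber_tag <= MAX_LOW_TAG:
        raise ValueError("ber_tag must be in 0..30 for a single identifier octet")
    # The fields occupy disjoint bits, so OR cannot carry into a neighbour
    # the way addition can.
    return bytes([ber_cls | ber_pc | ber_tag])
```

With this check, the 0xc3, 7, 3 example is rejected with a ValueError instead of silently producing 0xcd.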

Orthogonality of ber_encode and ber_decode

It's not unreasonable for a user of your functions to assume that the two complementary functions can each be fed the output of the other, but that's not the case for these functions. While ber_decode will happily digest the output of ber_encode, what ber_decode produces is a named tuple rather than an output that's compatible with ber_encode's input requirements. Again, this isn't necessarily wrong, but it is a potential impediment to users of your functions.
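To illustrate what a round-trippable pair could look like, here is a deliberately simplified sketch. `encode_one` and `decode_one` are hypothetical stand-ins, not the functions from the question: they handle a single TLV with a short-form length and a low tag number only, and they assume the BER tuple stores cls and pc in shifted form.

```python
from collections import namedtuple

BER = namedtuple('BER', 'cls pc tag value')  # same shape as in the question


def encode_one(ber: BER) -> bytes:
    """Stand-in encoder: accepts the same tuple type the decoder returns."""
    if len(ber.value) >= 0x80:
        raise ValueError("sketch handles short-form lengths only")
    return bytes([ber.cls | ber.pc | ber.tag, len(ber.value)]) + ber.value


def decode_one(data: bytes) -> BER:
    """Stand-in decoder for exactly one short-form TLV."""
    if len(data) < 2:
        raise ValueError("need at least identifier and length octets")
    identifier, length = data[0], data[1]
    if length >= 0x80:
        raise ValueError("sketch handles short-form lengths only")
    if len(data) != 2 + length:
        raise ValueError("data length does not match the length octet")
    # Split the identifier octet back into class, P/C and tag number.
    return BER(identifier & 0xC0, identifier & 0x20, identifier & 0x1F,
               data[2:])
```

Because both functions speak in BER tuples, `decode_one(encode_one(x)) == x` holds for any tuple the sketch accepts. If the real ber_decode stores cls and pc in the form ber_encode expects, calling `ber_encode(*ber)` might already give the same effect, but the posted code does not confirm that.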

Error checking

Generally, the code does a pretty good job of validating inputs and throwing appropriate errors, but there is at least one case which is accepted that should probably throw an error. Specifically,

ber_decode(b'\x86\xff'+('a'*256).encode(encoding='ascii'))

results in a BER value which is 129 bytes long. It's probably worth double checking ITU X.690 section 8.1.3.5 which says that an encoded length value of 11111111b shall not be used.

Similarly,

ber_decode(b'\x86\x80'+('a'*256).encode(encoding='ascii'))

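is decoded as three BER encodings. The first is 128 bytes long, which is suspect: a length octet of 0x80 denotes the indefinite form in X.690, and that form is only permitted for constructed encodings, whereas the identifier 0x86 is primitive. The second is 97 bytes long (which is OK). The third is 27 bytes long even though its length octet declares 97 bytes; shouldn't a truncated message throw an exception?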
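The checks discussed for the two inputs above could be sketched as follows. `parse_length` and `read_contents` are hypothetical helpers written for this illustration; they are not taken from the question's ber_decode, whose full source is not shown.

```python
def parse_length(data: bytes, offset: int, constructed: bool):
    """Return (length, next_offset) for the length octets starting at offset.

    length is None for the indefinite form.
    """
    if offset >= len(data):
        raise ValueError("truncated: missing length octet")
    first = data[offset]
    offset += 1
    if first < 0x80:                      # short form
        return first, offset
    if first == 0xFF:                     # reserved, see X.690 8.1.3.5
        raise ValueError("length octet 0xFF shall not be used")
    if first == 0x80:                     # indefinite form
        if not constructed:
            raise ValueError("indefinite length on a primitive encoding")
        return None, offset
    count = first & 0x7F                  # long form: number of length octets
    if offset + count > len(data):
        raise ValueError("truncated: incomplete long-form length")
    length = int.from_bytes(data[offset:offset + count], byteorder='big')
    return length, offset + count


def read_contents(data: bytes, offset: int, length: int) -> bytes:
    """Slice out the contents octets, refusing to return a short read."""
    end = offset + length
    if end > len(data):
        raise ValueError("truncated: contents shorter than declared length")
    return data[offset:end]
```

Under these assumptions both example inputs raise ValueError at the length octet, and a value whose declared length runs past the end of the buffer is rejected rather than returned short.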

Test vectors

You may simply have omitted them from the posted code, but I'd highly recommend including test vectors with the code. Python has a number of ways to do testing including doctest and unittest. Not only do they help you make sure you've considered many different types of input (both good and bad), but they also serve as documentation for users of the functions.
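As an illustration of table-driven test vectors with unittest, the sketch below tests a small, hypothetical `encode_length` helper rather than the question's ber_encode. The helper encodes definite lengths in the fewest octets, which is a choice made for this example; the formula in the question can emit a longer (still valid BER) length for values such as 255.

```python
import unittest


def encode_length(length: int) -> bytes:
    """Encode a BER definite length in the fewest octets (hypothetical helper)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length < 0x80:
        return bytes([length])            # short form
    count = (length.bit_length() + 7) // 8
    return bytes([0x80 | count]) + length.to_bytes(count, byteorder='big')


class EncodeLengthVectors(unittest.TestCase):
    # (input length, expected length octets)
    VECTORS = [
        (0, b'\x00'),
        (127, b'\x7f'),
        (128, b'\x81\x80'),
        (255, b'\x81\xff'),
        (256, b'\x82\x01\x00'),
    ]

    def test_vectors(self):
        for length, expected in self.VECTORS:
            with self.subTest(length=length):
                self.assertEqual(encode_length(length), expected)

    def test_negative_rejected(self):
        with self.assertRaises(ValueError):
            encode_length(-1)
```

Saved in a module, the cases can be run with `python -m unittest`. The boundary values around 127/128 and 255/256 are where length encoders most often go wrong, so they make useful first vectors.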

Context

StackExchange Code Review Q#47626, answer score: 4
