Parse WAP-230 "variable length unsigned integers"

Submitted by: @import:stackexchange-codereview

Problem

I am working on a program to decode an MMS PDU file. In the file, the "content-length" is represented in a unique way. According to the MMS spec WAP-230 Section 8.1.2, it is encoded as a "Variable Length Unsigned Integer".

Basically, for each byte, the first (most significant) bit is the "continue bit" and the other 7 bits are the "payload". We keep reading bytes while the continue bit is 1; the first byte whose continue bit is 0 is the last one. We then combine all the payloads, in order, into a single value.

Here's an example. Let's say I have the following bytes:

82 E5 04


or in binary:

1000 0010 1110 0101 0000 0100
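As a quick check of this conversion, Python's format spec `08b` renders each byte as eight zero-padded binary digits. A minimal sketch using the example bytes:

```python
# The three example bytes, written in hex
data = bytes.fromhex("82 E5 04")

# Render each byte as an 8-digit, zero-padded binary string
binary = [format(b, "08b") for b in data]
print(" ".join(binary))  # 10000010 11100101 00000100
```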


Then we split each byte into its continue bit and payload:

1 | 0000010
1 | 1100101
0 | 0000100
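The same split can be sketched on the binary strings by taking the first character as the continue bit and the remaining seven as the payload (this mirrors the string-based approach in the code further down; the variable names are illustrative):

```python
data = bytes.fromhex("82 E5 04")

# First character = continue bit, remaining seven characters = payload
parts = [(bits[0], bits[1:]) for bits in (format(b, "08b") for b in data)]

for cont_bit, payload in parts:
    print(cont_bit, "|", payload)
```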


Now, starting from the first byte, we append each payload in order, stopping after the byte whose continue bit is 0. Thus we get the value:

000001011001010000100
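A minimal sketch of this concatenation step, still working on binary strings and stopping after the first byte whose continue bit is 0:

```python
data = bytes.fromhex("82 E5 04")

payloads = []
for b in data:
    bits = format(b, "08b")
    payloads.append(bits[1:])  # keep the 7-bit payload
    if bits[0] == "0":         # continue bit clear: this was the last byte
        break

joined = "".join(payloads)
print(joined)  # 000001011001010000100
```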


or broken into bytes (and left-padded with zeroes):

0000 0000 1011 0010 1000 0100


This can be read (in hex) as:

00 B2 84


which converts to 45700 (0xB284).
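The padding, hex and decimal steps can be checked with `int(..., 2)` and `int.to_bytes`. Each 7-bit payload also behaves as a base-128 digit, so the value equals 2·128² + 101·128 + 4. This sketch assumes Python 3.8 or later, where `bytes.hex` accepts a separator:

```python
joined = "000001011001010000100"

value = int(joined, 2)
print(value)                              # 45700
print(hex(value))                         # 0xb284
print(value.to_bytes(3, "big").hex(" "))  # 00 b2 84

# Same value, treating the payloads 2, 101 and 4 as base-128 digits
assert value == 2 * 128**2 + 101 * 128 + 4
```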

I tried to implement this in Python as I was reading through an MMS PDU file byte by byte. Here is what I came up with:

cont_bit = True
remaining_bits = []

while cont_bit:
    variable_length = self.data[curr_index]
    curr_index += 1

    # There's obviously a better way to do this, but I don't really know what it is
    binary_length = bin(variable_length).lstrip('0b').zfill(8)

    # Check the "continue bit"
    cont_bit = (binary_length[0] == '1')
    remaining_bits.append(binary_length[1:])

# Put the values together and read it as an int
content_length = int(''.join(remaining_bits), 2)


Note: self.data holds the bytes of the file I am reading, and curr_index is my current position in it.

This does work, and content_length does contain the right value. I just think there's gotta be a better way to do this than converting each byte into a string (its binary representation), reading the first character of that string, then appending the rest of the string to a list (which I pa

Solution

To get the bits out of a byte without converting to a string and back again, use Python's bitwise operations and shift operations. To get the high bit of a byte, shift it right by 7 bits:

>>> data = open('/dev/urandom', 'rb').read(128)
>>> data[0]
193
>>> data[0] >> 7
1


To get the low seven bits of a byte, mask against 0b1111111 (that is, 127):

>>> data[0] & 127
65
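Applied to the example bytes 82 E5 04 from the question instead of random data, the shift and mask recover the continue bits and payloads from the table there (`0x7F` is the same mask as 127):

```python
data = bytes.fromhex("82 E5 04")

cont_bits = [b >> 7 for b in data]   # high bit of each byte
payloads = [b & 0x7F for b in data]  # low seven bits of each byte
print(cont_bits)  # [1, 1, 0]
print(payloads)   # [2, 101, 4]
```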


Finally, if we maintain a running value for content_length, we can extend it with each byte's seven bits of payload by shifting the running value left by 7 bits and OR-ing in the payload. In summary:

content_length = 0
while True:
    byte = self.data[curr_index]
    curr_index += 1
    # Make room for 7 more bits, then OR in this byte's payload
    content_length = (content_length << 7) | (byte & 127)
    # High (continue) bit clear: this was the last byte
    if byte >> 7 == 0:
        break
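The snippet above depends on the question's `self.data` and `curr_index`. Below is a self-contained sketch of the same shift-and-OR loop as a function; the name `read_uintvar`, returning the next index, and raising `ValueError` on truncated input are assumptions for illustration, not part of the original answer:

```python
from __future__ import annotations


def read_uintvar(data: bytes, index: int) -> tuple[int, int]:
    """Decode a variable-length unsigned integer starting at data[index].

    Hypothetical helper: returns (value, next_index) and raises
    ValueError if the data ends before a byte with a clear continue bit.
    """
    value = 0
    while True:
        if index >= len(data):
            raise ValueError("truncated variable-length integer")
        byte = data[index]
        index += 1
        # Append this byte's 7-bit payload to the running value
        value = (value << 7) | (byte & 0x7F)
        # Continue bit (high bit) clear: this was the last byte
        if byte >> 7 == 0:
            return value, index


value, next_index = read_uintvar(bytes.fromhex("82 E5 04"), 0)
print(value, next_index)  # 45700 3
```

Returning the next index lets a caller keep parsing the PDU from where the integer ended instead of mutating a shared position variable.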

Context

StackExchange Code Review Q#142919, answer score: 3
