Parse WAP-230 "variable length unsigned integers"
Problem
I am working on a program to decode an MMS PDU file. In the file, the "content-length" is represented in a unique way. According to the MMS spec WAP-230 Section 8.1.2, it is encoded as a "Variable Length Unsigned Integer".
Basically, for each byte, the first (most significant) bit is the "continue bit" and the other 7 bits are the "payload". We keep reading bytes while the continue bit is 1. The byte whose continue bit is 0 is the last one; we then combine all the payloads, in order, into a single value.
Here's an example, let's say I have the following bytes:
82 E5 04
or in binary:
1000 0010 1110 0101 0000 0100
Then we split them into their continue bit/payload:
1 | 0000010
1 | 1100101
0 | 0000100
Now, we start from the beginning, append the bits and move on until the continue bit is 0. Thus we get the value:
000001011001010000100
or broken into bytes (and left padded with zeroes):
0000 0000 1011 0010 1000 0100
This can be read (in hex) as:
00 B2 84
which converts to
45700 (0xB284).

I tried to implement this in Python as I was reading through an MMS PDU file byte by byte. Here is what I came up with:

cont_bit = True
remaining_bits = []

while cont_bit:
    variable_length = self.data[curr_index]
    curr_index += 1

    # There's obviously a better way to do this, but I don't really know what it is
    binary_length = bin(variable_length).lstrip('0b').zfill(8)

    # Check the "continue bit"
    cont_bit = (binary_length[0] == '1')
    remaining_bits.append(binary_length[1:])

# Put the values together and read it as an int
content_length = int(''.join(remaining_bits), 2)

Note: self.data is the binary file I am reading and curr_index is my current position in the file.

This does work and content_length does contain the right value, I just think there's gotta be a better way to do this than to convert each byte into a string (representing its binary representation), reading the 1st character of this string, then appending the rest of the string into an array (which I pa...

Solution
To get the bits out of a byte without converting to a string and back again, use Python's bitwise operations and shift operations. To get the high bit of a byte, shift it right by 7 bits:

>>> data = open('/dev/urandom', 'rb').read(128)
>>> data[0]
193
>>> data[0] >> 7
1

To get the low seven bits of a byte, mask against 0b1111111 (that is, 127):

>>> data[0] & 127
65

Finally, if we maintain a running value for content_length, we can extend it with these seven bits of payload by taking the running value, shifting it left by 7 bits and or-ing with the payload. In summary:

content_length = 0
while True:
    byte = self.data[curr_index]
    curr_index += 1
    content_length = (content_length << 7) | (byte & 127)
    if byte >> 7 == 0:
        break

Code Snippets
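
The shift-and-mask loop from the solution can also be packaged as a standalone function, which makes it easy to check against the 82 E5 04 example from the question. This is only a minimal sketch: the name decode_uintvar, its (data, index) parameters and the (value, next_index) return convention are assumptions made for illustration, not part of the original post or of any library.

```python
def decode_uintvar(data, index=0):
    """Decode one variable-length unsigned integer from a bytes object.

    Hypothetical helper: `data` is assumed to be a Python 3 bytes-like
    object (so indexing yields ints) and `index` the position of the
    first byte of the encoded value. Returns (value, next_index).
    """
    value = 0
    while True:
        byte = data[index]
        index += 1
        # Append the low seven payload bits to the running value.
        value = (value << 7) | (byte & 0x7F)
        # A clear high bit (the "continue bit") marks the last byte.
        if (byte & 0x80) == 0:
            return value, index


# The worked example from the question: bytes 82 E5 04.
value, next_index = decode_uintvar(bytes.fromhex("82E504"))
print(value, next_index)  # 45700 3
```

Returning the next index lets the caller keep its position in the PDU in sync, much as curr_index does in the loop above. In this sketch, input that ends before a byte with a clear continue bit raises IndexError.
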
>>> data = open('/dev/urandom', 'rb').read(128)
>>> data[0]
193
>>> data[0] >> 7
1

>>> data[0] & 127
65

content_length = 0
while True:
    byte = self.data[curr_index]
    curr_index += 1
    content_length = (content_length << 7) | (byte & 127)
    if byte >> 7 == 0:
        break

Context
StackExchange Code Review Q#142919, answer score: 3