patternpythonMinor
Optimizing critical loop for consuming a byte-buffer
Viewed 0 times
criticalconsumingbyteloopoptimizingforbuffer
Problem
I'm currently developing an Open Source project called sims3py using Python2.7. One overarching goal of the project is to ensure a Pure Python implementation of functions currently available only through a .Net-based library.
Because the project must be Pure Python, 'cheats' using C code is not allowed (except for the implicit C code invoked by standard Python libraries).
I have performed profiling on my code, and discovered that a certain function,
The decompression algorithm is documented here.
(Please note that I've performed profiling; the times listed below are time taken only for the decompression part. More specifically, time taken to perform
The first iteration uses a custom class called
The second iteration, which is the current Public version, dispenses with
After reviewing how the code works, I've optimized
```
def decompress(byte_buffer, strict_size=True, ignore_extra=True):
"""
Performs a decompression on a memoryview(byte_buffer).
:param byte_buffer: byte_buffer containing the compressed resource
:type byte_buffer: memoryview
:param strict_size: Whether to raise exception on decompressed size mismatch or not
:type strict_size: bool
:param i
Because the project must be Pure Python, 'cheats' using C code is not allowed (except for the implicit C code invoked by standard Python libraries).
I have performed profiling on my code, and discovered that a certain function,
decompress() was very slow. This function is called thousands of time by many tools I created based on the sims3py library. So naturally this is a right place to perform optimization.The decompression algorithm is documented here.
(Please note that I've performed profiling; the times listed below are time taken only for the decompression part. More specifically, time taken to perform
chunk = decompress(buff) from another place in the program. They are not the total time taken for the program to run. Just for the decompress() function).The first iteration uses a custom class called
SlidingUnpacker. It works, but it's darned slow. When this first iteration was used against a test file (containing about 6000 chunks to decompress), the decompression of elements in that file took 240+ seconds.The second iteration, which is the current Public version, dispenses with
SlidingUnpacker and accesses the blob to decompress directly (via memoryview()), and it's much faster; testing against the same test file shows that the second iteration took about 75 seconds.After reviewing how the code works, I've optimized
decompress() further as follows:```
def decompress(byte_buffer, strict_size=True, ignore_extra=True):
"""
Performs a decompression on a memoryview(byte_buffer).
:param byte_buffer: byte_buffer containing the compressed resource
:type byte_buffer: memoryview
:param strict_size: Whether to raise exception on decompressed size mismatch or not
:type strict_size: bool
:param i
Solution
The lazy thing to note is that
Your
You have
This should just be
I don't get your justification for running
You have
This looks like it would better be written
It seems to me that your
is not needed:
If the argument is positive, and the underlying raw stream is not interactive, multiple raw reads may be issued to satisfy the byte count (unless EOF is reached first).
A simple
Your
You might find things easier if you use a
You spend a lot of upkeep on
There seems to be no good reason for this line:
so remove it.
It looks to me like the
Some of your bit twiddling can be simplified. The
Your
can be just
I don't see how to speed it up further, but this should be a 2x improvement when staying on CPython and a 10x improvement if moving to PyPy (an extra 5x from the better interpreter).
One last thing that seems to help is removing the
```
# 0x00 ~ 0x7F
if control_0 > 2) & 0b111) + 3
copy_offset = ((control_0 & 0b1100000) > 6) & 0b11
num_copy = (control_0 & 0b
pypy runs this in about 20% of the time, so PyPy should be preferred if possible.Your
with io.BytesIO() as output: can safely cover a smaller fraction of the code, so I suggest moving as much as possible (within reason) out of its context.You have
except IndexError:
... # stuff
pass
finally:
passThis should just be
except IndexError:
... # stuffI don't get your justification for running
sys.exc_clear. I suggest you make sure this is really the right thing to do, because it looks wrong.You have
while True:
if output_len >= fullsize and ignore_extra:
breakThis looks like it would better be written
while not (output_len >= fullsize and ignore_extra):It seems to me that your
while loop:while num_copy:
output.seek(copy_pos)
to_copy = output.read(num_copy)
le = len(to_copy)
if le:
output.seek(0, 2) # Seek to end of stream
output.write(to_copy)
copy_pos += le
num_copy -= le
else:
raise IndexError('There should be at least 1 char but got none.')is not needed:
If the argument is positive, and the underlying raw stream is not interactive, multiple raw reads may be issued to satisfy the byte count (unless EOF is reached first).
A simple
assert should be fine.Your
output.seek(0, 2) should be output.seek(0, io.SEEK_END).You might find things easier if you use a
bytearray over a memoryview as you can avoid all of the ord calls.output can also be a bytearray. I find this gives a significant speed improvement.You spend a lot of upkeep on
output_len; there's no real harm in using len(output), so I suggest you do so.There seems to be no good reason for this line:
control_0, control_1, control_2, control_3 = 0, 0, 0, 0so remove it.
It looks to me like the
if control_0 is not None: in the except IndexError can be replaced with a pos < len(buf) in the while and the try can be moved in. This does come at a slight speed cost so I avoided the change.Some of your bit twiddling can be simplified. The
ifs:# 0x00 ~ 0x7F
if control_0 < 0x80:
...
# 0x80 ~ 0xBF
elif control_0 < 0xX0:
...
# 0xC0 ~ 0xDF
elif control_0 < 0xE0:
...
# 0xE0 ~ 0xFB
elif control_0 < 0xFC:
...
# 0xFC ~ 0xFF
else:
...Your
num_plain = ((control_1 & 0b11000000) >> 6) & 0b11can be just
num_plain = (control_1 >> 6) & 0b11I don't see how to speed it up further, but this should be a 2x improvement when staying on CPython and a 10x improvement if moving to PyPy (an extra 5x from the better interpreter).
def decompress(byte_buffer, strict_size=True, ignore_extra=True):
"""
Performs a decompression on a bytearray(byte_buffer).
:param byte_buffer: byte_buffer containing the compressed resource
:type byte_buffer: bytearray
:param strict_size: Whether to raise exception on decompressed size mismatch or not
:type strict_size: bool
:param ignore_extra: Whether to ignore compressed bytes beyond specified fullsize
:type ignore_extra: bool
:return: a bytes() containing the uncompressed resource
:rtype: bytes
"""
assert isinstance(byte_buffer, bytearray)
buf = byte_buffer
comptype, magic = buf[0:2]
if magic != 0xFB:
# Not a valid compression format
raise InvalidCompressionException(message='NoMagic, the Magic byte 0xFB not found!')
s1 = 0
if comptype & 0x80:
s1, s2, s3, s4 = buf[2:6]
pos = 6
else:
s2, s3, s4 = buf[2:5]
pos = 5
fullsize = (s1 = fullsize and ignore_extra):
# The while ensures that buf[pos] always valid
control_0 = None
control_0 = buf[pos]
pos += 1
# 0x00 ~ 0x7F
if control_0 > 2) & 0b111) + 3
copy_offset = ((control_0 & 0b1100000) > 6) & 0b11
num_copy = (control_0 & 0b111111) + 4
copy_offset = ((control_1 & 0b111111) = fullsize (while ignore_extra == True)
# (2) End of compression control detected (0xFC ~ 0xFF)
# (3) byte_buffer has been exhausted before (1) or (2) reached
# In any case, all compression controls have been decoded properly (i.e., no incomplete control codes and/or
# truncated data needed by control codes). So, technically the compressed data was NOT corrupt.
# What we do depends on whether strict flag is set or not.
if strict_size and fullsize != len(output):
raise InvalidCompressionException(message='Size mismatch, want {} got {}'.format(fullsize, len(output)))
return outputOne last thing that seems to help is removing the
control_1, control_2 and control_3 intermediates:```
# 0x00 ~ 0x7F
if control_0 > 2) & 0b111) + 3
copy_offset = ((control_0 & 0b1100000) > 6) & 0b11
num_copy = (control_0 & 0b
Code Snippets
except IndexError:
... # stuff
pass
finally:
passexcept IndexError:
... # stuffwhile True:
if output_len >= fullsize and ignore_extra:
breakwhile not (output_len >= fullsize and ignore_extra):while num_copy:
output.seek(copy_pos)
to_copy = output.read(num_copy)
le = len(to_copy)
if le:
output.seek(0, 2) # Seek to end of stream
output.write(to_copy)
copy_pos += le
num_copy -= le
else:
raise IndexError('There should be at least 1 char but got none.')Context
StackExchange Code Review Q#82094, answer score: 4
Revisions (0)
No revisions yet.