patternpythonMinor

Optimizing critical loop for consuming a byte-buffer

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

criticalconsumingbyteloopoptimizingforbuffer

Problem

I'm currently developing an Open Source project called sims3py using Python2.7. One overarching goal of the project is to ensure a Pure Python implementation of functions currently available only through a .Net-based library.

Because the project must be Pure Python, 'cheats' using C code is not allowed (except for the implicit C code invoked by standard Python libraries).

I have performed profiling on my code, and discovered that a certain function, decompress() was very slow. This function is called thousands of time by many tools I created based on the sims3py library. So naturally this is a right place to perform optimization.

The decompression algorithm is documented here.

(Please note that I've performed profiling; the times listed below are time taken only for the decompression part. More specifically, time taken to perform chunk = decompress(buff) from another place in the program. They are not the total time taken for the program to run. Just for the decompress() function).

The first iteration uses a custom class called SlidingUnpacker. It works, but it's darned slow. When this first iteration was used against a test file (containing about 6000 chunks to decompress), the decompression of elements in that file took 240+ seconds.

The second iteration, which is the current Public version, dispenses with SlidingUnpacker and accesses the blob to decompress directly (via memoryview()), and it's much faster; testing against the same test file shows that the second iteration took about 75 seconds.

After reviewing how the code works, I've optimized decompress() further as follows:

```
def decompress(byte_buffer, strict_size=True, ignore_extra=True):
"""
Performs a decompression on a memoryview(byte_buffer).

:param byte_buffer: byte_buffer containing the compressed resource
:type byte_buffer: memoryview
:param strict_size: Whether to raise exception on decompressed size mismatch or not
:type strict_size: bool
:param i

Solution

The lazy thing to note is that pypy runs this in about 20% of the time, so PyPy should be preferred if possible.

Your with io.BytesIO() as output: can safely cover a smaller fraction of the code, so I suggest moving as much as possible (within reason) out of its context.

You have

except IndexError:
        ... # stuff
        pass

    finally:
        pass

This should just be

except IndexError:
        ... # stuff

I don't get your justification for running sys.exc_clear. I suggest you make sure this is really the right thing to do, because it looks wrong.

You have

while True:
            if output_len >= fullsize and ignore_extra:
                break

This looks like it would better be written

while not (output_len >= fullsize and ignore_extra):

It seems to me that your while loop:

while num_copy:
    output.seek(copy_pos)
    to_copy = output.read(num_copy)
    le = len(to_copy)
    if le:
        output.seek(0, 2)  # Seek to end of stream
        output.write(to_copy)
        copy_pos += le
        num_copy -= le
    else:
        raise IndexError('There should be at least 1 char but got none.')

is not needed:

If the argument is positive, and the underlying raw stream is not interactive, multiple raw reads may be issued to satisfy the byte count (unless EOF is reached first).

A simple assert should be fine.

Your output.seek(0, 2) should be output.seek(0, io.SEEK_END).

You might find things easier if you use a bytearray over a memoryview as you can avoid all of the ord calls.

output can also be a bytearray. I find this gives a significant speed improvement.

You spend a lot of upkeep on output_len; there's no real harm in using len(output), so I suggest you do so.

There seems to be no good reason for this line:

control_0, control_1, control_2, control_3 = 0, 0, 0, 0

so remove it.

It looks to me like the if control_0 is not None: in the except IndexError can be replaced with a pos < len(buf) in the while and the try can be moved in. This does come at a slight speed cost so I avoided the change.

Some of your bit twiddling can be simplified. The ifs:

# 0x00 ~ 0x7F
if control_0 < 0x80:
    ...

# 0x80 ~ 0xBF
elif control_0 < 0xX0:
    ...

# 0xC0 ~ 0xDF
elif control_0 < 0xE0:
    ...

# 0xE0 ~ 0xFB
elif control_0 < 0xFC:
    ...

# 0xFC ~ 0xFF
else:
    ...

Your

num_plain = ((control_1 & 0b11000000) >> 6) & 0b11

can be just

num_plain = (control_1 >> 6) & 0b11

I don't see how to speed it up further, but this should be a 2x improvement when staying on CPython and a 10x improvement if moving to PyPy (an extra 5x from the better interpreter).

def decompress(byte_buffer, strict_size=True, ignore_extra=True):
    """
    Performs a decompression on a bytearray(byte_buffer).

    :param byte_buffer: byte_buffer containing the compressed resource
    :type byte_buffer: bytearray
    :param strict_size: Whether to raise exception on decompressed size mismatch or not
    :type strict_size: bool
    :param ignore_extra: Whether to ignore compressed bytes beyond specified fullsize
    :type ignore_extra: bool
    :return: a bytes() containing the uncompressed resource
    :rtype: bytes
    """
    assert isinstance(byte_buffer, bytearray)

    buf = byte_buffer

    comptype, magic = buf[0:2]
    if magic != 0xFB:
        # Not a valid compression format
        raise InvalidCompressionException(message='NoMagic, the Magic byte 0xFB not found!')

    s1 = 0
    if comptype & 0x80:
        s1, s2, s3, s4 = buf[2:6]
        pos = 6
    else:
        s2, s3, s4 = buf[2:5]
        pos = 5
    fullsize = (s1 = fullsize and ignore_extra):
            # The while ensures that buf[pos] always valid
            control_0 = None
            control_0 = buf[pos]

            pos += 1

            # 0x00 ~ 0x7F
            if control_0 > 2) & 0b111) + 3
                copy_offset = ((control_0 & 0b1100000) > 6) & 0b11
                num_copy = (control_0 & 0b111111) + 4
                copy_offset = ((control_1 & 0b111111) = fullsize (while ignore_extra == True)
    #     (2) End of compression control detected (0xFC ~ 0xFF)
    #     (3) byte_buffer has been exhausted before (1) or (2) reached
    # In any case, all compression controls have been decoded properly (i.e., no incomplete control codes and/or
    # truncated data needed by control codes). So, technically the compressed data was NOT corrupt.
    # What we do depends on whether strict flag is set or not.
    if strict_size and fullsize != len(output):
        raise InvalidCompressionException(message='Size mismatch, want {} got {}'.format(fullsize, len(output)))

    return output

One last thing that seems to help is removing the control_1, control_2 and control_3 intermediates:

```
# 0x00 ~ 0x7F
if control_0 > 2) & 0b111) + 3
copy_offset = ((control_0 & 0b1100000) > 6) & 0b11
num_copy = (control_0 & 0b

Code Snippets

except IndexError:
        ... # stuff
        pass

    finally:
        pass

except IndexError:
        ... # stuff

while True:
            if output_len >= fullsize and ignore_extra:
                break

while not (output_len >= fullsize and ignore_extra):

while num_copy:
    output.seek(copy_pos)
    to_copy = output.read(num_copy)
    le = len(to_copy)
    if le:
        output.seek(0, 2)  # Seek to end of stream
        output.write(to_copy)
        copy_pos += le
        num_copy -= le
    else:
        raise IndexError('There should be at least 1 char but got none.')

Context

StackExchange Code Review Q#82094, answer score: 4

Revisions (0)

No revisions yet.