patternpythonMinor

List splitting for RLE compression algorithm

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

rlecompressionsplittingalgorithmforlist

Problem

I try to implement a simple RLE compression algorithm as a learning task. In the first step, i need to split my source list to sequences of repeating and non-repeating elements. I've done it, but code seems to be so ugly. Can I simplify it?

src = [1,1,2,3,3,4,2,1,3,3,4,3,3,3,3,3,4,5]

from itertools import izip

isRepeat = False
sequence = []

def out(sequence):
    if sequence:
        print sequence

def testRepeate(a, b, pr):
    global isRepeat
    global sequence
    if a == b:
        if isRepeat:
            sequence.append(pr)
        else:
            out(sequence)
            sequence = []
            sequence.append(pr)
            isRepeat = True
    else:
        if isRepeat:
            sequence.append(pr)
            out(sequence)
            sequence = []
            isRepeat = False
        else:
            sequence.append(pr)

for a, b in izip(src[:-1], src[1:]):
    testRepeate(a, b, a)
testRepeate(src[-2], src[-1], src[-1])
out(sequence)

Solution

First of all I must tell you that Python makes astonishing easy to write a Run-length encoding:

>>> src = [1,1,2,3,3,4,2,1,3,3,4,3,3,3,3,3,4,5]
>>> from itertools import groupby
>>>
>>> [(k, sum(1 for _ in g)) for k,g in groupby(src)]
[(1, 2), (2, 1), (3, 2), (4, 1), (2, 1), (1, 1), (3, 2), (4, 1), (3, 5), (4, 1), (5, 1)]

Take a look at how itertools.groupby() to see how a generator will be a good choice for this kind of task

But that doesn't mean that it wasn't a bad idea try to implement it yourself. So let's take a deeper look at your code :)

The best use of global is usually the one that doesn't use global at all

The use of the global keyword among beginners is almost always a sign of trying to program using some other language's mindset. Usually, if a function need to know the value of somthing, pass that something along as an argument.

In your case, what you need, is a function that loops through your list holding the value of isRepeat.

Something like this:

def parse(seq):
    is_repeat = False
    old = None
    for num in seq:
        is_repeat = num == old    # will be True or False
        if is_repeat:
            # ...

I didn't go too far since, as you may have already noticed, the is_repeat flag is a bit useless now, so we'll get rid of that later.

You may have also noted that I called isReapeat is_repeat, I did that to follow some naming style guide (see PEP8 for more info).

Cleared this, let's get at your statement:

In the first step, I need to split my source list to sequences of repeating and non-repeating elements.

Ok, to split a sequence we'll have to parse it item by item, so while we'll be at it why don't already count how many times an item occurs?

The wrong step was to build a testRepeat function, such function should act on 2 items at the time, but that will lose the "view" of the whole sequence one item after the other. Well ok, a recursive solution could also be found, but I can't go through all the possible solutions.

So, to do achive our goal one could:

parse the list item by item.

see if an item is the same as the previous one.

if it is count += 1.

else store that item with its count.

This might be the code:

def rle(seq):
    result = []
    old = seq[0]
    count = 1
    for num in seq[1:]:
        if num == old:
            count += 1
        else:
            result.append((old, count))   # appending a tuple
            count = 1      # resetting the counting
            old = num
    result.append((old, count))    # appending the last counting
    return result

There could still be room for improvement (also it doesn't handle empty list like hdima noticed), but I just wanted to show an easy example and premature optimization is the root of (most) evil. (Also, there's groupby waiting to be used.)

Notes

It's very common to fall in some of the mistakes you did. Keep trying and you'll get better. Keep also in mind some of the following:

Think hard about the your design.

Do you really need an out function? Couldn't you just place the code there?

Do you really want to not print at all, if you get an empty list? I would really like to see what my code is returning, so I wouldn't block its output.

Don't write cryptic code, write readable one.

What are a, b and pr. A variable name should be something that reflect what it holds.

The same could be said for sequence, if sequece is the resulting sequence call it result or something like that.

Whorship The Zen of Python as an almighty guide!

Code Snippets

>>> src = [1,1,2,3,3,4,2,1,3,3,4,3,3,3,3,3,4,5]
>>> from itertools import groupby
>>>
>>> [(k, sum(1 for _ in g)) for k,g in groupby(src)]
[(1, 2), (2, 1), (3, 2), (4, 1), (2, 1), (1, 1), (3, 2), (4, 1), (3, 5), (4, 1), (5, 1)]

def parse(seq):
    is_repeat = False
    old = None
    for num in seq:
        is_repeat = num == old    # will be True or False
        if is_repeat:
            # ...

def rle(seq):
    result = []
    old = seq[0]
    count = 1
    for num in seq[1:]:
        if num == old:
            count += 1
        else:
            result.append((old, count))   # appending a tuple
            count = 1      # resetting the counting
            old = num
    result.append((old, count))    # appending the last counting
    return result

Context

StackExchange Code Review Q#9721, answer score: 5

Revisions (0)

No revisions yet.