patternpythonMinor
List splitting for RLE compression algorithm
Viewed 0 times
rlecompressionsplittingalgorithmforlist
Problem
I try to implement a simple RLE compression algorithm as a learning task. In the first step, i need to split my source list to sequences of repeating and non-repeating elements. I've done it, but code seems to be so ugly. Can I simplify it?
src = [1,1,2,3,3,4,2,1,3,3,4,3,3,3,3,3,4,5]
from itertools import izip
isRepeat = False
sequence = []
def out(sequence):
if sequence:
print sequence
def testRepeate(a, b, pr):
global isRepeat
global sequence
if a == b:
if isRepeat:
sequence.append(pr)
else:
out(sequence)
sequence = []
sequence.append(pr)
isRepeat = True
else:
if isRepeat:
sequence.append(pr)
out(sequence)
sequence = []
isRepeat = False
else:
sequence.append(pr)
for a, b in izip(src[:-1], src[1:]):
testRepeate(a, b, a)
testRepeate(src[-2], src[-1], src[-1])
out(sequence)Solution
First of all I must tell you that Python makes astonishing easy to write a Run-length encoding:
Take a look at how
But that doesn't mean that it wasn't a bad idea try to implement it yourself. So let's take a deeper look at your code :)
The best use of
The use of the
In your case, what you need, is a function that loops through your list holding the value of
Something like this:
I didn't go too far since, as you may have already noticed, the
You may have also noted that I called
Cleared this, let's get at your statement:
In the first step, I need to split my source list to sequences of repeating and non-repeating elements.
Ok, to split a sequence we'll have to parse it item by item, so while we'll be at it why don't already count how many times an item occurs?
The wrong step was to build a
So, to do achive our goal one could:
This might be the code:
There could still be room for improvement (also it doesn't handle empty list like hdima noticed), but I just wanted to show an easy example and premature optimization is the root of (most) evil. (Also, there's groupby waiting to be used.)
Notes
It's very common to fall in some of the mistakes you did. Keep trying and you'll get better. Keep also in mind some of the following:
What are
>>> src = [1,1,2,3,3,4,2,1,3,3,4,3,3,3,3,3,4,5]
>>> from itertools import groupby
>>>
>>> [(k, sum(1 for _ in g)) for k,g in groupby(src)]
[(1, 2), (2, 1), (3, 2), (4, 1), (2, 1), (1, 1), (3, 2), (4, 1), (3, 5), (4, 1), (5, 1)]Take a look at how
itertools.groupby() to see how a generator will be a good choice for this kind of taskBut that doesn't mean that it wasn't a bad idea try to implement it yourself. So let's take a deeper look at your code :)
The best use of
global is usually the one that doesn't use global at allThe use of the
global keyword among beginners is almost always a sign of trying to program using some other language's mindset. Usually, if a function need to know the value of somthing, pass that something along as an argument.In your case, what you need, is a function that loops through your list holding the value of
isRepeat.Something like this:
def parse(seq):
is_repeat = False
old = None
for num in seq:
is_repeat = num == old # will be True or False
if is_repeat:
# ...I didn't go too far since, as you may have already noticed, the
is_repeat flag is a bit useless now, so we'll get rid of that later.You may have also noted that I called
isReapeat is_repeat, I did that to follow some naming style guide (see PEP8 for more info).Cleared this, let's get at your statement:
In the first step, I need to split my source list to sequences of repeating and non-repeating elements.
Ok, to split a sequence we'll have to parse it item by item, so while we'll be at it why don't already count how many times an item occurs?
The wrong step was to build a
testRepeat function, such function should act on 2 items at the time, but that will lose the "view" of the whole sequence one item after the other. Well ok, a recursive solution could also be found, but I can't go through all the possible solutions.So, to do achive our goal one could:
- parse the list item by item.
- see if an item is the same as the previous one.
- if it is
count += 1.
- else store that item with its count.
This might be the code:
def rle(seq):
result = []
old = seq[0]
count = 1
for num in seq[1:]:
if num == old:
count += 1
else:
result.append((old, count)) # appending a tuple
count = 1 # resetting the counting
old = num
result.append((old, count)) # appending the last counting
return resultThere could still be room for improvement (also it doesn't handle empty list like hdima noticed), but I just wanted to show an easy example and premature optimization is the root of (most) evil. (Also, there's groupby waiting to be used.)
Notes
It's very common to fall in some of the mistakes you did. Keep trying and you'll get better. Keep also in mind some of the following:
- Think hard about the your design.
- Do you really need an
outfunction? Couldn't you just place the code there?
- Do you really want to not print at all, if you get an empty list? I would really like to see what my code is returning, so I wouldn't block its output.
- Don't write cryptic code, write readable one.
What are
a, b and pr. A variable name should be something that reflect what it holds.- The same could be said for
sequence, ifsequeceis the resulting sequence call itresultor something like that.
- Whorship The Zen of Python as an almighty guide!
Code Snippets
>>> src = [1,1,2,3,3,4,2,1,3,3,4,3,3,3,3,3,4,5]
>>> from itertools import groupby
>>>
>>> [(k, sum(1 for _ in g)) for k,g in groupby(src)]
[(1, 2), (2, 1), (3, 2), (4, 1), (2, 1), (1, 1), (3, 2), (4, 1), (3, 5), (4, 1), (5, 1)]def parse(seq):
is_repeat = False
old = None
for num in seq:
is_repeat = num == old # will be True or False
if is_repeat:
# ...def rle(seq):
result = []
old = seq[0]
count = 1
for num in seq[1:]:
if num == old:
count += 1
else:
result.append((old, count)) # appending a tuple
count = 1 # resetting the counting
old = num
result.append((old, count)) # appending the last counting
return resultContext
StackExchange Code Review Q#9721, answer score: 5
Revisions (0)
No revisions yet.