HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Run Length Encoding (RLE) and Move To Front (MTF) in C++

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
rlelengthmovefrontmtfencodingandrun

Problem

I have made two quite basic algorithms used for lossless data compression.

This is my RLE:

#include 

std::string rle(const std::string& input) // encode (aaaaeecd -> 4a2e1c1d)
{
    std::string output;

    int n = input.size();
    int i = 0;
    while(i  aaaaeecd)
{
    std::string output;

    for(int i = 0; i < input.size(); i += 2)
    {
        int nb = (int)(unsigned char)input[i]; // range between 0 and 255
        for(int k = 0; k < nb; k++)
            output += input[i+1];
    }

    return output;
}


and here goes my MTF:

#include 
#include 
#include 
#include 

/*
if alphabet A = {'a', 'b', 'c'}
    then string "accbc" is encoded 02021
    (a is at index 0, alphabet remains {'a', 'b', 'c'}, outputs 0 ;
     c is at index 2, alphabet becomes {'c', 'a', 'b'}, outputs 2 ;
     c is at index 0, alphabet remains {'c', 'a', 'b'}, outputs 0 ;
     b is at index 2, alphabet becomes {'b', 'c', 'a'}, outputs 2 ;
     a is at index 1, alphabet becomes {'a', 'b', 'c'}, outputs 1)
*/
std::string mtf(const std::string& input)
{
    std::string output;

    std::list alphabet; // constant erase/insert
    for(int i = 0; i  alphabet; // constant erase (begin) & operator[] ; insert (middle) is O(n)
    for(int i = 0; i <= 255; i++)
        alphabet.push_back((char)i);

    for(unsigned char n: input)
    {
        char temp_c = alphabet[n];
        output += temp_c;
        alphabet.erase(alphabet.begin() + n);
        alphabet.insert(alphabet.begin(), temp_c);
    }

    return output;
}


I know there are many possible improvments possible, especially for the RLE algorithm. I would just like a review on the C++ syntax I've used, especially on the structures I have chosen.

Solution

Looks quite nice, but let's get down to ripping it apart.

First your runlength-encoding:

-
Try to use the same type for size-variables / indices throughout:

std::basic_string uses std::size_t.

That way you would also heal the one sign-mismatch.

-
Your comment about the encoded form // encode (aaaaeecd -> 4a2e1c1d) is wrong.

You probably meant: // encode (aaaaeecd -> \x04a\x02e\x01c\x01d)

-
You are wasting \x00 and thus the chance to encode runs of length 256.

(Yes, the comment // run_count is between 0 and 255 (unsigned char) is wrong.)

-
You are missing the header `.

Next your Move-To-Front:

-
I suggest you move from
std::list to std::array.

A list is a far too heavy data-structure, jumping every-which-way in memory kills your performance quite easily.

-
The
std::deque is a bit better, but using a std::array` would probably still handily beat it due to simplicity.

Context

StackExchange Code Review Q#163189, answer score: 4

Revisions (0)

No revisions yet.