HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Simple compression reloaded++

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
compressionsimplereloaded

Problem

As a follow up to my previous question I've improved the code + algorithm. The compression now works the following way:

Each character is followed by a length byte. The top 3 bits of that byte denote the number of additional length bytes which encode the count. The count is stored in "little endian order" (least significant byte first) with the first 5 bits being encoded in the initial length byte.

Some examples:

Count == 1 yields in length byte 00000001
Count == 31 yields in length byte 00011111
Count == 32 yields in length bytes 00100000 followed by 0000001
Count == 33 yields in length bytes 00100001 followed by 0000001


The program supports a command line option for printing the output in hex for easier reading and debugging. I tried to use a const int for the NUM_LENGTH_BITS as well but then the compiler claimed that the initializer element for MAX_COUNT isn't constant. Not sure if that can be worked around.

I've limited the max count to 61 bits to leave 3 bits for the length specifier and have a maximum of 8 bytes for the length output.

All types of feedback appreciated.

`#include
#include
#include
#include
#include
#include

#define NUM_LENGTH_BITS 3
static const uint64_t MAX_COUNT = (~0ULL) >> NUM_LENGTH_BITS;

static bool print_hex = false;

static int most_significant_bit(uint64_t value)
{
int i = 0;
while (value)
{
++i;
value >>= 1;
}
return i;
}

static int num_bytes_required(uint64_t value)
{
int msb = most_significant_bit(value);
int bytes = msb / 8;
int remainder = msb % 8;
if (remainder)
{
bytes += (remainder + NUM_LENGTH_BITS) > 8 ? 2 : 1;
}
return bytes;
}

void write_char(int c)
{
if (print_hex)
{
if (printf("0x%x ", c) = 0 && additional_length_bytes >= 5;

while (count)
{
write_char(count & 0xFF);
count >>= 8;
}
}

static void parse_args(int argc, char **argv)
{
if (argc == 2)
{
print_hex = (s

Solution

A few minor notes (looks pretty good for the most part!):

-
Your most_significant_bit() function loop should be condensed into a for loop. The main difference between the for's and the while's is a matter of pragmatics: we usually use for when there is a known number of iterations (which might not seem like the case here, but it's true), and use while constructs when the number of iterations in not known in advance.

-
I'm not a fan of you using -h as the command line argument equivalent to --hex. Whenever I think of -h, I think of it as the shorthand of --help. A better shorthand would be -x in my opinion.

-
Why do you do current_char_count += 1? Why not current_char_count++/++current_char_count (both do the same thing in the same amount of time in this case)?

Context

StackExchange Code Review Q#115763, answer score: 5

Revisions (0)

No revisions yet.