patterncMinor
Simple compression reloaded++
Viewed 0 times
compressionsimplereloaded
Problem
As a follow up to my previous question I've improved the code + algorithm. The compression now works the following way:
Each character is followed by a length byte. The top 3 bits of that byte denote the number of additional length bytes which encode the count. The count is stored in "little endian order" (least significant byte first) with the first 5 bits being encoded in the initial length byte.
Some examples:
The program supports a command line option for printing the output in hex for easier reading and debugging. I tried to use a
I've limited the max count to 61 bits to leave 3 bits for the length specifier and have a maximum of 8 bytes for the length output.
All types of feedback appreciated.
`#include
#include
#include
#include
#include
#include
#define NUM_LENGTH_BITS 3
static const uint64_t MAX_COUNT = (~0ULL) >> NUM_LENGTH_BITS;
static bool print_hex = false;
static int most_significant_bit(uint64_t value)
{
int i = 0;
while (value)
{
++i;
value >>= 1;
}
return i;
}
static int num_bytes_required(uint64_t value)
{
int msb = most_significant_bit(value);
int bytes = msb / 8;
int remainder = msb % 8;
if (remainder)
{
bytes += (remainder + NUM_LENGTH_BITS) > 8 ? 2 : 1;
}
return bytes;
}
void write_char(int c)
{
if (print_hex)
{
if (printf("0x%x ", c) = 0 && additional_length_bytes >= 5;
while (count)
{
write_char(count & 0xFF);
count >>= 8;
}
}
static void parse_args(int argc, char **argv)
{
if (argc == 2)
{
print_hex = (s
Each character is followed by a length byte. The top 3 bits of that byte denote the number of additional length bytes which encode the count. The count is stored in "little endian order" (least significant byte first) with the first 5 bits being encoded in the initial length byte.
Some examples:
Count == 1 yields in length byte 00000001
Count == 31 yields in length byte 00011111
Count == 32 yields in length bytes 00100000 followed by 0000001
Count == 33 yields in length bytes 00100001 followed by 0000001
The program supports a command line option for printing the output in hex for easier reading and debugging. I tried to use a
const int for the NUM_LENGTH_BITS as well but then the compiler claimed that the initializer element for MAX_COUNT isn't constant. Not sure if that can be worked around.I've limited the max count to 61 bits to leave 3 bits for the length specifier and have a maximum of 8 bytes for the length output.
All types of feedback appreciated.
`#include
#include
#include
#include
#include
#include
#define NUM_LENGTH_BITS 3
static const uint64_t MAX_COUNT = (~0ULL) >> NUM_LENGTH_BITS;
static bool print_hex = false;
static int most_significant_bit(uint64_t value)
{
int i = 0;
while (value)
{
++i;
value >>= 1;
}
return i;
}
static int num_bytes_required(uint64_t value)
{
int msb = most_significant_bit(value);
int bytes = msb / 8;
int remainder = msb % 8;
if (remainder)
{
bytes += (remainder + NUM_LENGTH_BITS) > 8 ? 2 : 1;
}
return bytes;
}
void write_char(int c)
{
if (print_hex)
{
if (printf("0x%x ", c) = 0 && additional_length_bytes >= 5;
while (count)
{
write_char(count & 0xFF);
count >>= 8;
}
}
static void parse_args(int argc, char **argv)
{
if (argc == 2)
{
print_hex = (s
Solution
A few minor notes (looks pretty good for the most part!):
-
Your
-
I'm not a fan of you using
-
Why do you do
-
Your
most_significant_bit() function loop should be condensed into a for loop. The main difference between the for's and the while's is a matter of pragmatics: we usually use for when there is a known number of iterations (which might not seem like the case here, but it's true), and use while constructs when the number of iterations in not known in advance.-
I'm not a fan of you using
-h as the command line argument equivalent to --hex. Whenever I think of -h, I think of it as the shorthand of --help. A better shorthand would be -x in my opinion.-
Why do you do
current_char_count += 1? Why not current_char_count++/++current_char_count (both do the same thing in the same amount of time in this case)?Context
StackExchange Code Review Q#115763, answer score: 5
Revisions (0)
No revisions yet.