Simple string compression reloaded
Problem
Inspired by this question, I thought I'd provide my own implementation. I tried to follow the spirit of the *nix tool chain: read from stdin and write to stdout. This has the added benefit of making buffering very easy, since the only state to keep is the current character, the previous character, and the run count.
All kinds of reviews welcome (best practices, error handling, weird edge cases, potential bugs or other pitfalls).
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

void write_char(int c)
{
    if (EOF == putchar(c))
    {
        if (ferror(stdout))
        {
            perror("error writing char to stdout");
            exit(EXIT_FAILURE);
        }
    }
}

void write_count(uint64_t count)
{
    if (printf("%ull", count) < 0)
    /* ... */
            ... 0)
            {
                write_count(current_char_count);
            }
            write_char(current_char);
            current_char_count = 1;
            previous_char = current_char;
        }
        else
        {
            current_char_count += 1;
        }
    }
}

Solution
Compressor number or real
When you are write_counting, you are writing the ASCII digit characters to the output. However, when you go to decompress this file, how are you going to differentiate between the actual content of the file and the numbers that mark the occurrences of a character?

A possible solution might be to write the number itself to the file as a raw byte value, not as ASCII digits. That way, when you encounter an ASCII digit, you can be almost sure that it is part of the content (unless a character occurred so many times in a row that the counter rose into the '0'-'9' range).

Two ones or twelve?
This is kind of a continuation of the previous point. Let's say your compressor went to compress this file:
12
Now I am ready to decompress it. Since your compressor writes a number after each character to show how many times it occurred, the output would be this:
1121
How do I know which of those digits are content and which are counts?
The only fix I can think of, unfortunately, would be to follow the above tip and write the byte 0x01 instead of the ASCII digit '1'.

Misc
You are missing a closing parenthesis here:
while (EOF != (current_char = getchar())
When compiling your code, I get a warning on this line:
if (printf("%ull", count) < 0)
warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 2 has type ‘uint64_t’ [-Wformat=]
This also revealed another problem: "%ull" is parsed as the conversion %u followed by the literal characters ll, so two ls are written after every number that shows how many times a character occurred.
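For the "%ull" warning, here is a minimal sketch of a corrected write_count, assuming it should report errors the same way write_char does (the original body of write_count was not preserved in full, so that part is an assumption). PRIu64 from <inttypes.h> expands to the correct conversion for uint64_t on each platform; casting to unsigned long long and using %llu would work as well. format_count is a hypothetical helper added only so the formatting can be checked in isolation.

```c
#include <inttypes.h> /* PRIu64 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format a run length into buf using the portable
   conversion for uint64_t. "%ull" would instead be read as "%u" followed
   by the literal text "ll". Returns what snprintf returns. */
int format_count(char *buf, size_t size, uint64_t count)
{
    return snprintf(buf, size, "%" PRIu64, count);
}

/* write_count() with the format string fixed; the error handling is an
   assumption modelled on the question's write_char(). */
void write_count(uint64_t count)
{
    if (printf("%" PRIu64, count) < 0 && ferror(stdout))
    {
        perror("error writing count to stdout");
        exit(EXIT_FAILURE);
    }
}
```

With this change a count of 12 is written as exactly the two characters 12, with no trailing ll.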
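On the first two points (telling counts apart from content): a stricter variant of the raw-byte idea is to make every output record exactly two bytes, the content byte followed by its count stored as a raw byte. The decoder then never has to guess, because a byte's position alone says whether it is content or a count, so even a count whose value happens to equal '1' (0x31) is safe. This is a sketch on in-memory buffers, assuming runs are capped at 255 and longer runs are split across several pairs; rle_encode and rle_decode are hypothetical names, and the same loop carries over to the question's getchar()/putchar() style.

```c
#include <stddef.h>

/* Hypothetical encoder: write (byte, count) pairs for the n bytes of in.
   out needs room for 2 * n bytes (worst case: no byte is repeated).
   Returns the number of bytes written to out. */
size_t rle_encode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t o = 0;
    size_t i = 0;
    while (i < n)
    {
        unsigned char c = in[i];
        size_t run = 1;
        /* Cap each run at 255 so the count fits in one byte; a longer
           run simply continues in the next pair. */
        while (i + run < n && in[i + run] == c && run < 255)
            run++;
        out[o++] = c;
        out[o++] = (unsigned char)run;
        i += run;
    }
    return o;
}

/* Hypothetical decoder: expand (byte, count) pairs back into out, which
   must hold the sum of all counts. Returns the number of bytes written,
   or (size_t)-1 if the input ends in the middle of a pair. */
size_t rle_decode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t o = 0;
    if (n % 2 != 0)
        return (size_t)-1;
    for (size_t i = 0; i < n; i += 2)
    {
        size_t count = in[i + 1];
        for (size_t k = 0; k < count; k++)
            out[o++] = in[i];
    }
    return o;
}
```

With this format the file 12 compresses to the four bytes '1' 0x01 '2' 0x01, which decode back to 12 with no ambiguity. The price is that input without repeated bytes doubles in size.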
Context
StackExchange Code Review Q#115680, answer score: 2