Create 'worst case test' data for compression test
Problem
I am preparing some test data for a compression test. One of the test files is the 'worst case' file, which should make the compressor perform as badly as possible. Generating the file from random numbers is one idea, but such data still contains some patterns. When I compress such a file with 7zip, the output file is slightly smaller than the input file.
So I wrote a small piece of code to generate a file that does not contain any repeated two-byte pair. To make life difficult, I shuffle those byte pairs into a special order, hoping the compressor will have more difficulty finding any match, even with prediction.
It runs on Windows. It would be easy to change the filename to make it work on other systems, or even to take the filename from a command-line argument, but that's not the point.
I tried compressing it with 7zip; both formats (7z and zip) produced a file bigger than the original. I have some other compressors and will test them later.
Any suggestions and help are appreciated.
#include <fstream>
#include <iostream>

int main(int argc, char* argv[])
{
    const char* filename = "c:\\_Test\\hard.dat";
    std::ofstream ofs(filename, std::ios::binary | std::ios::out);
    if (!ofs)  // a failed open sets failbit, which bad() does not report
    {
        std::cerr << "fail to open file\n";
        return -1;
    }
    for (unsigned int i = 0; i < 0x10000; ++i)
    {
        unsigned int t = (i * 0xc369) & 0xFFFF;
        ofs.write((char*)&t, 2);
    }
    std::cout << "job done\n";
    return 0;
}
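The "no repeated two bytes" property can be checked directly. Because the multiplier 0xc369 is odd, multiplying by it is invertible modulo 0x10000, so each 16-bit value should appear exactly once among the aligned words. The sketch below counts distinct words for a given multiplier; the function names makeSequence and countDistinct are made up for this illustration and are not part of the original program.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Build the same 16-bit sequence as the generator loop:
// t = (i * multiplier) & 0xFFFF for i in [0, 0x10000).
std::vector<std::uint16_t> makeSequence(std::uint32_t multiplier)
{
    std::vector<std::uint16_t> words;
    words.reserve(0x10000);
    for (std::uint32_t i = 0; i < 0x10000; ++i)
    {
        words.push_back(static_cast<std::uint16_t>((i * multiplier) & 0xFFFF));
    }
    return words;
}

// Count how many distinct 16-bit values occur in the sequence.
std::size_t countDistinct(const std::vector<std::uint16_t>& words)
{
    std::vector<bool> seen(0x10000, false);
    std::size_t distinct = 0;
    for (std::uint16_t w : words)
    {
        if (!seen[w])
        {
            seen[w] = true;
            ++distinct;
        }
    }
    return distinct;
}
```

With 0xc369 the count comes out as 0x10000, while an even multiplier produces fewer distinct words. Note that this only covers aligned pairs: the 131072-byte file has 131071 two-byte windows but only 65536 possible values, so some pairs that straddle a word boundary must repeat.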
Solution
This is a cool idea! Here are some thoughts I had:
What are these magic numbers?
You have 4 magic numbers in your main loop: 0x10000, 0xc369, 0xFFFF, and 2. What do they mean? It looks like 0x10000 is the number of 2-byte words you're writing to the file. It's also the limit of a 16-bit number. It would be nice if there were a named constant for that. Perhaps something like:
const int kWordsPerFile = 0x10000;
or
const int kMax16BitValue = 0x10000;
Or something along those lines. (Technically it's 1 more than the max, so maybe something more descriptive.)
Honestly, I can live with 0xFFFF but it wouldn't hurt to give it a name like kLSWMask (where LSW is least significant word) or something similar.
You could get rid of the 2 by making t be a uint16_t and then using sizeof(t).
That leaves 0xc369. I have no idea what it is. I assume this is some sort of linear congruential pseudo-random number generator, but I'm not really well versed in such things. How is it derived? What significance does it have? Give it a name so it's understandable to someone else 6 months from now.
Comments
I'm a big fan of self-documenting code. However, in this case, it would be nice to have at least a sentence explaining what the loop does. Coming across this code in the code base I work on would leave me scratching my head. I'd have to go back through source control comments to see if there was any clue of what it was about. Maybe just a comment like:
// Generate some pseudo-random numbers where no 2 bytes are repeated
would really help a lot! And if you have a link or a short sentence to explain the algorithm, that would be nice, too.
Context
StackExchange Code Review Q#153258, answer score: 6