HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

What is the best way to compress a string with only numbers and commas in it?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
thewhatwithnumberswaycompressonlyandstringcommas

Problem

I have a string like this:

11,512,64,23,46,12,67,85,84


And I'm sure it's following these rules below:

  • Only , and 0123456789 in it.



  • No number will be bigger than 999, that means they are all 1,2 or 3 digits number.



Are there any simple algorithms that I can have to compress/decompress the data?

For instance, a very stupid way I can think is to convert each number into base-16 or even base base-36 numbers.

Solution

Treat this as a base-11 number (the comma is treated like the digit "10"). Convert the base-11 number to binary, and use the resulting binary string as the compressed version of your data.

This will be close to optimal. It falls short of optimal only in that it does not take into account the guarantee that every number in the list will have at most 3 digits -- but even taking that into account would not improve the compression ratio much.

The absolute best method will depend on the distribution of data that you will be compressing; but I expect this will be close to optimal for most real-world distributions.

Context

StackExchange Computer Science Q#100211, answer score: 3

Revisions (0)

No revisions yet.