HiveBrain v1.2.0
Get Started
← Back to all entries
patternMajor

Is there a known maximum for how much a string of 0's and 1's can be compressed?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
canmaximummuchforknownhowandcompressedtherestring

Problem

A long time ago I read a newspaper article where a professor of some sort said that in the future we will be able to compress data to just two bits (or something like that).

This is of course not correct (and it could be that my memory of what he exactly stated is not correct). Understandably it would not be practical to compress any string of 0's and 1's to just two bits because (even if it was technically possible), too many different kind of strings would end up compressing to the same two bits (since we only have '01' and '10' to choose from).

Anyway, this got me thinking about the feasibility of compressing an arbitrary length string of 0's and 1's according to some scheme. For this kind of string, is there a known relationship between the string length (ratio between 0's and 1's probably does not matter) and maximum compression?

In other words, is there a way to determine what is the minimum (smallest possible) length that a string of 0's and 1's can be compressed to?

(Here I am interested in the mathematical maximum compression, not what is currently technically possible.)

Solution

Kolmogorov complexity is one approach for formalizing this mathematically. Unfortunately, computing the Kolmogorov complexity of a string is an uncomputable problem. See also: Approximating the Kolmogorov complexity.

It's possible to get better results if you analyze the source of the string rather than the string itself. In other words, often the source can be modelled as a probabilistic process, that randomly chooses a string somehow, according to some distribution. The entropy of that distribution then tells you the mathematically best possible compression (up to some small additive constant).

On the impossibility of perfect compression, you might also be interested in the following.

  • No compression algorithm can compress all input messages?



  • Compression functions are only practical because "The bit strings which occur in practice are far from random"?



  • Is there any theoretically proven optimal compression algorithm?

Context

StackExchange Computer Science Q#50052, answer score: 47

Revisions (0)

No revisions yet.