patternModerate
No compression algorithm can compress all input messages?
Viewed 0 times
canallcompressioncompressinputalgorithmmessages
Problem
I just started reading a book called Introduction to Data Compression, by Guy E. Blelloch. On page one, he states:
The truth is that if any one message is shortened by an algorithm, then some other message needs to be lengthened. You can verify this in practice by running GZIP on a GIF file. It is, in fact, possible to go further and show that for a set of input messages of fixed length, if one message is compressed, then the average length of the compressed messages over all possible inputs is always going to be longer than the original input messages.
Consider, for example, the 8 possible 3 bit messages. If one is compressed to two bits, it is not hard to convince yourself that two messages will have to expand to 4 bits, giving an average of 3 1/8 bits.
Really? I find it very hard to convince myself of that. In fact, here's a counter example. Consider the algorithm which accepts as input any 3-bit string, and maps to the following outputs:
So there you are - no input is mapped to a longer output. There are certainly no "two messages" that have expanded to 4 bits.
So what exactly is the author talking about? I suspect either there's some implicit caveat that just isn't obvious to me, or he's using language that's far too sweeping.
Disclaimer: I realize that if my algorithm is applied iteratively, you do indeed lose data. Try applying it twice to the input 110: 110 -> 000 -> 0, and now you don't know which of 110 and 000 was the original input. However, if you apply it only once, it seems lossless to me. Is that related to what the author's talking about?
The truth is that if any one message is shortened by an algorithm, then some other message needs to be lengthened. You can verify this in practice by running GZIP on a GIF file. It is, in fact, possible to go further and show that for a set of input messages of fixed length, if one message is compressed, then the average length of the compressed messages over all possible inputs is always going to be longer than the original input messages.
Consider, for example, the 8 possible 3 bit messages. If one is compressed to two bits, it is not hard to convince yourself that two messages will have to expand to 4 bits, giving an average of 3 1/8 bits.
Really? I find it very hard to convince myself of that. In fact, here's a counter example. Consider the algorithm which accepts as input any 3-bit string, and maps to the following outputs:
000 -> 0
001 -> 001
010 -> 010
011 -> 011
100 -> 100
101 -> 101
110 -> 110
111 -> 111So there you are - no input is mapped to a longer output. There are certainly no "two messages" that have expanded to 4 bits.
So what exactly is the author talking about? I suspect either there's some implicit caveat that just isn't obvious to me, or he's using language that's far too sweeping.
Disclaimer: I realize that if my algorithm is applied iteratively, you do indeed lose data. Try applying it twice to the input 110: 110 -> 000 -> 0, and now you don't know which of 110 and 000 was the original input. However, if you apply it only once, it seems lossless to me. Is that related to what the author's talking about?
Solution
What you are missing is that you need to consider all bits of size 3 or less. That is: if in a compression scheme for bits of size 3 or less we compress one of the 3-bit strings to a 2-bit string, then some string of size 3 or less will have to expand to 3 bits or more.
A losless compression scheme is a function $C$ from finite bit strings to finite bit strings which is injective, i.e., if $C(x) = C(y)$ then $x = y$, i.e., $C(x)$ uniquely determines $x$.
Consider an arbitrary compression scheme $C$ and let $S$ be a set of binary strings. We can express how well $C$ works on $S$ by computing the ratio
$$\text{CompressionRatio}(C,S) = \frac{\sum_{x \in S} \mathrm{length}(C(x))}{\sum_{x \in S} \mathrm{length}(x)}.$$
A small compression ratio is good. For example, if it is $1/2$ that means we can on average compress strings in $S$ by 50% using $C$.
If we try to compress all strings of length at most $n$ then we are in trouble:
Theorem: Let $S$ be the set of all strings of length at most $n$ and $C$ any compression scheme. Then $\text{CompressionRatio}(C,S) \geq 1$.
So, the best compression scheme in the world is the identity function! Well, only if we want to compress random strings of bits. The bit strings which occur in practice are far from random and exhibit a lot of regularity. This is why it makes sense to compress data despite the above theorem.
A losless compression scheme is a function $C$ from finite bit strings to finite bit strings which is injective, i.e., if $C(x) = C(y)$ then $x = y$, i.e., $C(x)$ uniquely determines $x$.
Consider an arbitrary compression scheme $C$ and let $S$ be a set of binary strings. We can express how well $C$ works on $S$ by computing the ratio
$$\text{CompressionRatio}(C,S) = \frac{\sum_{x \in S} \mathrm{length}(C(x))}{\sum_{x \in S} \mathrm{length}(x)}.$$
A small compression ratio is good. For example, if it is $1/2$ that means we can on average compress strings in $S$ by 50% using $C$.
If we try to compress all strings of length at most $n$ then we are in trouble:
Theorem: Let $S$ be the set of all strings of length at most $n$ and $C$ any compression scheme. Then $\text{CompressionRatio}(C,S) \geq 1$.
So, the best compression scheme in the world is the identity function! Well, only if we want to compress random strings of bits. The bit strings which occur in practice are far from random and exhibit a lot of regularity. This is why it makes sense to compress data despite the above theorem.
Context
StackExchange Computer Science Q#7531, answer score: 17
Revisions (0)
No revisions yet.