Maximum size of Huffman codes for an alphabet containing 256 letters
Problem
I made a Huffman algorithm for data compression: it takes the content of a file as an input and outputs a compressed string. More important, it generates optimal prefix codes (made of 0 and 1) for every possible character, that is to say for every number between 0 and 255.
A code can trivially never be longer than 256 bits, and it is always possible to encode every character with a fixed-length code of 8 bits. However, the algorithm should never actually generate a code of length 256, because such codes would not be optimal: intuitively, it gives short codes to the most likely characters and codes of around 8 bits to the others.
My question, then, is: how can we find an upper bound, better than 256, on the maximum length a code can have? I don't see how to approach the problem.
PS: the upper bound I'm looking for must not depend on the probability distribution; it has to hold for every possible distribution. As I said, I doubt there is a distribution for which some code has length 256, so a better upper bound should exist.
Edit: I think I found the solution. For an alphabet of size $n$, the longest possible code has length $n-1$, so here the answer is 255; my intuition was wrong. You just need to arrange the weights so that each one is at least twice the previous one (and hence at least the sum of all the previous ones), which forces the Huffman tree into a chain and yields a code of maximal length.
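This construction is easy to check empirically. Below is a minimal sketch (not the asker's implementation) that runs the Huffman merging loop with `heapq` and tracks only the depth of the deepest leaf. With weights $1, 2, 4, \dots, 2^{n-1}$, each weight is at least the sum of all smaller ones, so every merge chains onto the previous subtree and the deepest leaf ends up at depth $n-1$:

```python
# Sketch: build a Huffman tree with heapq, tracking only the depth of
# the deepest leaf. Doubling weights force a degenerate chain of depth n-1.
import heapq

def max_huffman_code_length(weights):
    """Length of the longest code Huffman coding assigns to `weights`."""
    # Heap entries: (total weight, tiebreaker, depth of deepest leaf).
    heap = [(w, i, 0) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    tie = len(weights)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tie, max(d1, d2) + 1))
        tie += 1
    return heap[0][2]

print(max_huffman_code_length([2**i for i in range(256)]))  # 255
print(max_huffman_code_length([1] * 256))                   # 8 (uniform case)
```

The uniform case confirms the intuition from the question: when all 256 characters are equally likely, every code has exactly 8 bits.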
Solution
I don't see how to approach the problem.
OK, here is how you can approach the problem. Can you solve this problem if you replace the number 256 with the number 3? How about 4? 5? Try solving those special cases, then see if you spot a pattern...
This is a useful, general pattern. When a problem is too hard, try to find a simpler version of it and think about that. When you see a constant in the problem, try solving simpler versions where the constant is smaller (so the problem becomes easier to think about), then look for a pattern, try to generalize, form a hypothesis, and see if you can prove your hypothesis.
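Following that advice, the small cases can be probed mechanically. The sketch below searches small integer weight multisets for alphabet sizes 3, 4, and 5 and reports the longest Huffman code length found; the search bound $2^{n-1}$ on the weights is an assumption chosen so that the doubling sequence 1, 2, 4, ... is included. This probes the pattern, it does not prove it:

```python
# Exploratory sketch: for small alphabet sizes n, search small integer
# weight multisets for the longest Huffman code length that can occur.
import heapq
from itertools import combinations_with_replacement

def max_code_length(weights):
    # Heap entries: (total weight, tiebreaker, depth of deepest leaf).
    heap = [(w, i, 0) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    tie = len(weights)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tie, max(d1, d2) + 1))
        tie += 1
    return heap[0][2]

for n in (3, 4, 5):
    longest = max(max_code_length(ws)
                  for ws in combinations_with_replacement(range(1, 2**(n - 1) + 1), n))
    print(n, longest)  # prints 3 2, 4 3, 5 4 -- i.e. n - 1 each time
```

The pattern $n-1$ shows up immediately, which is the hypothesis the asker then confirmed with the doubling-weights construction.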
Context
StackExchange Computer Science Q#75542, answer score: 8