HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Huffman Coding and Depth Calculation?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
depthcalculationhuffmanandcoding

Problem

I'd like to as a variation on this question regarding Huffman tree building. Is there any theory or rule of thumb to calculate the depth of a Huffman tree from the input (or frequency), without drawing tree.

Specific Example is : 10-Input Symbol with Frequency 1 to 10 is 5. the above question mentioned depth is 5.

I read about max depth of Huffman and Entropy, but not well description for me.

Edit:

We know in my specific example the frequency is: $1$ to $10$ which means probability is: $(1/55)+(2/55)+(3/55)+(4/55)+(5/55)+(6/55)+(7/55)+(8/55)+(9/55)+(10/55) = 1 $

Solution

In each step of the Huffman coding algorithm the list of probabilities is being sorted and the 2 lowest of them are merged into a new node of the tree, resulting into a new probability. As for your example probabilities the below illustration shows:

step 1:  [1/55,2/55,3/55,4/55,5/55,6/55,7/55,8/55,9/55,10/55]
step 2:  [3/55,3/55,4/55,5/55,6/55,7/55,8/55,9/55,10/55]
step 3:  [4/55,5/55,6/55,6/55,,7/55,8/55,9/55,10/55]
step 4:  [6/55,6/55,7/55,8/55,9/55,9/55,10/55]
step 5:  [7/55,8/55,9/55,9/55,10/55,12/55]
step 6:  [9/55,9/55,10/55,12/55,15/55]
step 7:  [10/55,12/55,15/55,18/55]
step 8:  [15/55,18/55,22/55]
step 9:  [22/55,33/55]
step 10: [1]


So in each step the set of the corresponding probabilities is reduced by 1. This corresponds to the size of the tree being increased by one.
Without drawing the tree we can quickly deduce that the length (or height depending on how you visualize it) will be 10.This is because the algorithm terminates when we reach the root of the tree that corresponds to probability of 1. As you can see there is absolutely no difference with the case of probabilities being all equal or any other specific distribution of probabilities among the symbols.

Note that there is not direct link to the average length of the resulting code and the length/height of the Huffman tree. The former depends very much on the probability distribution of the alphabet symbols. The best metric that we can use to address this issue is the entropy of the source that uses the alphabet under consideration. Entropy is a measure of the average uncertainty that we have about which symbol will appear next (at source's output) and is defined as this
$$H(\Phi) =\sum_{i=1}^{N} p(s_i)\cdot log_2(1/p(s_i))$$
Entropy indicates the minimum average codelength that we can achieve in coding the source $\Phi$. In general, when the symbols have all equal probabilities of appearance the entropy is maximized and $H(\Phi)=log_2N$. The intuition behind this is that when all symbols are equiprobable then our uncertainty about the source's output is maximized.

Code Snippets

step 1:  [1/55,2/55,3/55,4/55,5/55,6/55,7/55,8/55,9/55,10/55]
step 2:  [3/55,3/55,4/55,5/55,6/55,7/55,8/55,9/55,10/55]
step 3:  [4/55,5/55,6/55,6/55,,7/55,8/55,9/55,10/55]
step 4:  [6/55,6/55,7/55,8/55,9/55,9/55,10/55]
step 5:  [7/55,8/55,9/55,9/55,10/55,12/55]
step 6:  [9/55,9/55,10/55,12/55,15/55]
step 7:  [10/55,12/55,15/55,18/55]
step 8:  [15/55,18/55,22/55]
step 9:  [22/55,33/55]
step 10: [1]

Context

StackExchange Computer Science Q#53212, answer score: 3

Revisions (0)

No revisions yet.