patternMinor
Depth of a leaf in a prefix tree (Huffman Coding)
Viewed 0 times
depthhuffmanleafprefixcodingtree
Problem
So, regarding Huffman Coding we know that the optimal way of "storing" it is using a full binary tree (where all nodes have either 0 or 2 children). All leaves are the symbols $i$ of the alphabet we want to encode (all $i$ have a frequency of appearance$f_i$) and if we consider the freq. $f_i$ of each leaf (freq. of that character) then the freq. of a node is the sum of the frequencies of its children, therefore the root has $f_i = 100$ or $f_i = 1$.
Visual example (taken from Introduction to Algorithms) because I'm not really good at explaining this stuff:
Now I want to prove that the depth of a leaf (a character) in this prefix tree is $O(\log_{2}{(1/f_I))}$, where $f_i$ is the freq. of the character $i$.
I've tried by induction, where the base case would be for $\log_{2}{(1/1) = 0 }$ and indeed the depth of the root is 0. But I don't know how to proceed.
I've been thinking about how in a complete binary tree the depth of a leaf is $\log_{2}{n}$ with n being the number of nodes in that level (I'm not 100% sure about this) and how de depth of a character i in the prefix tree = length of the binary code (for example depth of d is 3 and its coding is 111) but I can't see how I could use those two things (if they are even useful).
What do you think about it? I don't know how to approach it :(
Btw thanks for your time and have a nice day!
Visual example (taken from Introduction to Algorithms) because I'm not really good at explaining this stuff:
Now I want to prove that the depth of a leaf (a character) in this prefix tree is $O(\log_{2}{(1/f_I))}$, where $f_i$ is the freq. of the character $i$.
I've tried by induction, where the base case would be for $\log_{2}{(1/1) = 0 }$ and indeed the depth of the root is 0. But I don't know how to proceed.
I've been thinking about how in a complete binary tree the depth of a leaf is $\log_{2}{n}$ with n being the number of nodes in that level (I'm not 100% sure about this) and how de depth of a character i in the prefix tree = length of the binary code (for example depth of d is 3 and its coding is 111) but I can't see how I could use those two things (if they are even useful).
What do you think about it? I don't know how to approach it :(
Btw thanks for your time and have a nice day!
Solution
Consider a symbol $\sigma$ whose frequency is $f$. During the course of Huffman's algorithm, $\sigma$ is matched with other nodes, and the depth of $\sigma$ is the number of times this happens.
At the first time that $\sigma$ is matched, it is either the symbol with smallest frequency or with second smallest frequency. In the former case, it is merged into a new symbol whose frequency is at least $2f$. In the latter case, after $\sigma$ is merged with the symbol of smallest frequency, all symbols have frequency at least $f$, and so the next time that it is merged, the merged symbol will have frequency at least $2f$.
Summarizing, it takes at most two mergings to double the frequency of a symbol. If its original frequency is $f$, then this shows that the depth of $f$ is at most $2\lceil \log_2(1/f) \rceil = O(\log(1/f))$.
At the first time that $\sigma$ is matched, it is either the symbol with smallest frequency or with second smallest frequency. In the former case, it is merged into a new symbol whose frequency is at least $2f$. In the latter case, after $\sigma$ is merged with the symbol of smallest frequency, all symbols have frequency at least $f$, and so the next time that it is merged, the merged symbol will have frequency at least $2f$.
Summarizing, it takes at most two mergings to double the frequency of a symbol. If its original frequency is $f$, then this shows that the depth of $f$ is at most $2\lceil \log_2(1/f) \rceil = O(\log(1/f))$.
Context
StackExchange Computer Science Q#82475, answer score: 3
Revisions (0)
No revisions yet.