patternMinor
Counting nodes in a trie
Viewed 0 times
nodescountingtrie
Problem
I'm studying random tries in one of my classes, and was wondering if anyone could offer any guidance regarding a problem.
Question:
Given a random $m$-ary trie with $n$ total leaves, letting $I$ be the number of internal nodes in the tree (that is, nodes which aren't a leaf), show that $E(I)$ is $O(n)$.
About the distribution:
We have an alphabet of $m$ letters (hence an $m$-ary trie), each with a corresponding probability $p_i$. We randomly create $n$ words of infinite length by drawing the first letter from our distribution, the second letter, etc. Finally, we insert these $n$ infinite words into a $k$-ary trie. This trie will be infinite in length, but we prune it by removing any paths which are traversed by only one of our $n$ words, to eliminate 'redundancy'.
Attempt: I tried relating things back to the size of the random trie, which I call $S$. Every non-root node in the trie is the child of some internal node, so $mI = S-1$
That is $E(I) = E\left(\frac S m\right) - \frac 1 m$
So if I can show that the expected size of a random trie is O(n) I'm golden. But unfortunately everything we've covered in class relates to expected depths and heights of a given trie, and I don't have any ideas moving forward. Could anyone offer advice on what might be useful here?
Question:
Given a random $m$-ary trie with $n$ total leaves, letting $I$ be the number of internal nodes in the tree (that is, nodes which aren't a leaf), show that $E(I)$ is $O(n)$.
About the distribution:
We have an alphabet of $m$ letters (hence an $m$-ary trie), each with a corresponding probability $p_i$. We randomly create $n$ words of infinite length by drawing the first letter from our distribution, the second letter, etc. Finally, we insert these $n$ infinite words into a $k$-ary trie. This trie will be infinite in length, but we prune it by removing any paths which are traversed by only one of our $n$ words, to eliminate 'redundancy'.
Attempt: I tried relating things back to the size of the random trie, which I call $S$. Every non-root node in the trie is the child of some internal node, so $mI = S-1$
That is $E(I) = E\left(\frac S m\right) - \frac 1 m$
So if I can show that the expected size of a random trie is O(n) I'm golden. But unfortunately everything we've covered in class relates to expected depths and heights of a given trie, and I don't have any ideas moving forward. Could anyone offer advice on what might be useful here?
Solution
If you don't have any degree-1 nodes in your trie (which is a tree) than you have more leaves than interior nodes. So in this case you have $I\le n $.
It depends a bit how you define the trie whether you can have many interior degree-1 nodes. If you study a compressed trie the all the path of degree-1 nodes are merges to an edge, so you are done. For an uncompressed trie, I am afraid, you can have many degree-1 nodes. Say you have one letter $a_i$ that is very common and has a high probability of $1-\varepsilon$ and $\varepsilon\in \Omega(1/n^2)$. Then your trie contains many long degree-1 paths with high probability. In this case the you can have more than $O(n)$ interior nodes. Please do the computations by yourself (you might choose a smaller $\varepsilon$ if you like).
It depends a bit how you define the trie whether you can have many interior degree-1 nodes. If you study a compressed trie the all the path of degree-1 nodes are merges to an edge, so you are done. For an uncompressed trie, I am afraid, you can have many degree-1 nodes. Say you have one letter $a_i$ that is very common and has a high probability of $1-\varepsilon$ and $\varepsilon\in \Omega(1/n^2)$. Then your trie contains many long degree-1 paths with high probability. In this case the you can have more than $O(n)$ interior nodes. Please do the computations by yourself (you might choose a smaller $\varepsilon$ if you like).
Context
StackExchange Computer Science Q#33852, answer score: 3
Revisions (0)
No revisions yet.