patternMinor
Randomized Meldable Heap - Expected Height
Viewed 0 times
heightrandomizedexpectedheapmeldable
Problem
Randomized Meldable Heaps have an operation "meld", which we then use to define all other operations, including insert.
The question is, what is an expected height of that tree with $n$ nodes?
Theorem 1 of Gambin and Malinkowski, Randomized Meldable Priority Queues (Proceedings of SOFSEM 1998, Lecture Notes in Computer Science vol. 1521, pp. 344–349, 1998; PDF) gives the answer to this question with proof. However, I do not understand why we can write:
$$\mathbb{E} [ h_Q] = \frac{1}{2} ((1 + \mathbb{E}[h_{Q_L}]) + (1 + \mathbb{E}[h_{Q_R}]))\,.$$
For me the height of the tree is
$$h_Q = 1 + \max\, \{ h_{Q_L}, h_{Q_R}\}\,,$$
which I can expand to:
$$\mathbb{E} [ h_Q] = 1 + \mathbb{E}[\max \,\{ h_{Q_L}, h_{Q_R}\}] = 1 + \sum k \mathbb{P}[\max\, \{ h_{Q_L}, h_{Q_R}\} = k]\,.$$
The probability that the maximum of a height of two subtrees is equal to $k$ can be rewritten using the law of total probability:
\begin{align*}
\hspace{2em}&\hspace{-2em}
\mathbb{P}[\max\, \{ h_{Q_L}, h_{Q_R}\} = k] \\
&= \mathbb{P}[\max\, \{ h_{Q_L}, h_{Q_R}\} = k \mid h_{Q_L}\leq h_{Q_R}]\,\mathbb{P}[h_{Q_L} \leq h_{Q_R}]\\
&\hspace{2em} + \mathbb{P}[\max\, \{ h_{Q_L}, h_{Q_R}\} = k \mid h_{Q_L} > h_{Q_R}]\,\mathbb{P}[h_{Q_L} > h_{Q_R}] \\
&= \mathbb{P}[h_{Q_R} = k \mid h_{Q_L} \leq h_{Q_R}]\,\mathbb{P}[h_{Q_L} \leq h_{Q_R}]\\
&\hspace{2em}+ \mathbb{P}[h_{Q_L} = k \mid h_{Q_L} > h_{Q_R}]\,\mathbb{P}[h_{Q_L} > h_{Q_R}]\,.
\end{align*}
So at the end I get:
\begin{align*}\mathbb{E} [ h_Q] &= 1 + \sum k \{ \mathbb{P}[h_{Q_R} = k \mid h_{Q_L} \leq h_{Q_R}]\,\mathbb{P}[h_{Q_L} \leq h_{Q_R}]\\
&\hspace{2em}+ \mathbb{P}[h_{Q_L} = k \mid h_{Q_L} > h_{Q_R}]\,\mathbb{P}[h_{Q_L} > h_{Q_R}] \}\,.
\end{align*}
This is where I am stuck. I can see that $\mathbb{P}[h_{Q_L} > h_{Q_R}]$ is more or less equal $\frac{1}{2}$ (However we need at most $\leq \frac{1}{2}$). But except that nothing leading to the formula from the beginning.
The heights of the subtrees do not seem to be independent to me.
The question is, what is an expected height of that tree with $n$ nodes?
Theorem 1 of Gambin and Malinkowski, Randomized Meldable Priority Queues (Proceedings of SOFSEM 1998, Lecture Notes in Computer Science vol. 1521, pp. 344–349, 1998; PDF) gives the answer to this question with proof. However, I do not understand why we can write:
$$\mathbb{E} [ h_Q] = \frac{1}{2} ((1 + \mathbb{E}[h_{Q_L}]) + (1 + \mathbb{E}[h_{Q_R}]))\,.$$
For me the height of the tree is
$$h_Q = 1 + \max\, \{ h_{Q_L}, h_{Q_R}\}\,,$$
which I can expand to:
$$\mathbb{E} [ h_Q] = 1 + \mathbb{E}[\max \,\{ h_{Q_L}, h_{Q_R}\}] = 1 + \sum k \mathbb{P}[\max\, \{ h_{Q_L}, h_{Q_R}\} = k]\,.$$
The probability that the maximum of a height of two subtrees is equal to $k$ can be rewritten using the law of total probability:
\begin{align*}
\hspace{2em}&\hspace{-2em}
\mathbb{P}[\max\, \{ h_{Q_L}, h_{Q_R}\} = k] \\
&= \mathbb{P}[\max\, \{ h_{Q_L}, h_{Q_R}\} = k \mid h_{Q_L}\leq h_{Q_R}]\,\mathbb{P}[h_{Q_L} \leq h_{Q_R}]\\
&\hspace{2em} + \mathbb{P}[\max\, \{ h_{Q_L}, h_{Q_R}\} = k \mid h_{Q_L} > h_{Q_R}]\,\mathbb{P}[h_{Q_L} > h_{Q_R}] \\
&= \mathbb{P}[h_{Q_R} = k \mid h_{Q_L} \leq h_{Q_R}]\,\mathbb{P}[h_{Q_L} \leq h_{Q_R}]\\
&\hspace{2em}+ \mathbb{P}[h_{Q_L} = k \mid h_{Q_L} > h_{Q_R}]\,\mathbb{P}[h_{Q_L} > h_{Q_R}]\,.
\end{align*}
So at the end I get:
\begin{align*}\mathbb{E} [ h_Q] &= 1 + \sum k \{ \mathbb{P}[h_{Q_R} = k \mid h_{Q_L} \leq h_{Q_R}]\,\mathbb{P}[h_{Q_L} \leq h_{Q_R}]\\
&\hspace{2em}+ \mathbb{P}[h_{Q_L} = k \mid h_{Q_L} > h_{Q_R}]\,\mathbb{P}[h_{Q_L} > h_{Q_R}] \}\,.
\end{align*}
This is where I am stuck. I can see that $\mathbb{P}[h_{Q_L} > h_{Q_R}]$ is more or less equal $\frac{1}{2}$ (However we need at most $\leq \frac{1}{2}$). But except that nothing leading to the formula from the beginning.
The heights of the subtrees do not seem to be independent to me.
Solution
In the paper, $h_Q$ isn't the height. It's the length of a random walk away from the root in a full binary tree (they insist every leaf is "nil"), so the expression they have is the right thing.
Also, you can avoid induction. The probability of ending at a specific leaf of depth $d$ is just $2^{-d}$. So the expected length of the walk is
$$
\sum_{\ell\in \text{leaves}(Q)} \text{depth}(\ell)2^{-\text{depth}(\ell)}
$$
which the entropy of a distribution a set of size $|\text{leaves}(Q)|$.
Also, you can avoid induction. The probability of ending at a specific leaf of depth $d$ is just $2^{-d}$. So the expected length of the walk is
$$
\sum_{\ell\in \text{leaves}(Q)} \text{depth}(\ell)2^{-\text{depth}(\ell)}
$$
which the entropy of a distribution a set of size $|\text{leaves}(Q)|$.
Context
StackExchange Computer Science Q#54443, answer score: 4
Revisions (0)
No revisions yet.