HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

What is the expected number of nodes at depth d of a tree after i random insertions

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
depthafternodesnumbertherandomwhatexpectedinsertionstree

Problem

Suppose one wanted to build a tree at random. Let the first insertion at step $i = 1$ be the root node. From here, nodes are inserted into the tree at random one at a time. How would one go about calculating the expected number of nodes $E(d)$ at depth $d$ after $i$ insertions?

For example, At $i = 1$, it's just the root node, so $E(0) = 1$ and all other depths will have zero nodes. At $i = 2$, $E(0) = 1$ and $E(1) = 1$ as the inserted node has to be at depth 1 from the root. At $i = 3$, the next node may either be attached to the root node or the existing node at depth one, so the tree may either look like:


*
/ \



Or:


*
/
*
/
*


Depending on what happend at $i = 3$, at $i = 4$ there's either a $2/3$ chance of the new node attaching at depth 2 and $1/3$ attaching at depth 1, or an even $1/3$ probability of the new node attaching at the root, the node at depth 1 or the node at depth 2. Keeping random insertion in mind (not a binary or balanced tree in any way), how would one go about calculating the expected number of nodes at depth $d$ after $i$ insertions?

Solution

Let $X_{d,t}$ denote the expected number of nodes at level $d$ and time $t$, and let $p_{d,t}$ be the probability that at time $t$, a node is added at level $d$. Our indexing is such that $p_{0,0} = X_{0,t} = 1$. Clearly
$$ X_{d,t} = \sum_{s=0}^t p_{d,t}. $$
Suppose that at time $t-1$, the number of leaves at depth $d-1$ is $X$. Then the probability that at time $t$ a leaf is added at depth $d$ is $X/t$. The expectation of this across the first $t-1$ steps is exactly $X_{d-1,t-1}$, and so
$$ p_{d,t} = \frac{X_{d-1,t-1}}{t}. $$
Therefore
$$ X_{d,t} = \sum_{s=1}^t \frac{X_{d-1,s-1}}{s}. $$
What does this look like? Let's use generating functions:
$$ P_d(z) = \sum_{t=0}^\infty X_{d,t} z^t = \sum_{t=0}^\infty \sum_{s=1}^t \frac{X_{d-1,s-1}}{s} z^t = \sum_{s=1}^\infty \frac{X_{d-1,s-1}}{s} \sum_{t=s}^\infty z^t = \sum_{s=1}^\infty \frac{X_{d-1,s-1}}{s} \frac{z^s}{1-z}. $$
Notice that
$$
\int_0^z P_{d-1}(w) dw = \sum_{s=1}^\infty X_{d-1,s-1} \int_0^z w^{s-1} dw = \sum_{s=1}^\infty X_{d-1,s-1} \frac{w^s}{s}.
$$
Therefore
$$
P_d(z) = \frac{1}{1-z} \int_0^z P_{d-1}(w) dw, \quad P_0(z) = \frac{1}{1-z}.
$$
We can calculate
$$
\begin{align*}
P_0(z) &= \frac{1}{1-z} \\
P_1(z) &= \frac{-\log (1-z)}{1-z} \\
P_2(z) &= \frac{1}{2} \frac{\log^2(1-z)}{1-z} \\
P_3(z) &= \frac{1}{6} \frac{-\log^3(1-z)}{1-z}
\end{align*}
$$
A pattern becomes apparent, and is easy to prove by induction:
$$ P_d(z) = \frac{1}{d!} \frac{\log^d (1-z)^{-1}}{1-z}. $$
We can pack all this data into a bivariate generating function:
$$ P(z,w) = \sum_{d \geq 0} X_{d,t} z^t w^d = \frac{\exp (w\log (1-z)^{-1})}{1-z} = (1-z)^{-w-1}.$$
In principle, we can compute $X_{d,t}$ asymptotically from the generating functions; but it is easier to do this directly:
$$
\begin{align*}
X_{0,t} &= 1 \\
X_{1,t} &= \sum_{s=1}^t \frac{1}{s} = \log t + O(1) \\
X_{2,t} &= \sum_{s=1}^t \frac{\log s + O(1)}{s} = \int_1^t \frac{\log s + O(1)}{s} ds + O\left(\frac{\log t}{t}\right) = \frac{\log^2 t}{2} + O(\log t) \\
X_{3,t} &= \sum_{s=1}^t \frac{\log^2 s + O(\log s)}{2s} = \int_1^t \frac{\log^2 s + O(\log s)}{2s} ds + O\left(\frac{\log^2 t}{t}\right) = \frac{\log^3 t}{6} + O(\log^2 t)
\end{align*}
$$
Generalizing, we get
$$X_{d,t} = \frac{1}{d!} \log^d t + O(\log^{d-1} t).$$
With more work, it is possible to obtain lower order terms as well, indeed a polynomial in $\log t$ followed by a Laurent series.

Context

StackExchange Computer Science Q#22580, answer score: 9

Revisions (0)

No revisions yet.