patternMinor

Can expected "depth" of an element and expected "height" differ significantly?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

algorithm-analysis average-case cs data-structures algorithms stackoverflow probability-theory

depthcanheightexpectedelementdiffersignificantlyand

Problem

When analysing treaps (or, equivalently, BSTs or Quicksort), it is not too hard to show that

$\qquad\displaystyle \mathbb{E}[d(k)] \in O(\log n)$

where $d(k)$ is the depth of the element with rank $k$ in the set of $n$ keys.
Intuitively, this seems to imply that also

$\qquad\displaystyle \mathbb{E}[h(T)] \in O(\log n)$

where $h(T)$ is the height of treap $T$, since

$\qquad\displaystyle h(T) = \max_{k \in [1..n]} d(k)$.

Formally, however, there does not seem to be an (immediate) relationship. We even have

$\qquad\displaystyle \mathbb{E}[h(T)] \geq \max_{k \in [1..n]} \mathbb{E}[d(k)]$

by Jensen's inequality. Now, one can show expected logarithmic height via tail bounds, using more insight into the distribution of $d(k)$.

It is easy to construct examples of distributions that skrew with above intuition, namely extremely asymmetric, heavy-tailed distributions. The question is, can/do such occur in the analysis of algorithms and data structures?

Are there example for data structures $D$ (or algorithms) for which

$\qquad\displaystyle \mathbb{E}[h(D)] \in \omega(\max_{e \in D} \mathbb{E}[d(e)])$?

Nota bene:

-
Of course, we have to interpret "depth" and "height" liberally if we consider structures that are not trees. Based on the posts Wandering Logic links to, "Expected average search time" (for $1/n \cdot \sum_{e \in D} \mathbb{E}[d(e)]$) and "expected maximum search time" (for $\mathbb{E}[h(D)]$) seem to be used.

-
A related question on math.SE has yielded an interesting answer that may allow deriving useful bounds on $\mathbb{E}[h(D)]$ given suitable bounds on $\mathbb{E}[d(e)]$ and $\mathbb{V}[d(e)]$.

Solution

In chained hash tables with uniform hashing a similar question arises. Given a table with $n$ elements hashed into $m$ buckets, the expected number of elements per bucket is $n/m$, but what is the expected length of the longest chain?

I found a Stackoverflow question about chained hash tables. The answer by btilly argues that for fixed ratio $n/m$ the expected worst case longest chain is $\Theta(\log n/ \log\log n)$. (Since $n/m$ is fixed, $m$ doesn't need to come into the answer.)

And the answer by templatetypedef references the following paper for a more detailed analysis:

Gaston H. Gonnet: Expected Length of the Longest Probe Sequence in Hash Code Searching.

J. ACM_ 28(2):289-304, 1981.

[Free techreport version at University of Waterloo]

Slightly more generally I found a math.stackexchange question about the balls and bins problem. The answer by Yuval Filmus cites the following paper, which gives even tighter bounds:
Martin Raab; Angelika Steger: Balls into Bins - A Simple and Tight Analysis. 2nd Intl Wkshp on Randomization and Approximation Techniques in Computer Science, pp. 159-170, 1998.

Finally, there is apparently a branch of statistics called extreme value theory culminating in the Fisher-Tippett-Gnedenko theorem, which, I think, gives expected maxima for a huge variety of distributions.

Context

StackExchange Computer Science Q#12830, answer score: 3

Revisions (0)

No revisions yet.