patternMinor
Can expected "depth" of an element and expected "height" differ significantly?
Viewed 0 times
depthcanheightexpectedelementdiffersignificantlyand
Problem
When analysing treaps (or, equivalently, BSTs or Quicksort), it is not too hard to show that
$\qquad\displaystyle \mathbb{E}[d(k)] \in O(\log n)$
where $d(k)$ is the depth of the element with rank $k$ in the set of $n$ keys.
Intuitively, this seems to imply that also
$\qquad\displaystyle \mathbb{E}[h(T)] \in O(\log n)$
where $h(T)$ is the height of treap $T$, since
$\qquad\displaystyle h(T) = \max_{k \in [1..n]} d(k)$.
Formally, however, there does not seem to be an (immediate) relationship. We even have
$\qquad\displaystyle \mathbb{E}[h(T)] \geq \max_{k \in [1..n]} \mathbb{E}[d(k)]$
by Jensen's inequality. Now, one can show expected logarithmic height via tail bounds, using more insight into the distribution of $d(k)$.
It is easy to construct examples of distributions that skrew with above intuition, namely extremely asymmetric, heavy-tailed distributions. The question is, can/do such occur in the analysis of algorithms and data structures?
Are there example for data structures $D$ (or algorithms) for which
$\qquad\displaystyle \mathbb{E}[h(D)] \in \omega(\max_{e \in D} \mathbb{E}[d(e)])$?
Nota bene:
-
Of course, we have to interpret "depth" and "height" liberally if we consider structures that are not trees. Based on the posts Wandering Logic links to, "Expected average search time" (for $1/n \cdot \sum_{e \in D} \mathbb{E}[d(e)]$) and "expected maximum search time" (for $\mathbb{E}[h(D)]$) seem to be used.
-
A related question on math.SE has yielded an interesting answer that may allow deriving useful bounds on $\mathbb{E}[h(D)]$ given suitable bounds on $\mathbb{E}[d(e)]$ and $\mathbb{V}[d(e)]$.
$\qquad\displaystyle \mathbb{E}[d(k)] \in O(\log n)$
where $d(k)$ is the depth of the element with rank $k$ in the set of $n$ keys.
Intuitively, this seems to imply that also
$\qquad\displaystyle \mathbb{E}[h(T)] \in O(\log n)$
where $h(T)$ is the height of treap $T$, since
$\qquad\displaystyle h(T) = \max_{k \in [1..n]} d(k)$.
Formally, however, there does not seem to be an (immediate) relationship. We even have
$\qquad\displaystyle \mathbb{E}[h(T)] \geq \max_{k \in [1..n]} \mathbb{E}[d(k)]$
by Jensen's inequality. Now, one can show expected logarithmic height via tail bounds, using more insight into the distribution of $d(k)$.
It is easy to construct examples of distributions that skrew with above intuition, namely extremely asymmetric, heavy-tailed distributions. The question is, can/do such occur in the analysis of algorithms and data structures?
Are there example for data structures $D$ (or algorithms) for which
$\qquad\displaystyle \mathbb{E}[h(D)] \in \omega(\max_{e \in D} \mathbb{E}[d(e)])$?
Nota bene:
-
Of course, we have to interpret "depth" and "height" liberally if we consider structures that are not trees. Based on the posts Wandering Logic links to, "Expected average search time" (for $1/n \cdot \sum_{e \in D} \mathbb{E}[d(e)]$) and "expected maximum search time" (for $\mathbb{E}[h(D)]$) seem to be used.
-
A related question on math.SE has yielded an interesting answer that may allow deriving useful bounds on $\mathbb{E}[h(D)]$ given suitable bounds on $\mathbb{E}[d(e)]$ and $\mathbb{V}[d(e)]$.
Solution
In chained hash tables with uniform hashing a similar question arises. Given a table with $n$ elements hashed into $m$ buckets, the expected number of elements per bucket is $n/m$, but what is the expected length of the longest chain?
I found a Stackoverflow question about chained hash tables. The answer by btilly argues that for fixed ratio $n/m$ the expected worst case longest chain is $\Theta(\log n/ \log\log n)$. (Since $n/m$ is fixed, $m$ doesn't need to come into the answer.)
And the answer by templatetypedef references the following paper for a more detailed analysis:
Gaston H. Gonnet: Expected Length of the Longest Probe Sequence in Hash Code Searching.
J. ACM_ 28(2):289-304, 1981.
[Free techreport version at University of Waterloo]
Slightly more generally I found a math.stackexchange question about the balls and bins problem. The answer by Yuval Filmus cites the following paper, which gives even tighter bounds:
Martin Raab; Angelika Steger: Balls into Bins - A Simple and Tight Analysis. 2nd Intl Wkshp on Randomization and Approximation Techniques in Computer Science, pp. 159-170, 1998.
Finally, there is apparently a branch of statistics called extreme value theory culminating in the Fisher-Tippett-Gnedenko theorem, which, I think, gives expected maxima for a huge variety of distributions.
I found a Stackoverflow question about chained hash tables. The answer by btilly argues that for fixed ratio $n/m$ the expected worst case longest chain is $\Theta(\log n/ \log\log n)$. (Since $n/m$ is fixed, $m$ doesn't need to come into the answer.)
And the answer by templatetypedef references the following paper for a more detailed analysis:
Gaston H. Gonnet: Expected Length of the Longest Probe Sequence in Hash Code Searching.
J. ACM_ 28(2):289-304, 1981.
[Free techreport version at University of Waterloo]
Slightly more generally I found a math.stackexchange question about the balls and bins problem. The answer by Yuval Filmus cites the following paper, which gives even tighter bounds:
Martin Raab; Angelika Steger: Balls into Bins - A Simple and Tight Analysis. 2nd Intl Wkshp on Randomization and Approximation Techniques in Computer Science, pp. 159-170, 1998.
Finally, there is apparently a branch of statistics called extreme value theory culminating in the Fisher-Tippett-Gnedenko theorem, which, I think, gives expected maxima for a huge variety of distributions.
Context
StackExchange Computer Science Q#12830, answer score: 3
Revisions (0)
No revisions yet.