HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Confusion about the definition of the average-case running time of algorithms

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
definitioncasetheaveragetimealgorithmsrunningaboutconfusion

Problem

In this lecture note,


The average-case running time is defined by the expected value, over all inputs $X$ of a certain size, of the algorithm's running time for $X$:
$$T_{\text{average-case}}(n) = E_{|X| = n}[T(X)] = \sum_{|X| = n} T(X) \cdot Pr[X].$$

This wiki article (Quicksort) gives an average-case time complexity analysis for Quicksort:


When the input is a random permutation, the rank of the pivot is uniform random from $0$ to $n − 1$. Then the resulting parts of the partition have sizes $i$ and $n − i − 1$, and $i$ is uniform random from $0$ to $n − 1$. So, averaging over all possible splits and noting that the number of comparisons for the partition is $n − 1$, the average number of comparisons over all permutations of the input sequence can be estimated accurately by solving the recurrence relation:
$$C(n) = n - 1 + \frac{1}{n} \sum_{i=0}^{n-1} (C(i)+C(n-i-1)).$$

My confusion is how does the average-case analysis for Quicksort fit the definition above? First, I would expect a term $\frac{1}{n!}$ (which is $Pr[X]$) in the recurrence. But, it is not the case. Second, the analysis above is on a random permutation and the averaging is done over all possible splits. How is this related to averaging over all possible inputs? Do I need another definition for average-case running time?

Solution

The definition is a special case of a more general notion. Given probability distributions $\mu_1,\mu_2,\ldots$ on inputs, the average running time (with respect to the $\mu_i$) is defined as
$$
\newcommand{\Tavg}{T_{\mathit{avg}}}
\newcommand{\EE}{\mathbb{E}}
\Tavg(n) = \sideset{\EE}{}{}_{X \sim \mu_n} [T(X)].
$$
Usually $\mu_n$ is supported on inputs of "length" $n$, though length is not necessarily measured in bits or in machine words.

When analyzing comparison-based sorting algorithms, usually $\mu_n$ is chosen as the uniform distribution over all permutations of the array $1,2,\ldots,n$ (or any other array consisting of $n$ distinct comparable elements). The analysis of quicksort uses these distributions.

Now regarding the recursive formula you mention. Given an input array $X$, let $I$ be the rank of the pivot; if $X$ is a random variable, then so is $I$. The formula in Wikipedia is
$$
\Tavg(n) = \sum_{i=0}^{n-1} \Pr[I=i] \EE[T(X)|I=i] = \sum_{i=0}^{n-1} \frac{1}{n} \left[n-1 + \Tavg(i) + \Tavg(n-1-i)\right].
$$
Rearranging gives you the quoted formula.

Context

StackExchange Computer Science Q#72141, answer score: 4

Revisions (0)

No revisions yet.