gotchaMinor

Why does this mergesort variant not do Θ(n) comparisons on average?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

sorting algorithm-analysis average-case cs stackoverflow runtime-analysis

thiswhyvariantmergesortaveragedoesnotcomparisons

Problem

A comparison sort cannot require fewer than $\Theta (n\log n)$ comparisons on average. However, consider this sorting algorithm:

sort(array):
    if length(array)  array[i + 1]:
            push(unsorted, pop(array, i + 1))
        else:
            i ← i + 1
    return merge(array, sort(unsorted))

(push(array, element) puts the new element at the end of the array and increases the array’s length by 1. pop(array, index) removes the element at that index from the array, moving all the elements at greater indices and decrementing the array’s length, and returns the removed element. merge is the same as in mergesort.)

Instead of simply splitting the array in the middle like mergesort, it splits it so that one resulting array doesn’t need to be recursively sorted. Let $n$ be the length of the array to be sorted. Applying the Master Theorem gives us

$$\begin{align*}
T(n) &= T(n / b) + \mathrm{splitComparisons}(n) + \mathrm{mergeComparisons}(n) \\
&= T(n / b) + (n - 1) + n \\
&= T(n / b) + 2n - 1\,,
\end{align*}$$

so $f(n) = 2n - 1$ and $a = c = 1$ in the statement of the Master Theorem.

$b$ is one over the probability that an element is greater than the next element and will go into the array to be recursively sorted. For example, if there's a 25% chance that array[i] > array[i + 1] (for all i), $b = 4$. $b$ is clearly greater than $1$, since the length of unsorted array grows smaller with every recursive call, so taking the logarithm with base $b$ of $1$ will always give us $0$, which is less than $c$. Then $T(n) = \Theta(f(n)) = \Theta(n)$.

But that can’t be true, so the Master Theorem isn't applicable for some reason; I suspect because $b$ isn't constant, but I don’t know how to prove that. The worst case of the sorting algorithm obviously requires a quadratic number of comparisons and the best case linear, so by analogy with bubblesort, insertion sort, etc, I’m guessing this algorithm also makes a quadratic number of com

Solution

Since you don't do any reordering while splitting, the length of array after the while loop can not be larger than the length of the longest increasing subsequence in the input. Since that one is on average about $2\sqrt{n}$ elements long¹, you keep too many elements in unsorted.

In particular, there is no $b$ so that your ansatz describes the actual number of comparisions.

Assuming that the probability distribution of the length of the longest increasing subsequence survives your splitting, a proper ansatz would be of the form

$\qquad\displaystyle C(n) = C(n - 2\sqrt{n}) + \Theta(n)$

which does not seem to solve² to $C \in O(n)$.

In fact, it's even worse since you keep the "first" increasing subsequence, not the longest (in general); consider for instance $[1,n,2,3,4,\dots,n-1]$; you keep $[1,n]$ and recurse on $[2,3,\dots,n-1]$.

On the distribution of the length of the longest increasing subsequence of random permutations by J. Baik, P. Deift and K. Johansson (1999) [via Wikipedia]

I have not solved it explicitly, but plots suggest as much.

Context

StackExchange Computer Science Q#41180, answer score: 4

Revisions (0)

No revisions yet.