gotchaMinor
Why does this mergesort variant not do Θ(n) comparisons on average?
Viewed 0 times
thiswhyvariantmergesortaveragedoesnotcomparisons
Problem
A comparison sort cannot require fewer than $\Theta (n\log n)$ comparisons on average. However, consider this sorting algorithm:
(
Instead of simply splitting the array in the middle like mergesort, it splits it so that one resulting array doesn’t need to be recursively sorted. Let $n$ be the length of the array to be sorted. Applying the Master Theorem gives us
$$\begin{align*}
T(n) &= T(n / b) + \mathrm{splitComparisons}(n) + \mathrm{mergeComparisons}(n) \\
&= T(n / b) + (n - 1) + n \\
&= T(n / b) + 2n - 1\,,
\end{align*}$$
so $f(n) = 2n - 1$ and $a = c = 1$ in the statement of the Master Theorem.
$b$ is one over the probability that an element is greater than the next element and will go into the array to be recursively sorted. For example, if there's a 25% chance that
But that can’t be true, so the Master Theorem isn't applicable for some reason; I suspect because $b$ isn't constant, but I don’t know how to prove that. The worst case of the sorting algorithm obviously requires a quadratic number of comparisons and the best case linear, so by analogy with bubblesort, insertion sort, etc, I’m guessing this algorithm also makes a quadratic number of com
sort(array):
if length(array) array[i + 1]:
push(unsorted, pop(array, i + 1))
else:
i ← i + 1
return merge(array, sort(unsorted))(
push(array, element) puts the new element at the end of the array and increases the array’s length by 1. pop(array, index) removes the element at that index from the array, moving all the elements at greater indices and decrementing the array’s length, and returns the removed element. merge is the same as in mergesort.)Instead of simply splitting the array in the middle like mergesort, it splits it so that one resulting array doesn’t need to be recursively sorted. Let $n$ be the length of the array to be sorted. Applying the Master Theorem gives us
$$\begin{align*}
T(n) &= T(n / b) + \mathrm{splitComparisons}(n) + \mathrm{mergeComparisons}(n) \\
&= T(n / b) + (n - 1) + n \\
&= T(n / b) + 2n - 1\,,
\end{align*}$$
so $f(n) = 2n - 1$ and $a = c = 1$ in the statement of the Master Theorem.
$b$ is one over the probability that an element is greater than the next element and will go into the array to be recursively sorted. For example, if there's a 25% chance that
array[i] > array[i + 1] (for all i), $b = 4$. $b$ is clearly greater than $1$, since the length of unsorted array grows smaller with every recursive call, so taking the logarithm with base $b$ of $1$ will always give us $0$, which is less than $c$. Then $T(n) = \Theta(f(n)) = \Theta(n)$.But that can’t be true, so the Master Theorem isn't applicable for some reason; I suspect because $b$ isn't constant, but I don’t know how to prove that. The worst case of the sorting algorithm obviously requires a quadratic number of comparisons and the best case linear, so by analogy with bubblesort, insertion sort, etc, I’m guessing this algorithm also makes a quadratic number of com
Solution
Since you don't do any reordering while splitting, the length of
In particular, there is no $b$ so that your ansatz describes the actual number of comparisions.
Assuming that the probability distribution of the length of the longest increasing subsequence survives your splitting, a proper ansatz would be of the form
$\qquad\displaystyle C(n) = C(n - 2\sqrt{n}) + \Theta(n)$
which does not seem to solve² to $C \in O(n)$.
In fact, it's even worse since you keep the "first" increasing subsequence, not the longest (in general); consider for instance $[1,n,2,3,4,\dots,n-1]$; you keep $[1,n]$ and recurse on $[2,3,\dots,n-1]$.
array after the while loop can not be larger than the length of the longest increasing subsequence in the input. Since that one is on average about $2\sqrt{n}$ elements long¹, you keep too many elements in unsorted. In particular, there is no $b$ so that your ansatz describes the actual number of comparisions.
Assuming that the probability distribution of the length of the longest increasing subsequence survives your splitting, a proper ansatz would be of the form
$\qquad\displaystyle C(n) = C(n - 2\sqrt{n}) + \Theta(n)$
which does not seem to solve² to $C \in O(n)$.
In fact, it's even worse since you keep the "first" increasing subsequence, not the longest (in general); consider for instance $[1,n,2,3,4,\dots,n-1]$; you keep $[1,n]$ and recurse on $[2,3,\dots,n-1]$.
- On the distribution of the length of the longest increasing subsequence of random permutations by J. Baik, P. Deift and K. Johansson (1999) [via Wikipedia]
- I have not solved it explicitly, but plots suggest as much.
Context
StackExchange Computer Science Q#41180, answer score: 4
Revisions (0)
No revisions yet.