HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Would using the mean as pivot speed up quicksort?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
thepivotmeanquicksortwouldusingspeed

Problem

Somehow I thought about quicksort last night and was reading about it on Wikipedia. The interesting part for me was:
'If we could consistently choose a pivot from the middle 50 percent, we would only have to split the list at most $\log_{4/3} n$.
The choice of the pivot seems to be one possible problem of quicksort that can lead to $O(n^2)$ behavior.

My idea was: If in each step one would use the mean of the partition as pivot, this could increase the speed significantly.
Especially after a few steps, when outliers are in their own division of the list, the mean and median should be very close to each other (once again, looking at large lists).
The additional time during each step to calculate the mean should be $n$. Therefore:

Quicksort estimated time: $nA\log_{4/3} n$

Quicksort_mean estimated time: $2nA\log_{5/3} n$

(5/3 is most likely a conservative estimation by me, could as well be closer to 2, as subsets should quickly be without outliers).
So, starting at around 10,000 entries Quicksort_mean would be (on average) faster than Quicksort. Furthermore, it would never risk to be $O(n^2)$, as it is bound to not take the minimal or maximal element of the stack.

My main question is: Did I miss anything? I have to admit, I never implemented quicksort myself, so I might miss out on other parts of the whole thing (storage, etc.)

Solution

Using the mean for your partition does not prevent the $\Omega(n^2)$ worst-case behavior. It occurs when the input list is exponentially increasing. Consider the input:

$1,n^2,n^3,\ldots,n^n$

The mean of this set is (asymptotically) $n^{n-1}$ so you obtain the worst partition possible. This is a bit of a cheat considering that storing the list takes $\Omega(n^2)$ space if the numbers are represented as integers. But if you are sorting floating-point numbers then this scenario is concievable.

However, it is possible to calculate the median of a set (or any other order statistic for that matter) in $O(n)$ time so if you really care about run time guarantees for quick sort you should use that rather than the mean.

However, in all practical scenarios the additional cost of computing the mean/median is so large that picking a random pivot almost always is faster.

Context

StackExchange Computer Science Q#33599, answer score: 10

Revisions (0)

No revisions yet.