patternCritical
Find median of unsorted array in $O(n)$ time
Viewed 0 times
arrayunsortedtimefindmedian
Problem
To find the median of an unsorted array, we can make a min-heap in $O(n\log n)$ time for $n$ elements, and then we can extract one by one $n/2$ elements to get the median. But this approach would take $O(n \log n)$ time.
Can we do the same by some method in $O(n)$ time? If we can, then how?
Can we do the same by some method in $O(n)$ time? If we can, then how?
Solution
This is a special case of a selection algorithm that can find the $k$th smallest element of an array with $k$ is the half of the size of the array. There is an implementation that is linear in the worst case.
Generic selection algorithm
First let's see an algorithm
The function
This is $O(n)$ in average but $O(n^2)$ in the worst case.
Linear worst case: the median-of-medians algorithm
A better pivot is the median of all the medians of sub arrays of
This guarantees $O(n)$ in all cases. It is not that obvious. These powerpoint slides are helpful both at explaining the algorithm and the complexity.
Note that most of the time using a random pivot is faster.
Generic selection algorithm
First let's see an algorithm
find-kth that finds the $k$th smallest element of an array:find-kth(A, k)
pivot = random element of A
(L, R) = split(A, pivot)
if k = |L|+1, return pivot
if k ≤ |L| , return find-kth(L, k)
if k > |L|+1, return find-kth(R, k-(|L|+1))The function
split(A, pivot) returns L,R such that all elements in R are greater than pivot and L all the others (minus one occurrence of pivot). Then all is done recursively.This is $O(n)$ in average but $O(n^2)$ in the worst case.
Linear worst case: the median-of-medians algorithm
A better pivot is the median of all the medians of sub arrays of
A of size 5, by using calling the procedure on the array of these medians.find-kth(A, k)
B = [median(A[1], .., A[5]), median(A[6], .., A[10]), ..]
pivot = find-kth(B, |B|/2)
...This guarantees $O(n)$ in all cases. It is not that obvious. These powerpoint slides are helpful both at explaining the algorithm and the complexity.
Note that most of the time using a random pivot is faster.
Code Snippets
find-kth(A, k)
pivot = random element of A
(L, R) = split(A, pivot)
if k = |L|+1, return pivot
if k ≤ |L| , return find-kth(L, k)
if k > |L|+1, return find-kth(R, k-(|L|+1))find-kth(A, k)
B = [median(A[1], .., A[5]), median(A[6], .., A[10]), ..]
pivot = find-kth(B, |B|/2)
...Context
StackExchange Computer Science Q#1914, answer score: 74
Revisions (0)
No revisions yet.