patternMinor
Find the median of a list of sorted arrays
Viewed 0 times
thearraysfindsortedlistmedian
Problem
Input:
A set of $\ell$ arrays $A_i$ (of numbers).
The elements within each array are in sorted order, but the set of arrays is not necessarily sorted. The arrays are not necessarily the same size. The total number of elements is $n$.
Output:
The $k$th smallest element out of all elements in the input.
What's the most efficient algorithm for this problem?
Is it possible, for example to achieve a running time of $O(\ell + \log n)$?
A set of $\ell$ arrays $A_i$ (of numbers).
The elements within each array are in sorted order, but the set of arrays is not necessarily sorted. The arrays are not necessarily the same size. The total number of elements is $n$.
Output:
The $k$th smallest element out of all elements in the input.
What's the most efficient algorithm for this problem?
Is it possible, for example to achieve a running time of $O(\ell + \log n)$?
Solution
You can do it in $O(l + k \text{ log } l)$ time and $O(l)$ extra space as follows:
If you replace the binary heap with a Fibonacci heap, I think this gets you down to amortized $O(l + k)$ time, but in practice it'll be slower than the binary heap unless $l$ is HUGE.
I suspect that the Fibonacci heap bound is optimal, because intuitively you're going to have to inspect at least $k$ elements to find the $k$th smallest one, and you're going to have to inspect at least one element from each of the $l$ arrays since you don't know how they're sorted, which immediately gives a lower bound of $\Omega(\text{max}(k, l)) = \Omega(k + l)$.
- Build a binary heap with one entry for each of the arrays. The key for entry $i$ is the smallest element in array $A_i$. This takes $O(l)$ time.
- Select the smallest entry from the heap and remove it (taking $O(\text{log } l$) time). Add that entry back to the heap using the next smallest entry in the relevant array as its key (again $O(\text{log } l)$ time).
- Do the previous step $k$ times. The last element you remove from the heap is your answer.
If you replace the binary heap with a Fibonacci heap, I think this gets you down to amortized $O(l + k)$ time, but in practice it'll be slower than the binary heap unless $l$ is HUGE.
I suspect that the Fibonacci heap bound is optimal, because intuitively you're going to have to inspect at least $k$ elements to find the $k$th smallest one, and you're going to have to inspect at least one element from each of the $l$ arrays since you don't know how they're sorted, which immediately gives a lower bound of $\Omega(\text{max}(k, l)) = \Omega(k + l)$.
Context
StackExchange Computer Science Q#9497, answer score: 5
Revisions (0)
No revisions yet.