patternMinor
Find the kth smallest element in an unsorted array of non-negative integers
Viewed 0 times
smallestfindthenonarrayunsortedelementkthintegersnegative
Problem
I need to find the kth smallest element in an unsorted array of non-negative integer.
kth smallest element is the minimum possible n such that there are at least kth elements <= n.
The rub here is that the array is read only so it can't be modified. Also i have to do it in constant space.
Can anybody help come up with the most optimal solution ?
kth smallest element is the minimum possible n such that there are at least kth elements <= n.
The rub here is that the array is read only so it can't be modified. Also i have to do it in constant space.
Can anybody help come up with the most optimal solution ?
Solution
There is a simple randomized algorithm running in expected $O(n\log n)$ time. The algorithm is a variant of the better known algorithm QuickSelect.
The algorithm maintains a pair of values $a \leq b$, initially $-\infty$ and $+\infty$ (or, the minimum and maximum values of the array). At each round, it chooses a uniformly random element $c$ in the range $[a,b]$ from the array, and computes its order statistics, i.e., the value $\ell$ such that $c$ is the $\ell$th smallest element in the array.
If $a = b$ then we return $a$. Otherwise, we continue for one more iteration.
Each iteration takes time $O(n)$, so in order to estimate the running time of the algorithm, we need to estimate the number of rounds. We will keep track of the number of "live" elements, initially $n$. The algorithm ends when the number of live elements gets down to 1 (or even sooner, if there are repeated elements).
Let us denote by $n_t$ the number of elements after $t$ rounds, so that $n_0 = n$. The exact distribution of $n_{t+1}$ given $n_t$ depends on the location of the $k$th order statistics among the live elements. Let us suppose that the $k$th order statistics is the $\ell$th live element. Without loss of generality, $\ell \leq (n_t+1)/2$. Suppose that we choose the $r$th order statistic, so $r$ is uniform over $1,\ldots,n_t$. Then:
In total, the expected value of $n_{t+1}$ is
$$
\frac{1}{n_t} [(n_t + \cdots + n_t-\ell+2) + (\ell+1 + \cdots + n_t)].
$$
The number of summands is constant, and the worst case is when $\ell = (n_t+1)/2$, in which case the expected value of $n_{t+1}$ is
$$
\frac{(\frac{n_t+1}{2}+n_t)\frac{n_t-1}{2}}{n_t} \leq \frac{3}{4}n_t.
$$
Induction shows that $\mathbb{E}[n_t] \leq (3/4)^t n$, and so the expectation drops below 1 for $t = O(\log n)$.
While this doesn't quite show that the expected number of iterations is $O(\log n)$, more refined arguments do, and they moreover show that the number of iterations is $O(\log n)$ with high probability.
Here are some hints on how to implement the algorithm. The crucial step is to choose a uniformly random element $c$ in the range $[a,b]$ from the array. There are at least two ways to go about it, both taking $O(n)$:
Reservoir sampling
-
Initialize $N = 0$ and $X = \bot$.
-
Go over all elements of the array. For each element $x$, if $a \leq x \leq b$:
-
Return $X$.
Linear scan
-
Count the number of elements in the array between $a$ and $b$ — say the answer is $N$.
-
Draw a random integer $i$ between 1 and $N$.
-
Go over the array again, and return the $i$th element between $a$ and $b$.
The algorithm maintains a pair of values $a \leq b$, initially $-\infty$ and $+\infty$ (or, the minimum and maximum values of the array). At each round, it chooses a uniformly random element $c$ in the range $[a,b]$ from the array, and computes its order statistics, i.e., the value $\ell$ such that $c$ is the $\ell$th smallest element in the array.
- If $\ell = k$, then we return $c$.
- If $\ell > k$, we replace $b$ with $c$.
- If $\ell
If $a = b$ then we return $a$. Otherwise, we continue for one more iteration.
Each iteration takes time $O(n)$, so in order to estimate the running time of the algorithm, we need to estimate the number of rounds. We will keep track of the number of "live" elements, initially $n$. The algorithm ends when the number of live elements gets down to 1 (or even sooner, if there are repeated elements).
Let us denote by $n_t$ the number of elements after $t$ rounds, so that $n_0 = n$. The exact distribution of $n_{t+1}$ given $n_t$ depends on the location of the $k$th order statistics among the live elements. Let us suppose that the $k$th order statistics is the $\ell$th live element. Without loss of generality, $\ell \leq (n_t+1)/2$. Suppose that we choose the $r$th order statistic, so $r$ is uniform over $1,\ldots,n_t$. Then:
- If $r = 1,\ldots,\ell-1$ then $n_{t+1} = n_t - (r-1)$.
- If $r = \ell$ then $n_{t+1} = 0$.
- If $r = \ell+1,\ldots,n_t$ then $n_{t+1} = r$.
In total, the expected value of $n_{t+1}$ is
$$
\frac{1}{n_t} [(n_t + \cdots + n_t-\ell+2) + (\ell+1 + \cdots + n_t)].
$$
The number of summands is constant, and the worst case is when $\ell = (n_t+1)/2$, in which case the expected value of $n_{t+1}$ is
$$
\frac{(\frac{n_t+1}{2}+n_t)\frac{n_t-1}{2}}{n_t} \leq \frac{3}{4}n_t.
$$
Induction shows that $\mathbb{E}[n_t] \leq (3/4)^t n$, and so the expectation drops below 1 for $t = O(\log n)$.
While this doesn't quite show that the expected number of iterations is $O(\log n)$, more refined arguments do, and they moreover show that the number of iterations is $O(\log n)$ with high probability.
Here are some hints on how to implement the algorithm. The crucial step is to choose a uniformly random element $c$ in the range $[a,b]$ from the array. There are at least two ways to go about it, both taking $O(n)$:
Reservoir sampling
-
Initialize $N = 0$ and $X = \bot$.
-
Go over all elements of the array. For each element $x$, if $a \leq x \leq b$:
- Increment $N$.
- Replace $X$ with $x$ with probability $1/N$.
-
Return $X$.
Linear scan
-
Count the number of elements in the array between $a$ and $b$ — say the answer is $N$.
-
Draw a random integer $i$ between 1 and $N$.
-
Go over the array again, and return the $i$th element between $a$ and $b$.
Context
StackExchange Computer Science Q#93694, answer score: 4
Revisions (0)
No revisions yet.