HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Find the kth smallest element in an unsorted array of non-negative integers

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
smallestfindthenonarrayunsortedelementkthintegersnegative

Problem

I need to find the kth smallest element in an unsorted array of non-negative integer.

kth smallest element is the minimum possible n such that there are at least kth elements <= n.

The rub here is that the array is read only so it can't be modified. Also i have to do it in constant space.

Can anybody help come up with the most optimal solution ?

Solution

There is a simple randomized algorithm running in expected $O(n\log n)$ time. The algorithm is a variant of the better known algorithm QuickSelect.

The algorithm maintains a pair of values $a \leq b$, initially $-\infty$ and $+\infty$ (or, the minimum and maximum values of the array). At each round, it chooses a uniformly random element $c$ in the range $[a,b]$ from the array, and computes its order statistics, i.e., the value $\ell$ such that $c$ is the $\ell$th smallest element in the array.

  • If $\ell = k$, then we return $c$.



  • If $\ell > k$, we replace $b$ with $c$.



  • If $\ell



If $a = b$ then we return $a$. Otherwise, we continue for one more iteration.

Each iteration takes time $O(n)$, so in order to estimate the running time of the algorithm, we need to estimate the number of rounds. We will keep track of the number of "live" elements, initially $n$. The algorithm ends when the number of live elements gets down to 1 (or even sooner, if there are repeated elements).

Let us denote by $n_t$ the number of elements after $t$ rounds, so that $n_0 = n$. The exact distribution of $n_{t+1}$ given $n_t$ depends on the location of the $k$th order statistics among the live elements. Let us suppose that the $k$th order statistics is the $\ell$th live element. Without loss of generality, $\ell \leq (n_t+1)/2$. Suppose that we choose the $r$th order statistic, so $r$ is uniform over $1,\ldots,n_t$. Then:

  • If $r = 1,\ldots,\ell-1$ then $n_{t+1} = n_t - (r-1)$.



  • If $r = \ell$ then $n_{t+1} = 0$.



  • If $r = \ell+1,\ldots,n_t$ then $n_{t+1} = r$.



In total, the expected value of $n_{t+1}$ is
$$
\frac{1}{n_t} [(n_t + \cdots + n_t-\ell+2) + (\ell+1 + \cdots + n_t)].
$$
The number of summands is constant, and the worst case is when $\ell = (n_t+1)/2$, in which case the expected value of $n_{t+1}$ is
$$
\frac{(\frac{n_t+1}{2}+n_t)\frac{n_t-1}{2}}{n_t} \leq \frac{3}{4}n_t.
$$
Induction shows that $\mathbb{E}[n_t] \leq (3/4)^t n$, and so the expectation drops below 1 for $t = O(\log n)$.

While this doesn't quite show that the expected number of iterations is $O(\log n)$, more refined arguments do, and they moreover show that the number of iterations is $O(\log n)$ with high probability.

Here are some hints on how to implement the algorithm. The crucial step is to choose a uniformly random element $c$ in the range $[a,b]$ from the array. There are at least two ways to go about it, both taking $O(n)$:

Reservoir sampling

-
Initialize $N = 0$ and $X = \bot$.

-
Go over all elements of the array. For each element $x$, if $a \leq x \leq b$:

  • Increment $N$.



  • Replace $X$ with $x$ with probability $1/N$.



-
Return $X$.

Linear scan

-
Count the number of elements in the array between $a$ and $b$ — say the answer is $N$.

-
Draw a random integer $i$ between 1 and $N$.

-
Go over the array again, and return the $i$th element between $a$ and $b$.

Context

StackExchange Computer Science Q#93694, answer score: 4

Revisions (0)

No revisions yet.