patternMinor
Sorting "almost sorted" array in $O(n\log\log n)$
Viewed 0 times
sortinglogalmostarraysorted
Problem
Given array $A$ of length $n$, we call it almost sorted if there are at most $\log n$ indices satisfying $A[i] > A[i+1]$.
Find an algorithm that sorts the array in $O(n\log\log n)$.
My attempt:
At this point I am stuck.
Find an algorithm that sorts the array in $O(n\log\log n)$.
My attempt:
- Create an array $B$ of size $\log n + 1$.
- Go through the array $A$, recognize the $\log n$ pairs, and insert them into $B$.
- Sort $B$ using insertion sort in time $O(\log^2 n)$.
At this point I am stuck.
Solution
Suppose that $A$ is an array with $m$ indices satisfying $A[i] > A[i+1]$. You can find these indices in $O(n)$. These $m$ indices split $A$ into $m+1$ nondecreasing arrays $B_1,\ldots,B_{m+1}$ of total length $n$. We now merge them according to the following strategy: at each step, take the two shortest arrays, and merge them. You can implement the choice mechanism in $O(m\log m) = O(n\log m)$ using a heap.
When merging two arrays of length $a,b$, the running time is $O(a+b)$. Hence we would like to understand the total sum $S$ of $a+b$, where $(a,b)$ goes over all $m$ pairs of arrays being merged. To this end, we consider a merge tree, which is formed in the following way. We start with $m+1$ vertices corresponding to $B_1,\ldots,B_{m+1}$. When two arrays corresponding to vertices $x,y$ are merged, we create a new vertex with $x,y$ its only children. You can check that
$$
S = \sum_{i=1}^{m+1} |B_i| \mathrm{depth}(B_i).
$$
Consider now a probability distribution $X$ on $[m+1]$ with $\Pr[X=i] = |B_i|/n$. Then $S/n$ is the average codeword length of an optimal prefix code for $X$ (this is because we're essentially running Huffman's algorithm). Therefore $S/n < \log m + 1$, showing that the merging steps take $O(n\log m)$ time in total.
Summarizing, the algorithm runs in time $O(n\log m)$.
(Strictly speaking, to handle the cases $m=1$ and $m=0$, we should replace $m$ with $m+2$.)
When merging two arrays of length $a,b$, the running time is $O(a+b)$. Hence we would like to understand the total sum $S$ of $a+b$, where $(a,b)$ goes over all $m$ pairs of arrays being merged. To this end, we consider a merge tree, which is formed in the following way. We start with $m+1$ vertices corresponding to $B_1,\ldots,B_{m+1}$. When two arrays corresponding to vertices $x,y$ are merged, we create a new vertex with $x,y$ its only children. You can check that
$$
S = \sum_{i=1}^{m+1} |B_i| \mathrm{depth}(B_i).
$$
Consider now a probability distribution $X$ on $[m+1]$ with $\Pr[X=i] = |B_i|/n$. Then $S/n$ is the average codeword length of an optimal prefix code for $X$ (this is because we're essentially running Huffman's algorithm). Therefore $S/n < \log m + 1$, showing that the merging steps take $O(n\log m)$ time in total.
Summarizing, the algorithm runs in time $O(n\log m)$.
(Strictly speaking, to handle the cases $m=1$ and $m=0$, we should replace $m$ with $m+2$.)
Context
StackExchange Computer Science Q#104603, answer score: 6
Revisions (0)
No revisions yet.