HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Complexity of k-way merge with log n arrays of size n / log n

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
arrayslogwithmergesizewaycomplexity

Problem

I'm designing an in-place sorting algorithm to sort a collection of size $n$. Near the end of the sorting, it may end up with with $\log{n}$ sorted sub-sequences of size $\frac{n}{\log{n}}$. At this point, I use a very simple $k$-way merge algorithm that merges the sub-sequences pairwise until everything has been merged. This very case apparently represents the complexity bottleneck of the algorithm.

I'm repeatedly using the C++ function std::inplace_merge to perform the merging operations, which performs $Θ(N \log{N})$ comparisons most of the time, but may perform up to $O(N \log^2{N})$ comparisons if there isn't enough extra memory available, where $N$ corresponds to the sum of the sizes of the sequences to merge (in our case $2\frac{n}{\log{n}}$).

Taking all of that information into account, I can tell that repeatedly merging the sub-sequences pairwise until everything has been merged takes the following number of comparisons, step by step:

  • Step 1: $O(\frac{\log{n}}{2} \frac{2n}{\log{n}} \log^2{\frac{2n}{\log{n}}}) = O(n * \log^2{\frac{2n}{\log{n}}})$



  • Step 2: $O(\frac{\log{n}}{4} \frac{4n}{\log{n}} \log^2{\frac{4n}{\log{n}}}) = O(n * \log^2{\frac{4n}{\log{n}}})$



  • Step 3: $O(\frac{\log{n}}{8} \frac{8n}{\log{n}} \log^2{\frac{8n}{\log{n}}}) = O(n * \log^2{\frac{8n}{\log{n}}})$



  • $\ldots$



If I'm not mistaken, there will be $\log{k}$ steps where $k$ is number of sorted sub-sequences in the original collection, which is is $\log{n}$, hence there will be $\log{\log{n}}$ steps. This should yield an overall complexity along the lines of $O(\log{\log{n}} n \log^2{\frac{?n}{\log{n}}})$.

Now, to be perfectly honest, I've got no idea how to simplify this expression further, and since I'm describing a sorting algorithm, I would like to know whether it is more along the lines of $O(n \log{n})$ or $O(n \log^2{n})$.

Having computed the complexity of the other two steps of the algorithm, it appears that none of them performs more than $O(n \log{\fr

Solution

First of all, notice that whatever you were doing, the complexity up to this point can not possibly be $o(n \log n)$. If that were true, we could merge each sorted subsequence in time $O(n \log \log n)$, violating the well known bound of $\Omega(n \log n)$ for the worst case of a general sorting algorithm. In fact, $$\Theta\left(n \log \frac{n}{\log n} \right) = \Theta(n \log n - n \log \log n) = \Theta (n \log n)$$

Now, let's analyze the complexity of what you are proposing. In order to visualize all the merging that's going on, imagine inserting each sorted subsequence into a FIFO queue. At each step, we take the first two sequences of the queue, merge them, and put the result at the back of the queue. We take note of the sequence that is initially in front of the queue, and define a round as the set of all merging operations that happen between two subsequent appearances of the initial sequence at the front of the queue.

Observe that each round splits in half the queue size: let's say there are $r$ rounds, with $r \in \Theta(\log \log n)$. Let $\alpha = \frac{n}{\log n}$. Before the $h-$th round begins, the queue contains $\frac{n}{2^h \alpha}$ subsequences, each of which has length $2^h \alpha$. Therefore, the complexity of round $h$ is:

$$
\Theta\left( \frac{n}{2^{h+1} \alpha} \left( 2^{h+1} \alpha \log (2^{h+1} \alpha) \right) \right) = \Theta \left( n \log (2^{h+1} \alpha) \right)
$$

This yields an overall complexity of:

$$
\Theta\left( \sum_{h=0}^{r-1} n \log (2^{h+1} \alpha) \right) =
\Theta\left( nr \log \alpha + n\sum_{h=0}^{r-1} \log 2^{h+1} \right ) =
\Theta\left( nr \log \alpha + nr^2 \right )
$$

Now, remembering that:

$$
\Theta \left( \log \alpha \right) = \Theta \left( \log n - \log \log n \right) = \Theta(\log n)
$$

We conclude:

$$
\Theta\left( nr \log \alpha + nr^2 \right ) = \Theta(n \log n \log \log n)
$$

Context

StackExchange Computer Science Q#68271, answer score: 3

Revisions (0)

No revisions yet.