HiveBrain v1.2.0

Complexity of Merge Sort that splits in a random position

Submitted by: @import:stackexchange-cs

Problem

I got a question that I don't fully understand:
Consider a variant of merge sort (call it mergesort2) that, instead of splitting the list in the middle, splits it at a random position between $1$ and $n-1$.
We also know that the split position equals $1$ with probability $2/n$, and every other position with probability $1/n$.

I have tried again and again, and I came up with the usual merge sort bounds: $\Theta(n \log n)$ for the "average case" and $O(n^2)$ for the "worst case", but I'm not sure about it.

Pseudo-Code:

MergeSort2(A,p,q)
     if(p=q) then return
     i = random number from (1...q-p)
     MergeSort2(A, p,   p+i-1)
     MergeSort2(A, p+i, q)
     Merge(A, p, p+i-1, q)
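The pseudocode above can be turned into a short runnable sketch. This is a minimal Python translation, assuming 0-based inclusive indices and a *uniform* choice of the split offset (the skewed distribution with $P(i=1)=2/n$ from the problem statement is not modeled here):

```python
# A Python sketch of MergeSort2: split at a random position, recurse, merge.
# Assumption: uniform split offset; indices p..q are 0-based and inclusive.
import random

def merge_sort2(a, p, q):
    """Sort a[p..q] in place, splitting at a random position."""
    if p >= q:                        # if (p = q) then return
        return
    i = random.randint(1, q - p)      # random split offset in 1..q-p
    merge_sort2(a, p, p + i - 1)
    merge_sort2(a, p + i, q)
    merge(a, p, p + i - 1, q)

def merge(a, p, m, q):
    """Standard merge of the sorted runs a[p..m] and a[m+1..q]."""
    left, right = a[p:m + 1], a[m + 1:q + 1]
    k, i, j = p, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            a[k] = left[i]; i += 1
        else:
            a[k] = right[j]; j += 1
        k += 1
    a[k:q + 1] = left[i:] + right[j:]  # copy whichever run has leftovers

data = [5, 2, 9, 1, 5, 6]
merge_sort2(data, 0, len(data) - 1)
print(data)  # → [1, 2, 5, 5, 6, 9]
```

The random split changes only the running time, not correctness: both halves are always nonempty, so the recursion terminates and the merge sees two sorted runs.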


How I did the calculation of the $O(n^2)$ "worst case", assuming the split is $1$ and $n-1$ every time (if it's even correct):

$$T(n)= T(1)+T(n-1)+O(n)$$

where $T(1)$ is the single element, $T(n-1)$ is the rest of the list, and $O(n)$ is the merge.

$$\begin{align*}
T(n) &= (T(1)+T(n-2)+O(n)) + T(1)+O(n) & \text{second recursive call}\\
&= (T(n-3)+T(1)+O(n)) +2T(1)+2O(n) & \text{third recursive call}\\
& \vdots\\
T(n) &= T(n-k)+kT(1)+kO(n) & k^{th}\text{ recursive call}
\end{align*}$$

Now we know that $T(1) = O(1)$, since nothing is done for a single element.

To reach the stopping condition, we need $T(n-k)$ to become $T(1)$.

So we take $k = n-1$.

Therefore we will get the following:

$$T(n) = T(1)+(n-1)O(1)+(n-1)O(n) \implies T(n) = O(n^2)$$
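As a sanity check, the chain $T(n) = T(n-1) + n + 1$ can be unrolled numerically. Assuming the concrete base case $T(1) = 1$, it has the closed form $n(n+1)/2 + n - 1 = \Theta(n^2)$, which a short loop confirms:

```python
# Unroll T(n) = T(n-1) + n + 1 with the assumed base case T(1) = 1,
# and compare against the closed form n(n+1)/2 + n - 1.
def t_chain(n):
    t = 1
    for j in range(2, n + 1):
        t = t + j + 1
    return t

for n in range(1, 50):
    assert t_chain(n) == n * (n + 1) // 2 + n - 1
print(t_chain(100))  # → 5149
```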

How do I properly calculate the complexity of the given algorithm?

Solution

For definiteness, let us suppose that $T(1) = 1$ and that if the split is $k+(n-k)$ then the running time is $T(k) + T(n-k) + n$.
The recursion for the worst-case running time is therefore
$$
T(n) = \begin{cases} \max_{1 \leq k \leq n-1} [T(k) + T(n-k)] + n & \text{if } n > 1 \\ 1 & \text{if } n = 1. \end{cases}
$$

In order to calculate the worst-case complexity, we need both an upper bound and a lower bound. You already gave the lower bound:
$$
T(n) \geq T(1) + T(n-1) + n = T(n-1) + n+1.
$$
Iterating this easily yields $T(n) = \Omega(n^2)$.

In the other direction, let us prove that $T(n) \leq n^2$ for all $n$. The base case is clear. For the inductive case,
$$
T(k) + T(n-k) + n \leq k^2 + (n-k)^2 + n \leq 1 + (n-1)^2 + n = n^2 - n + 2 \leq n^2,
$$
since $n \geq 2$. It follows that $T(n) = \Theta(n^2)$.

(In fact, with some more work you should be able to show that $k=1$ is indeed (together with $k=n-1$) the worst choice.)
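Both bounds, and the parenthetical claim that $k=1$ is a worst split, can be verified numerically for small $n$ by tabulating the recurrence directly. A sketch:

```python
# Dynamic program for the worst-case recurrence
#   T(n) = max_{1 <= k <= n-1} [T(k) + T(n-k)] + n,   T(1) = 1,
# checking n^2/2 <= T(n) <= n^2 and that k = 1 attains the maximum.
N = 200
T = [0, 1]                            # T[0] unused, T[1] = 1
for n in range(2, N + 1):
    best = max(T[k] + T[n - k] for k in range(1, n))
    assert best == T[1] + T[n - 1]    # k = 1 (and k = n-1) is a worst split
    T.append(best + n)

for n in range(2, N + 1):
    assert n * n <= 2 * T[n] and T[n] <= n * n   # integer-exact sandwich
print(T[10])  # → 64
```

The worst-case chain therefore matches the closed form $n(n+1)/2 + n - 1$ from the lower-bound iteration.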

Let us now proceed to the average-case running time (i.e., the expected running time). Here we take the split position to be uniform on $\{1, \ldots, n-1\}$; the problem's slightly larger weight on position $1$ changes only lower-order terms, so the asymptotics below are unaffected. The expected running time satisfies the recurrence
$$
S(n) = \begin{cases} \frac{1}{n-1}\sum_{1 \leq k \leq n-1} [S(k) + S(n-k)] + n & \text{if } n > 1 \\ 1 & \text{if } n = 1. \end{cases}
$$
We can simplify the inductive case to
$$
S(n) = \frac{2}{n-1} \sum_{1 \leq k \leq n-1} S(k) + n.
$$
This implies that for $n \geq 2$,
$$
\begin{aligned}
n S(n+1) &= 2(S(1) + \cdots + S(n)) + (n+1)n, \\
(n-1) S(n) &= 2(S(1) + \cdots + S(n-1)) + n(n-1).
\end{aligned}
$$
Subtracting gives
$$
nS(n+1) - (n-1)S(n) = 2S(n) + 2n
\Longrightarrow nS(n+1) = (n+1)S(n) + 2n
\Longrightarrow \frac{S(n+1)}{n+1} = \frac{S(n)}{n} + \frac{2}{n+1}.
$$
This shows that
$$
\frac{S(n)}{n} = \frac{2}{n} + \frac{2}{n-1} + \cdots + \frac{2}{2} + \frac{S(1)}{1} = 2H_n - 1 = \Theta(\log n),
$$
and so $S(n) = \Theta(n\log n)$.
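The derivation above in fact gives the exact identity $S(n) = n(2H_n - 1)$. A small exact-arithmetic check of the recurrence against this identity, using Python's `fractions`:

```python
# Exact check that S(n) = n(2 H_n - 1), where S satisfies
#   S(n) = (2/(n-1)) * (S(1) + ... + S(n-1)) + n,   S(1) = 1.
from fractions import Fraction

N = 60
S = [Fraction(0), Fraction(1)]   # S[0] unused, S[1] = 1
partial = Fraction(1)            # running sum S(1) + ... + S(n-1)
H = Fraction(1)                  # harmonic number H_1
assert S[1] == 1 * (2 * H - 1)
for n in range(2, N + 1):
    S.append(Fraction(2, n - 1) * partial + n)
    partial += S[n]
    H += Fraction(1, n)          # H_n
    assert S[n] == n * (2 * H - 1)
print(S[2], S[3])  # → 4 8
```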

Context

StackExchange Computer Science Q#108456, answer score: 5
