Complexity of Merge Sort that splits at a random position
Problem
I got a question that I don't fully understand:
Given a new merge sort algorithm (AKA mergesort2) that, instead of splitting the list in the middle, splits it at a random position between $1$ and $n-1$.
Also, we know that the probability of getting $1$ is $2/n$, and the probability of getting every other number is $1/n$.
I have tried again and again, and for the "average case" I get the regular $\Theta(n \log n)$ of the merge sort algorithm, and $O(n^2)$ for the "worst case", but I'm not sure about it.
Pseudo-Code:
MergeSort2(A, p, q)
    if (p = q) then return
    i = random number from (1...q-p)
    MergeSort2(A, p, p+i-1)
    MergeSort2(A, p+i, q)
    Merge(A, p, p+i-1, q)
Here is how I calculated the $O(n^2)$ "worst case", splitting into $1$ and $n-1$ every time (if it is correct):
$$T(n) = T(1) + T(n-1) + O(n)$$
where $T(1)$ is the single element, $T(n-1)$ is the rest of the list, and $O(n)$ is the merge.
$$\begin{align*}
T(n) &= (T(1)+T(n-2)+O(n)) + T(1)+O(n) & \text{second recursive call}\\
&= (T(n-3)+T(1)+O(n)) + 2T(1)+2O(n) & \text{third recursive call}\\
& \vdots\\
T(n) &= T(n-k)+kT(1)+kO(n) & k^{\text{th}}\text{ recursive call}
\end{align*}$$
Now we know that $T(1) = O(1)$, as it does not do anything.
To reach the stop condition, we need $T(n-k)$ to become $T(1)$, so we take $k = n-1$.
Therefore we get:
$$T(n) = T(1)+(n-1)O(1)+(n-1)O(n) \implies T(n) = O(n^2)$$
How do I properly calculate the complexity of the given algorithm?
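For concreteness, here is a minimal runnable Python sketch of the pseudocode above. Two simplifying assumptions of mine, not part of the question: the split size is drawn uniformly from $1,\dots,q-p$ (ignoring the extra $2/n$ weight on $1$, which does not affect correctness), and `Merge` is a standard two-way merge of adjacent sorted runs.

```python
import random

def merge(A, p, mid, q):
    # Standard two-way merge of the sorted runs A[p..mid] and A[mid+1..q]
    # (inclusive indices, matching the pseudocode's Merge(A, p, p+i-1, q)).
    left, right = A[p:mid + 1], A[mid + 1:q + 1]
    i = j = 0
    for k in range(p, q + 1):
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            A[k] = left[i]
            i += 1
        else:
            A[k] = right[j]
            j += 1

def mergesort2(A, p, q):
    # Sorts A[p..q] in place, splitting at a random position instead of the middle.
    if p >= q:
        return
    # Assumption: uniform split size; the question gives 1 probability 2/n instead.
    i = random.randint(1, q - p)      # left part gets i elements
    mergesort2(A, p, p + i - 1)
    mergesort2(A, p + i, q)
    merge(A, p, p + i - 1, q)

data = [5, 2, 9, 1, 7, 3]
mergesort2(data, 0, len(data) - 1)
# data is now sorted regardless of which random splits were drawn
```

The random split changes only the running time, not the result: every split produces two non-empty adjacent ranges, so the merge step always sees two sorted runs.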
Solution
For definiteness, let us suppose that $T(1) = 1$ and that if the split is $k+(n-k)$ then the running time is $T(k) + T(n-k) + n$.
The recursion for the worst-case running time is therefore
$$
T(n) = \begin{cases} \max_{1 \leq k \leq n-1} T(k) + T(n-k) + n & \text{if } n > 1 \\ 1 & \text{if } n = 1. \end{cases}
$$
In order to calculate the worst-case complexity, we need both an upper bound and a lower bound. You already gave the lower bound:
$$
T(n) \geq T(1) + T(n-1) + n = T(n-1) + n+1.
$$
Iterating this easily yields $T(n) = \Omega(n^2)$.
In the other direction, let us prove that $T(n) \leq n^2$ for all $n$. The base case is clear. For the inductive case,
$$
T(k) + T(n-k) + n \leq k^2 + (n-k)^2 + n \leq 1 + (n-1)^2 + n = n^2 - n + 2 \leq n^2,
$$
since $n \geq 2$. It follows that $T(n) = \Theta(n^2)$.
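The two bounds can be sanity-checked numerically by iterating the recurrence with the worst split $k=1$, using the conventions above ($T(1)=1$, merge cost $n$). Unrolling gives the closed form $T(n) = n(n+3)/2 - 1$, which I verified against the iteration below:

```python
def worst_case_T(n):
    # Iterate T(m) = T(1) + T(m-1) + m with T(1) = 1, i.e. the k = 1 split.
    T = 1
    for m in range(2, n + 1):
        T = 1 + T + m
    return T

# T(n) = 1 + sum_{m=2..n} (m + 1) = n(n+3)/2 - 1, which is Theta(n^2),
# and it stays below the inductive upper bound n^2 for all n >= 1.
for n in (2, 10, 100, 1000):
    assert worst_case_T(n) == n * (n + 3) // 2 - 1
    assert worst_case_T(n) <= n * n
```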
(In fact, with some more work you should be able to show that $k=1$ is indeed (together with $k=n-1$) the worst choice.)
Let us now proceed to the average case running time (i.e., the expected running time). (Here the split position is taken uniform over $1,\dots,n-1$; the question's slightly different distribution, with extra weight $2/n$ on $1$, only adds lower-order terms and does not change the order of growth.) It satisfies the recurrence
$$
S(n) = \begin{cases} \frac{1}{n-1}\sum_{1 \leq k \leq n-1} [S(k) + S(n-k)] + n & \text{if } n > 1 \\ 1 & \text{if } n = 1. \end{cases}
$$
We can simplify the inductive case to
$$
S(n) = \frac{2}{n-1} \sum_{1 \leq k \leq n-1} S(k) + n.
$$
This implies that for $n \geq 2$,
$$
n S(n+1) = 2(S(1) + \cdots + S(n)) + (n+1)n, \\
(n-1) S(n) = 2(S(1) + \cdots + S(n-1)) + n(n-1).
$$
Subtracting gives
$$
nS(n+1) - (n-1)S(n) = 2S(n) + 2n \Longrightarrow nS(n+1) = (n+1)S(n) + 2n \Longrightarrow \\ \frac{S(n+1)}{n+1} = \frac{S(n)}{n} + \frac{2}{n+1}.
$$
This shows that
$$
\frac{S(n)}{n} = \frac{2}{n} + \frac{2}{n-1} + \cdots + \frac{2}{2} + \frac{S(1)}{1} = 2H_n - 1 = \Theta(\log n),
$$
and so $S(n) = \Theta(n\log n)$.
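The closed form $S(n)/n = 2H_n - 1$ derived above can be checked exactly against the recurrence using rational arithmetic (a small sketch; `S_table` and the running-sum trick are mine):

```python
from fractions import Fraction

def S_table(N):
    # S(1) = 1; S(n) = (2/(n-1)) * sum_{k=1..n-1} S(k) + n, computed exactly.
    S = [None, Fraction(1)]          # S[0] unused; 1-based indexing
    running = Fraction(1)            # running sum S(1) + ... + S(n-1)
    for n in range(2, N + 1):
        S.append(Fraction(2, n - 1) * running + n)
        running += S[n]
    return S

N = 200
S = S_table(N)

# Harmonic numbers H_n, also exact.
H = [Fraction(0)]
for n in range(1, N + 1):
    H.append(H[-1] + Fraction(1, n))

# Closed form from the telescoping argument: S(n) = n * (2*H_n - 1).
assert all(S[n] == n * (2 * H[n] - 1) for n in range(1, N + 1))
```

Since $H_n = \ln n + O(1)$, this confirms $S(n) = \Theta(n \log n)$ numerically as well.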
Context
StackExchange Computer Science Q#108456, answer score: 5