snippetMinor
Why is finding minimum number of comparisons to sort $n$ elements so difficult?
Viewed 0 times
whynumberelementsminimumfindingsortdifficultcomparisons
Problem
In The Art of Computer Programming 2nd Ed, Vol 3, Section 5.3.1 then discuss a function $S(n)$ which is define as:
$S(n)$ : The minimum number of comparisons that suffice to sort $n$ elements.
Further, the book regards $\lceil \lg n! \rceil$ as the information theoretic lower bound for $S(n)$.
Using merge insertion they also upper bound the number of comparisons by $F(n)$ where
$$F(n) = \sum_{k = 1}^{n} \lceil \lg \tfrac{3}{4} k \rceil$$
So you can get the bound $\lceil \lg n! \rceil \leq S(n) \leq F(n)$, and for any values $n$ where $\lceil \lg n! \rceil = F(n)$ you can find the exact value of $S(n)$.
My questions are:
-
Why does $S(n)$ not always match the information theoretic lower bound $\lceil \lg n! \rceil$? It seems like if this is all the bits of information we should need, that this is all the comparisons we would need. Why do they differ?
-
Why is $S(n)$ so difficult to compute? It's discussed in the book some but the reasons are still unclear to me. Do you have to brute force and create every possible decision tree for a given $n$ and determine the longest path? Is there not a more efficient way? It seems that $S(n)$ has only been exactly computed for $n \leq 22$ (See A036604 here).
$S(n)$ : The minimum number of comparisons that suffice to sort $n$ elements.
Further, the book regards $\lceil \lg n! \rceil$ as the information theoretic lower bound for $S(n)$.
Using merge insertion they also upper bound the number of comparisons by $F(n)$ where
$$F(n) = \sum_{k = 1}^{n} \lceil \lg \tfrac{3}{4} k \rceil$$
So you can get the bound $\lceil \lg n! \rceil \leq S(n) \leq F(n)$, and for any values $n$ where $\lceil \lg n! \rceil = F(n)$ you can find the exact value of $S(n)$.
My questions are:
-
Why does $S(n)$ not always match the information theoretic lower bound $\lceil \lg n! \rceil$? It seems like if this is all the bits of information we should need, that this is all the comparisons we would need. Why do they differ?
-
Why is $S(n)$ so difficult to compute? It's discussed in the book some but the reasons are still unclear to me. Do you have to brute force and create every possible decision tree for a given $n$ and determine the longest path? Is there not a more efficient way? It seems that $S(n)$ has only been exactly computed for $n \leq 22$ (See A036604 here).
Solution
Let's say the size of the array is such that there are $2^{k-1} < x ≤ 2^k$ possible permutations, so the theoretical lower bound to sort this array is k comparisons.
Any comparison divides the set of possible permutations in two subsets, consistent with each outcome of the comparison. You'd first need a comparison that splits the set of all possible permutations into two subsets, each of size at most $2^{k-1}$. And each subset you need to split into two subsets of size at most $2^{k-2}$ etc. If these subsets don't have the same size, then after j comparisons the size of the largest subset could exceed $2^{k-j}$, and you lost.
Any comparison divides the set of possible permutations in two subsets, consistent with each outcome of the comparison. You'd first need a comparison that splits the set of all possible permutations into two subsets, each of size at most $2^{k-1}$. And each subset you need to split into two subsets of size at most $2^{k-2}$ etc. If these subsets don't have the same size, then after j comparisons the size of the largest subset could exceed $2^{k-j}$, and you lost.
Context
StackExchange Computer Science Q#106738, answer score: 3
Revisions (0)
No revisions yet.