patternMinor
Linear sorting algorithm
Viewed 0 times
sortinglinearalgorithm
Problem
I'm trying to understand sorting algorithms. And I'm thinking about a totally different way to tackle sorting: Use array indexes to sort a given set of integers.
Suppose We want to sort a list of positive integers (for now). Let's call it
Here is the pseudo-code if the algorithm is not clear:
First, Does every body agree that time complexity is O(N)?
Second, Am I missing something about this algorithm (As I don't think I'm the first one thinking about it)?
Suppose We want to sort a list of positive integers (for now). Let's call it
toSortArray- Create a boolean array that has a length equal to maximum integer in
toSortArray+ 1. Lets call itboolArray. All items in the array arefalseinitially.
- Loop through
toSortArrayand for each integer item, set the boolean at index itemtrue:boolArraty[toSortArray[i]] = true;
- Loop through
boolArrayand ifboolArray[i]istrue, set the first element oftoSortArraytoi
Here is the pseudo-code if the algorithm is not clear:
var boolArray = new bool[toSortArray.Max() + 1];
for (int i = 0; i < toSortArray.Length; i++)
{
boolArray[toSortArray[i]] = true;
}
var index = 0;
for (int i = 0; i < boolArray.Length; i++)
{
if (boolArray[i])
{
toSortArray[index] = i;
index++;
}
}First, Does every body agree that time complexity is O(N)?
Second, Am I missing something about this algorithm (As I don't think I'm the first one thinking about it)?
Solution
This is algorithm called bucket sort, integer sort, counting sort, or Pigeonhole sort (naming is a bit inconsistent across sources and/or different names refer to slight variations of the same idea).
Its time complexity is $O(m)$ (assuming distinct values) where $m$ is the largest among the $n$ integers you are trying to sort. Your algorithm takes $O(n)$ time as long as $m=O(n)$, for example if you know you that you want to sort a large-enough subset of the integers from $1$ to $n$.
If the integers are not distinct, you can replace booleans with counters to obtain an algorithm with complexity $O(m + n)$. In general, you can sort arbitrary items with integer keys by keeping an array of lists, where the $i$-th list will contain all items with key $i$. The output is just concatenation of these lists (and hence this sorting algorithm is stable as long as you append new elements at the end of the lists).
If you apply the above (stable) algorithm multiple times on your input, where the $j$-th execution sorts the integers w.r.t. the $j$-th least significant digit you obtain Radix Sort.
For example, in base $10$, you'd need $\lceil \log_{10} m \rceil$ iterations, thus obtaining an algorithm with time complexity $O( n \log m )$, which is in $O(n \log n)$ even when $m$ is exponentially larger in $n$ (on a unit-cost RAM machine). For a general base $b$, the time needed would be $O((b+n) \log_b m) = O((b+n) \frac{\log m}{\log b})$, which is optimized (in an asymptotic sense) by choosing $b = \Theta(n)$, resulting in a time complexity of $O(n \frac{\log m}{\log n})$.
Its time complexity is $O(m)$ (assuming distinct values) where $m$ is the largest among the $n$ integers you are trying to sort. Your algorithm takes $O(n)$ time as long as $m=O(n)$, for example if you know you that you want to sort a large-enough subset of the integers from $1$ to $n$.
If the integers are not distinct, you can replace booleans with counters to obtain an algorithm with complexity $O(m + n)$. In general, you can sort arbitrary items with integer keys by keeping an array of lists, where the $i$-th list will contain all items with key $i$. The output is just concatenation of these lists (and hence this sorting algorithm is stable as long as you append new elements at the end of the lists).
If you apply the above (stable) algorithm multiple times on your input, where the $j$-th execution sorts the integers w.r.t. the $j$-th least significant digit you obtain Radix Sort.
For example, in base $10$, you'd need $\lceil \log_{10} m \rceil$ iterations, thus obtaining an algorithm with time complexity $O( n \log m )$, which is in $O(n \log n)$ even when $m$ is exponentially larger in $n$ (on a unit-cost RAM machine). For a general base $b$, the time needed would be $O((b+n) \log_b m) = O((b+n) \frac{\log m}{\log b})$, which is optimized (in an asymptotic sense) by choosing $b = \Theta(n)$, resulting in a time complexity of $O(n \frac{\log m}{\log n})$.
Context
StackExchange Computer Science Q#116451, answer score: 3
Revisions (0)
No revisions yet.