Sorting millions of integers
Problem
Last Friday I was hit with an interview question about sorting, something I had never really had to deal with.
Develop your own sorting algorithm:
- It cannot use any other classes for help.
- It needs to be able to sort an array of millions of integers.
- It needs to be as fast as possible.
For instance:
int[] old = {5434, 3454, 2, 0, 356, 896, 7324, 888, 99, 78365, 111};
int highestNumber = 78365;
Would be
int[] sorted = {0, 2, 99, 111, 356, 888, 896, 3454, 5434, 7324, 78365};
I spent the night trying to come up with my own method to do this, and this is what I came up with.
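public class Main {
    public static void main(String[] args) {
        // Assumed setup (lost from the listing): random values from 0 to 3,000,000.
        int[] twentyMillion = new int[20000000];
        for (int i = 0; i < twentyMillion.length; i++) {
            twentyMillion[i] = (int) (Math.random() * 3000001);
        }
        long time = System.nanoTime();
        int[] rangePosition = new int[twentyMillion.length];
        int[] range = new int[3000001];
        int[] newArray = new int[twentyMillion.length];
        for (int i = 0; i < twentyMillion.length; i++) {
            rangePosition[i] = twentyMillion[i];
            range[twentyMillion[i]]++;
        }
        int past = twentyMillion.length;
        for (int i = range.length - 1; i >= 0; i--) {
            range[i] = past - range[i];
            past = range[i];
        }
        for (int i = 0; i < twentyMillion.length; i++) {
            newArray[range[rangePosition[i]]] = twentyMillion[i];
            range[rangePosition[i]]++;
        }
        System.out.println("time = " + (System.nanoTime() - time));
    }
}
Steps: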
- The first loop counts how often each number occurs. For instance, if the range array covers 0 to 3,000,000, it increments the entry for each number it finds in the main array. So every time it finds 2,750,000, it increments that position in the range array.
- The second loop works backwards from the highest position in the range array, turning each count into a start position. So if the main array holds 3,000,000 elements and 100,000 of them are 3,000,000, it records that 3,000,000 starts at index 2,900,000 and runs to the end.
- The third loop goes back through the main array, looks up each number's current position in the range array, puts the number at that position in newArray, and then increments the position.

It handles duplicates, and with a little tinkering you can make it sort many other things. It uses more memory than I wanted, but wow, it's lightning fast. I never thought of looking into how these sorting algorithms work until I did th
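The three loops above can be pulled out into a reusable method. This is a minimal sketch, assuming (as in the interview prompt) non-negative input and a known highestNumber; the class name CountingSortSketch and the method countingSort are hypothetical, not part of the original code. It reads values straight from the input instead of keeping a separate rangePosition copy.

```java
public class CountingSortSketch {

    // Sorts ints into a new array.
    // Assumption: every element satisfies 0 <= x <= highestNumber.
    static int[] countingSort(int[] input, int highestNumber) {
        int[] range = new int[highestNumber + 1];
        int[] newArray = new int[input.length];

        // Loop 1: count how often each value occurs.
        for (int value : input) {
            range[value]++;
        }

        // Loop 2: walk backwards from the largest value, turning each count
        // into the index where that value's run starts in newArray.
        int past = input.length;
        for (int i = range.length - 1; i >= 0; i--) {
            range[i] = past - range[i];
            past = range[i];
        }

        // Loop 3: put each element at its value's next free slot.
        for (int value : input) {
            newArray[range[value]] = value;
            range[value]++;
        }
        return newArray;
    }

    public static void main(String[] args) {
        int[] old = {5434, 3454, 2, 0, 356, 896, 7324, 888, 99, 78365, 111};
        int[] sorted = countingSort(old, 78365);
        // java.util.Arrays is used only to print the result.
        System.out.println(java.util.Arrays.toString(sorted));
        // prints [0, 2, 99, 111, 356, 888, 896, 3454, 5434, 7324, 78365]
    }
}
```

Because equal values are placed in input order, this sort is stable, which is what makes it usable for sorting records by an integer key.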
Solution
The algorithm you have implemented is known as counting sort. Its running time is linear in the size of the input, which is asymptotically faster than any comparison-based sorting algorithm can possibly get (those need on the order of n log(n) comparisons in the worst case). The catch is that it is also linear in the difference between the maximum and minimum elements of the input. Congratulations if you came up with this idea on your own. Since they already gave you the largest number in the array as additional input, it seems very likely that they wanted to see this algorithm. (Of course, you can find the maximum yourself in linear time if needed.)
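Finding the extremes yourself costs one extra linear pass. The sketch below, assuming the gap between max and min is small enough for a count array of that size, finds both the minimum and maximum and shifts every key by the minimum, so negative values become valid indexes. The class name MinMaxCountingSort is hypothetical, and it builds the start positions with a forward running sum instead of the backward pass in the question; both give the same positions.

```java
public class MinMaxCountingSort {

    // Counting sort that does not need highestNumber supplied.
    // Assumption: max - min + 1 fits in an int array that can be allocated.
    static int[] sort(int[] input) {
        if (input.length == 0) {
            return new int[0];
        }

        // Linear pass: find the smallest and largest values.
        int min = input[0];
        int max = input[0];
        for (int value : input) {
            if (value < min) min = value;
            if (value > max) max = value;
        }

        // One counter per possible key, where key = value - min.
        int[] count = new int[max - min + 1];
        for (int value : input) {
            count[value - min]++;
        }

        // Running sum: count[k] becomes the start index of key k.
        int next = 0;
        for (int k = 0; k < count.length; k++) {
            int c = count[k];
            count[k] = next;
            next += c;
        }

        // Stable placement into the output array.
        int[] output = new int[input.length];
        for (int value : input) {
            output[count[value - min]++] = value;
        }
        return output;
    }

    public static void main(String[] args) {
        int[] data = {7, -3, 0, 12, -3, 5};
        System.out.println(java.util.Arrays.toString(sort(data)));
        // prints [-3, -3, 0, 5, 7, 12]
    }
}
```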
Remarks about your code:
- The rangePosition array is initialized with an exact copy of twentyMillion and then only ever read. Why did you create it instead of using twentyMillion directly?
- If twentyMillion contains a negative number, your implementation will explode with an ArrayIndexOutOfBoundsException. Maybe you simply forgot to mention that all inputs are guaranteed to be non-negative? Otherwise, you'd also need to know the minimum value and normalize your keys to it. (This could also save memory if the minimum is much larger than zero.)
- If highestNumber is extremely large, you will run into trouble. For example, you will probably not be able to allocate a new int[Integer.MAX_VALUE] without getting an OutOfMemoryError. (And if you allow negative numbers in the input, you might even need an array larger than Integer.MAX_VALUE!) Even if you could allocate it, iterating over it would take forever. To make your code more robust, you could use a heuristic to decide whether the combination of twentyMillion.length and highestNumber warrants the overhead of counting sort, or whether you'd be better off with a comparison-based O(n log(n)) fallback algorithm.
- twentyMillion is a poor name for a variable that does not necessarily refer to an array of length 20M.
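The heuristic suggested in the third remark could look roughly like this. It is a sketch only: RANGE_FACTOR and MAX_COUNT_ARRAY are hypothetical tuning values, not measured thresholds, and the fallback uses the JDK's Arrays.sort, which the interview's "no other classes" rule would not allow (you would substitute your own comparison sort there). The range is computed as a long so that max - min cannot overflow when negative numbers are present. For plain ints, the counting path simply writes each value back count times, so no output array or start positions are needed.

```java
import java.util.Arrays;

public class HybridSort {

    // Hypothetical knob: use counting sort only if the key range is at most
    // this many times the input length.
    static final long RANGE_FACTOR = 4;

    // Hypothetical cap on the count array size, far below Integer.MAX_VALUE.
    static final long MAX_COUNT_ARRAY = 1L << 26;

    // Sorts in place: counting sort when the range is small relative to the
    // input, otherwise a comparison-based O(n log(n)) fallback.
    static void sort(int[] a) {
        if (a.length < 2) {
            return;
        }
        int min = a[0];
        int max = a[0];
        for (int v : a) {
            if (v < min) min = v;
            if (v > max) max = v;
        }

        // long arithmetic: max - min can exceed Integer.MAX_VALUE.
        long range = (long) max - min + 1;

        if (range > MAX_COUNT_ARRAY || range > RANGE_FACTOR * a.length) {
            Arrays.sort(a); // comparison-based fallback
            return;
        }

        int[] count = new int[(int) range];
        for (int v : a) {
            count[v - min]++;
        }
        // Rewrite the array from the counts, smallest key first.
        int i = 0;
        for (int k = 0; k < count.length; k++) {
            for (int c = count[k]; c > 0; c--) {
                a[i++] = k + min;
            }
        }
    }

    public static void main(String[] args) {
        int[] small = {3, 1, 2, 3, 0};
        int[] wide = {Integer.MAX_VALUE, Integer.MIN_VALUE, 0};
        sort(small); // counting sort path
        sort(wide);  // range too large, Arrays.sort path
        System.out.println(Arrays.toString(small)); // prints [0, 1, 2, 3, 3]
        System.out.println(Arrays.toString(wide));  // prints [-2147483648, 0, 2147483647]
    }
}
```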
Context
StackExchange Code Review Q#91087, answer score: 29