snippetjavaMinor
Counting sort for objects in Java
Viewed 0 times
countingobjectsjavaforsort
Problem
The counting sort for integers counts the amount of each integer, then traverses the count entries in order, and for each entry inserts the integer representing the bucket into the array as many times as its entry denotes.
This is basically same idea, but for general objects. The algorithm makes a pass through the entire range and each array component is appended to a list of equal elements. After the first pass, the array containing distinct components is sorted by a conventional algorithm (here
Whenever the input array consists of n elements and only k of which are distinct, the running time of object-counting sort is \$\mathcal{O}(n + k \log k),\$ and the space complexity is \$\mathcal{O}(n + k)\$.
The algorithm, however, is particularly picky about the API the array component is to provide: besides the
It seems that it is superior to
So, what do you think?
Arrays.java:
```
package net.coderodde.util;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class Arrays {
public static >
void sort(final E[] array, final int fromIndex, final int toIndex) {
if (toIndex - fromIndex > map = new HashMap<>();
for (int i = fromIndex; i list = new ArrayList<>();
list.add(current);
map.put(current, list);
}
}
final Object[] keys = map.keySet().toArray();
java.util.Arrays.sort(keys);
int index = fromIn
This is basically same idea, but for general objects. The algorithm makes a pass through the entire range and each array component is appended to a list of equal elements. After the first pass, the array containing distinct components is sorted by a conventional algorithm (here
java.util.Arrays.sort). The next step is to simply iterate over each distinct component, and for each component dump the list it represents into the original array.Whenever the input array consists of n elements and only k of which are distinct, the running time of object-counting sort is \$\mathcal{O}(n + k \log k),\$ and the space complexity is \$\mathcal{O}(n + k)\$.
The algorithm, however, is particularly picky about the API the array component is to provide: besides the
compareTo, hashCode and equals are required. equals is true if and only if compareTo returns 0. Also, if equals is true, hashCode should agree (the other way around is not necessarily true: if two objects agree on hashCode, they might not be same from equals-point of view.It seems that it is superior to
java.util.Arrays.sort at arrays of size around 1e6.So, what do you think?
Arrays.java:
```
package net.coderodde.util;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class Arrays {
public static >
void sort(final E[] array, final int fromIndex, final int toIndex) {
if (toIndex - fromIndex > map = new HashMap<>();
for (int i = fromIndex; i list = new ArrayList<>();
list.add(current);
map.put(current, list);
}
}
final Object[] keys = map.keySet().toArray();
java.util.Arrays.sort(keys);
int index = fromIn
Solution
Counting sort
The definition in wikipedia, emphasis mine:
counting sort is an algorithm for sorting a collection of objects according to keys that are small positive integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that possess distinct key values, and applying prefix sum on those counts to determine the positions of each key value in the output sequence.
The posted code groups the items to distinct groups,
then sorts the keys, and finally builds the sorted output.
It does not count objects, and it does not use counts to produce the output.
It sorts based on distinct keys.
I see the similarity with counting sort, but it's not really that.
Comparing with
It seems that it is superior to java.util.Arrays.sort at arrays of size around 1e6.
I think proper benchmarking is hard in Java.
To make such bold statement, I would expect stronger evidence,
for example using jmh.
The measurement in the posted code gives unfair advantage to itself,
by using a comparison on age, with at most 101 distinct values.
That is, the 1e6 random objects will all fall into 101 distinct groups,
and in this case it's not surprising that the posted code seems to perform better.
Tweaking the posted code a bit, the scale seems to tip when the number of distinct objects is passes about one third of the total count.
When all objects are distinct,
The statement could become true by adding a relevant detail:
It seems that it is superior to java.util.Arrays.sort at arrays of size around 1e6 when the number of distinct items is relatively small (less than one fourth of the total count).
Simpler aggregation code
A bit simpler way to write the aggregation code:
Even simpler:
One-liner:
The definition in wikipedia, emphasis mine:
counting sort is an algorithm for sorting a collection of objects according to keys that are small positive integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that possess distinct key values, and applying prefix sum on those counts to determine the positions of each key value in the output sequence.
The posted code groups the items to distinct groups,
then sorts the keys, and finally builds the sorted output.
It does not count objects, and it does not use counts to produce the output.
It sorts based on distinct keys.
I see the similarity with counting sort, but it's not really that.
Comparing with
Arrays.sortIt seems that it is superior to java.util.Arrays.sort at arrays of size around 1e6.
I think proper benchmarking is hard in Java.
To make such bold statement, I would expect stronger evidence,
for example using jmh.
The measurement in the posted code gives unfair advantage to itself,
by using a comparison on age, with at most 101 distinct values.
That is, the 1e6 random objects will all fall into 101 distinct groups,
and in this case it's not surprising that the posted code seems to perform better.
Tweaking the posted code a bit, the scale seems to tip when the number of distinct objects is passes about one third of the total count.
When all objects are distinct,
java.util.Arrays.sort wins hands down.The statement could become true by adding a relevant detail:
It seems that it is superior to java.util.Arrays.sort at arrays of size around 1e6 when the number of distinct items is relatively small (less than one fourth of the total count).
Simpler aggregation code
A bit simpler way to write the aggregation code:
for (int i = fromIndex; i list = map.get(current);
if (list == null) {
list = new ArrayList<>();
map.put(current, list);
}
list.add(current);
}Even simpler:
for (int i = fromIndex; i list = map.computeIfAbsent(current, k -> new ArrayList<>());
list.add(current);
}One-liner:
final Map> map = java.util.Arrays.stream(array, fromIndex, toIndex)
.collect(Collectors.groupingBy(Function.identity()));Code Snippets
for (int i = fromIndex; i < toIndex; ++i) {
final E current = array[i];
List<E> list = map.get(current);
if (list == null) {
list = new ArrayList<>();
map.put(current, list);
}
list.add(current);
}for (int i = fromIndex; i < toIndex; ++i) {
final E current = array[i];
List<E> list = map.computeIfAbsent(current, k -> new ArrayList<>());
list.add(current);
}final Map<E, List<E>> map = java.util.Arrays.stream(array, fromIndex, toIndex)
.collect(Collectors.groupingBy(Function.identity()));Context
StackExchange Code Review Q#74058, answer score: 3
Revisions (0)
No revisions yet.