HiveBrain v1.2.0
Get Started
← Back to all entries
snippetjavaMinor

Counting sort for objects in Java

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
countingobjectsjavaforsort

Problem

The counting sort for integers counts the amount of each integer, then traverses the count entries in order, and for each entry inserts the integer representing the bucket into the array as many times as its entry denotes.

This is basically same idea, but for general objects. The algorithm makes a pass through the entire range and each array component is appended to a list of equal elements. After the first pass, the array containing distinct components is sorted by a conventional algorithm (here java.util.Arrays.sort). The next step is to simply iterate over each distinct component, and for each component dump the list it represents into the original array.

Whenever the input array consists of n elements and only k of which are distinct, the running time of object-counting sort is \$\mathcal{O}(n + k \log k),\$ and the space complexity is \$\mathcal{O}(n + k)\$.

The algorithm, however, is particularly picky about the API the array component is to provide: besides the compareTo, hashCode and equals are required. equals is true if and only if compareTo returns 0. Also, if equals is true, hashCode should agree (the other way around is not necessarily true: if two objects agree on hashCode, they might not be same from equals-point of view.

It seems that it is superior to java.util.Arrays.sort at arrays of size around 1e6.

So, what do you think?

Arrays.java:

```
package net.coderodde.util;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Arrays {

public static >
void sort(final E[] array, final int fromIndex, final int toIndex) {
if (toIndex - fromIndex > map = new HashMap<>();

for (int i = fromIndex; i list = new ArrayList<>();
list.add(current);
map.put(current, list);
}
}

final Object[] keys = map.keySet().toArray();
java.util.Arrays.sort(keys);
int index = fromIn

Solution

Counting sort

The definition in wikipedia, emphasis mine:

counting sort is an algorithm for sorting a collection of objects according to keys that are small positive integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that possess distinct key values, and applying prefix sum on those counts to determine the positions of each key value in the output sequence.

The posted code groups the items to distinct groups,
then sorts the keys, and finally builds the sorted output.
It does not count objects, and it does not use counts to produce the output.

It sorts based on distinct keys.
I see the similarity with counting sort, but it's not really that.
Comparing with Arrays.sort

It seems that it is superior to java.util.Arrays.sort at arrays of size around 1e6.

I think proper benchmarking is hard in Java.
To make such bold statement, I would expect stronger evidence,
for example using jmh.

The measurement in the posted code gives unfair advantage to itself,
by using a comparison on age, with at most 101 distinct values.
That is, the 1e6 random objects will all fall into 101 distinct groups,
and in this case it's not surprising that the posted code seems to perform better.
Tweaking the posted code a bit, the scale seems to tip when the number of distinct objects is passes about one third of the total count.
When all objects are distinct, java.util.Arrays.sort wins hands down.

The statement could become true by adding a relevant detail:

It seems that it is superior to java.util.Arrays.sort at arrays of size around 1e6 when the number of distinct items is relatively small (less than one fourth of the total count).

Simpler aggregation code

A bit simpler way to write the aggregation code:

for (int i = fromIndex; i  list = map.get(current);
    if (list == null) {
        list = new ArrayList<>();
        map.put(current, list);
    }

    list.add(current);
}


Even simpler:

for (int i = fromIndex; i  list = map.computeIfAbsent(current, k -> new ArrayList<>());
    list.add(current);
}


One-liner:

final Map> map = java.util.Arrays.stream(array, fromIndex, toIndex)
        .collect(Collectors.groupingBy(Function.identity()));

Code Snippets

for (int i = fromIndex; i < toIndex; ++i) {
    final E current = array[i];

    List<E> list = map.get(current);
    if (list == null) {
        list = new ArrayList<>();
        map.put(current, list);
    }

    list.add(current);
}
for (int i = fromIndex; i < toIndex; ++i) {
    final E current = array[i];
    List<E> list = map.computeIfAbsent(current, k -> new ArrayList<>());
    list.add(current);
}
final Map<E, List<E>> map = java.util.Arrays.stream(array, fromIndex, toIndex)
        .collect(Collectors.groupingBy(Function.identity()));

Context

StackExchange Code Review Q#74058, answer score: 3

Revisions (0)

No revisions yet.