snippetpythonModerate

Beating built-in timsort with own count sort function

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

builtwithtimsortfunctionownsortcountbeating

Problem

I am trying to write a count sort in Python to beat the built-in timsort in certain situations. Right now it beats the built-in sorted function, but only for very large arrays (1 million integers in length and longer, I haven't tried over 10 million) and only for a range no larger than 10,000. Additionally, the victory is narrow, with count sort only winning by a significant margin in random lists specifically tailored to it.

I have read about astounding performance gains that can be gained from vectorizing Python code, but I don't particularly understand how to do it or how it could be used here. I would like to know how I can vectorize this code to speed it up, and any other performance suggestions are welcome.

def countsort(unsorted_list):
    counts = {}
    for num in unsorted_list:
        if num in counts:
            counts[num] += 1
        else:
            counts[num] = 1

    sorted_list = []
    for num in xrange(min(unsorted_list), max(unsorted_list) + 1):
        if num in counts:
            for j in xrange(counts[num]):
                sorted_list.append(num)

    return sorted_list

GitHub

Additional info:

All that counts is raw speed here, so sacrificing even more space for speed gains is completely fair game.

I realize the code is fairly short and clear already, so I don't know how much room there is for improvement in speed.

If anyone has a change to the code to make it shorter, as long as it doesn't make it slower, that would be awesome as well.

After doing some more precise timing, it is clear that the first for loop takes about 2/3 of the execution time, with the second, constructor, loop taking just 1/3 of the time.

Solution

I did some benchmarks using the timeit module and this test data:

random.seed(1)
data = [random.randint(0,10000) for _ in xrange(1000000)]

Original version clocks in at 411 ms, and the builtin sorted at 512 ms.

Using counts = defaultdict(int) that allows unconditional counts[num] += 1 takes it to 330 ms.

Using sorted_list.extend(counts[num] * [num]) instead of the inner loop improves to 250 ms, or 246 ms when also omitting the second if.

Using min(counts), max(counts) instead of min(unsorted_list), max(unsorted_list) improves further to 197 ms.

Using chain and repeat from itertools to construct the result takes 182 ms (though repeat does not make much difference).

Code looks like this after the changes:

from collections import defaultdict
from itertools import chain, repeat

def countsort(unsorted_list):
    counts = defaultdict(int)
    for num in unsorted_list:
        counts[num] += 1

    return list(
            chain.from_iterable(
                repeat(num, counts[num])
                for num in xrange(min(counts), max(counts) + 1)))

Code Snippets

random.seed(1)
data = [random.randint(0,10000) for _ in xrange(1000000)]

from collections import defaultdict
from itertools import chain, repeat

def countsort(unsorted_list):
    counts = defaultdict(int)
    for num in unsorted_list:
        counts[num] += 1

    return list(
            chain.from_iterable(
                repeat(num, counts[num])
                for num in xrange(min(counts), max(counts) + 1)))

Context

StackExchange Code Review Q#30408, answer score: 15

Revisions (0)

No revisions yet.