snippetpythonModerate
Beating built-in timsort with own count sort function
Viewed 0 times
builtwithtimsortfunctionownsortcountbeating
Problem
I am trying to write a count sort in Python to beat the built-in
I have read about astounding performance gains that can be gained from vectorizing Python code, but I don't particularly understand how to do it or how it could be used here. I would like to know how I can vectorize this code to speed it up, and any other performance suggestions are welcome.
GitHub
Additional info:
timsort in certain situations. Right now it beats the built-in sorted function, but only for very large arrays (1 million integers in length and longer, I haven't tried over 10 million) and only for a range no larger than 10,000. Additionally, the victory is narrow, with count sort only winning by a significant margin in random lists specifically tailored to it.I have read about astounding performance gains that can be gained from vectorizing Python code, but I don't particularly understand how to do it or how it could be used here. I would like to know how I can vectorize this code to speed it up, and any other performance suggestions are welcome.
def countsort(unsorted_list):
counts = {}
for num in unsorted_list:
if num in counts:
counts[num] += 1
else:
counts[num] = 1
sorted_list = []
for num in xrange(min(unsorted_list), max(unsorted_list) + 1):
if num in counts:
for j in xrange(counts[num]):
sorted_list.append(num)
return sorted_listGitHub
Additional info:
- All that counts is raw speed here, so sacrificing even more space for speed gains is completely fair game.
- I realize the code is fairly short and clear already, so I don't know how much room there is for improvement in speed.
- If anyone has a change to the code to make it shorter, as long as it doesn't make it slower, that would be awesome as well.
- After doing some more precise timing, it is clear that the first for loop takes about 2/3 of the execution time, with the second, constructor, loop taking just 1/3 of the time.
Solution
I did some benchmarks using the
Original version clocks in at 411 ms, and the builtin
Using
Using
Using
Using
Code looks like this after the changes:
timeit module and this test data:random.seed(1)
data = [random.randint(0,10000) for _ in xrange(1000000)]Original version clocks in at 411 ms, and the builtin
sorted at 512 ms.Using
counts = defaultdict(int) that allows unconditional counts[num] += 1 takes it to 330 ms.Using
sorted_list.extend(counts[num] * [num]) instead of the inner loop improves to 250 ms, or 246 ms when also omitting the second if.Using
min(counts), max(counts) instead of min(unsorted_list), max(unsorted_list) improves further to 197 ms.Using
chain and repeat from itertools to construct the result takes 182 ms (though repeat does not make much difference).Code looks like this after the changes:
from collections import defaultdict
from itertools import chain, repeat
def countsort(unsorted_list):
counts = defaultdict(int)
for num in unsorted_list:
counts[num] += 1
return list(
chain.from_iterable(
repeat(num, counts[num])
for num in xrange(min(counts), max(counts) + 1)))Code Snippets
random.seed(1)
data = [random.randint(0,10000) for _ in xrange(1000000)]from collections import defaultdict
from itertools import chain, repeat
def countsort(unsorted_list):
counts = defaultdict(int)
for num in unsorted_list:
counts[num] += 1
return list(
chain.from_iterable(
repeat(num, counts[num])
for num in xrange(min(counts), max(counts) + 1)))Context
StackExchange Code Review Q#30408, answer score: 15
Revisions (0)
No revisions yet.