snippetMinor

Sorting networks hybrid sort

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

sortinghybridnetworkssort

Problem

I saw an answer about the fastest way to sort an array with 6 elements. Turns out that using a sorting network is the best choice. If sorting networks are the fastest way to sort a small(< 16 elements) fixed size array, would it make sense to use them in quicksort or mergesort when the recursive call is for an array with less than the given threshold?

Solution

The article Applying Sorting Networks to Synthesize Optimized Sorting Libraries by Codish et al. experiments in this area, trying to take advantage of "instruction level parallelism and non-branching conditional assignment instructions, both of which are common in modern CPU architectures", and create a quicksort implementation that falls back on sorting networks instead of falling back on insertion sort. In their tests, the result was faster than with the insertion sort fallback. That said, the paper only analyses how it performs with int, and has a dedicated "copy-and-swap" algorithm for int to order two integers without branches.

The hybrid sorting algorithm RADULS2 described in Even faster sorting of (not only) integers by Kokot et al. seems to use sorting networks too. The following note is rather interesting:

Yet it is worth mentioning, that our implementation of sorting network for 16B records uses SSE to allow elements swapping by branchless code. Without such optimization [sorting networks] performed much worse as compilers tend to produce the code with branching instructions. The cost of swapping the elements is easy observable in case of 32B records—in such a situation [enumeration sort] clearly outperforms [sorting networks].

Access Aligned Sort uses bitonic sorting networks to improve a bottom-up mergesort and ends up with an algorithm that is faster that standard implementations of std::sort when sorting properly aligned float arrays. The algorithm uses [Boost.SIMD] to take advantage of SIMD instructions.

Finally, I've also experimented with sorting networks in a library of mine: I managed to obtain sorting algorithms which were faster than the usual insertion sort for small arrays of int, but I had to use the "copy-and-swap" trick described in Applying Sorting Networks to Synthesize Optimized Sorting Libraries to make them efficient, and I realized that it didn't really improve things for long or long long integers. The number of comparison tends to make the algorithm too slow compared to a simple insertion sort, which is adaptive. I tried to use them as a fallback algorithm for quicksort instead of insertion sort, but it didn't make an observable difference.

GPU sorting networks and hardware sorting networks would probably outperform most sorting algorithms, but I don't know much about those. Some guy implemented a few sorting networks with CUDA, you might want to take a look.

Conclusion: if you only need to sort small arrays, sorting networks are worth it, but they generally require implementers to carefully implement them if they want decent speed, and most of the time, unless you're sorting something else than raw collections of integers or floating point numbers, they aren't faster than usual sorting algorithms. When you use them as a fallback, it doesn't even always make a speed difference, so unless you really need that speed, they're not worth the implementation cost.

Context

StackExchange Computer Science Q#79760, answer score: 2

Revisions (0)

No revisions yet.