HiveBrain v1.2.0
Get Started
← Back to all entries
snippetcppMinor

Optimizing insertion sort for small amounts of data

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
insertionsmalloptimizingforamountssortdata

Problem

I'm building a hybrid sort, and for that I need a fast and adaptive sort ideal for small sizes (< 65 elements).

Insertion sort immediately comes to mind, and I've been tinkering with different implementations of it. My requirement is that it takes in iterators as per C++ standard.

Linear insertion sort

template 
void lin_sort(Iter begin, Iter end) {
    for (auto cur = begin; cur != end; ++cur) {
        auto key = *cur;
        auto ins = cur - 1;
        for (; begin <= ins && key < *ins; --ins)
            *(ins + 1) = *ins;    // move 1 to right until correct position's found
        *(ins + 1) = key;
    }
}


Binary insertion sort

template 
void ins_sort(Iter begin, Iter end) {
    for (auto cur = begin; cur != end; ++cur) {
        // upper_bound is binary search for correct insertion position
        // shift everything from up to cur to the right one and insert key into right place
        std::rotate(std::upper_bound(begin, cur, *cur), cur, cur + 1);
    }
}


Alternative and equivalent binary insertion sort

template 
void ins_sort(Iter begin, Iter end) {
    for (auto cur = begin; cur != end; ++cur) {
        auto key = *cur;
        auto ins = std::upper_bound(begin, cur, key);
        for (auto shift = cur; shift != ins; --shift) *shift = *(shift-1); 
        *ins = key;
    }
}


Some benchmarking reveals the binary version to be around 1.5 times slower than the linear version, compiling on GCC 4.8.2 with -O3

mylibs\algo>algotest s l
after: 100000 lists with 64 elements: 1.11421e+06 us    1114.21 ms  (linear insertion sort)
mylibs\algo>algotest s i
after: 100000 lists with 64 elements: 1.62741e+06 us    1627.41 ms  (binary insertion sort)


I am only concerned about its performance for under 65 elements, so the asymptotic guarantee given by binary search is irrelevant. Can anyone suggest improvements to the linear insertion sort, or a better alternative for what I'm looking for?

Benchmarking code: (full testing can be

Solution

Unfortunately, you are unlikely to find a better sort than the linear insertion sort when it comes to sorting small data sets. The only sorting algorithms that might be faster are sorting algorithms specialized for a given size of input: for example, optimized sorting networks can be really fast if you need to sort a small amount of integers, but are generally slower than the insertion sort if not implemented properly or if used with other types. The following image is a graph that shows the performance of different kinds of sorting algorithms to sort small fixed-size integer arrays (the time in ms corresponds to the time needed to sort one million arrays of a given size). Distributions had a random pattern:

Some of the algorithms were specialized for fixed-size collection; most notably low_comparisons_sorter, low_moves_sorter and sorting_network_sorters. The linear insertion sort was the fastest sorting algorithm which worked for collections of any size; my best bet is that a small collection fits in the fastest cache and thus benefits from the linear access.

Now, the linear version of the insertion sort I use is slightly different: it has a check to make sure that no temporary value is created if the current element is already in place. In your algorithm, it would look like that:

template 
void lin_sort(Iter begin, Iter end) {
    for (auto cur = begin; cur != end; ++cur) {
        auto ins = cur - 1;
        if (*cur < *ins) {
            auto key = *cur;
            do {
                *(ins + 1) = *ins;  // move 1 to right until correct position's found
                --ins;
            } while (begin <= ins && key < *ins);
            *(ins + 1) = key;
        }
    }
}


I doubt it would change a thing, but you could also try to find the point where to insert the value in a linear fashion, then rotate the values. It may be simpler for a compiler to vectorize code that linearly compares things then linearly copies things rather than a code that mixes comparison and copy operations on-the-fly. I did not test that, so don't take it for granted.

To sum up: for small collection, it's unlikely that you will beat linear insertion sort unless you have specializations for some specific types or some specific sizes, which means more code.

Code Snippets

template <typename Iter>
void lin_sort(Iter begin, Iter end) {
    for (auto cur = begin; cur != end; ++cur) {
        auto ins = cur - 1;
        if (*cur < *ins) {
            auto key = *cur;
            do {
                *(ins + 1) = *ins;  // move 1 to right until correct position's found
                --ins;
            } while (begin <= ins && key < *ins);
            *(ins + 1) = key;
        }
    }
}

Context

StackExchange Code Review Q#71043, answer score: 4

Revisions (0)

No revisions yet.