patterncppMinor

QuickMergesort — The power of internal buffering

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

thepowerbufferinginternalquickmergesort

Problem

Amongst the lesser-known unstable sorting algorithms is the QuickXsort family of algorithms, as described in QuickXsort: Efficient Sorting with n log n - 1.399n +o(n) Comparisons on Average by Stefan Edelkamp and Armin Weiß. This family of sorting algorithms tries to exploit the technique of internal buffering that we will describe below before presenting our algorithm.
Internal buffering & QuickXsort

Internal buffering is a technique which allows to use algorithms that generally need additional memory without actually having to use any additional memory: when applying an algorithm to a sequence of elements, one can decide that the algorithm will first operate on a part of the sequence, and use the other part as a « buffer » with which it will swap elements instead of storing them in additional memory and retrieving them later. Then it can operate on the remaining part of the sequence until completion. Algorithms such as block sort use internal buffering to avoid allocating extra memory.

QuickXsort is a family of algorithms designed to take advantage of quicksort and internal buffering as follows:

Choose a pivot and partition the original sequence, leaving two partitions A and B.

If possible, apply another sorting algorithm that generally requires additional memory (such as mergesort) on the biggest partition, and use the second partition as an internal buffer: considering that A is the biggest partition, it can swap elements with B instead of allocating temporary memory to store the values and retrieve them back in A.

Apply the technique again on the remaining partition until the original sequence is sorted.

Quicksort partitions are an excellent choice here: once a sequence is partitioned, sorting both halves yields a sorted sequence, so any sorting algorithm can be used, and using one of the unsorted partitions as an internal buffer isn't a problem since it isn't supposed to be sorted yet.
QuickMergesort

QuickMergesort applies a partition just like qu

Solution

I realized that at least two things could be simplified since asking the question.

half_inplace_merge takes only one kind of iterators

The function half_inplace_merge was borrowed from libc++'s implementation of std::inplace_merge. However, despite its name, std::inplace_merge might allocate additional memory to perform an out-of-place merge when possible, so when half_inplace_merge is called, it's sometimes with different kinds of iterators.

However, in our case, the merge is always internal, and swaps elements in the same collection, so the iterators are always the same. We can thus simplify the signature of half_inplace_merge as follows:

template
>
auto half_inplace_merge(InputIterator first1, InputIterator last1,
                        InputIterator first2, InputIterator last2,
                        OutputIterator result, Compare compare={})
    -> void
{
    // ...
}

It doesn't seem to incur any additional performance cost or benefit, but it makes the code simpler and cleaner.

The left partition is always smaller

In libc++'s implementation of std::inplace_merge, the left and right partitions to merge in a sequence could have any size, so the check to know which of the partitions was smaller was necessary to always make the best of the algorithm.

However, in our case, the sequence to mergesort is always split in two parts, the first one always being the smaller one. Moreover, an additional trick is used to shrink the left partition, making it even smaller, without having any effect on the size of the right partition. In the end, when buffered_inplace_merge is called, the left partition is always smaller than the right one, and makes the check useless. The function can be simplified as follows:

template
>
auto buffered_inplace_merge(ForwardIterator first, ForwardIterator middle,
                            ForwardIterator last, ForwardIterator buffer,
                            Compare compare={})
    -> void
{
    auto buffer_end = std::swap_ranges(first, middle, buffer);
    half_inplace_merge(buffer, buffer_end,
                       middle, last,
                       first, compare);
}

Note that dropping the reverse iterators lowered the iterator requirements of the function to forward iterators, even though the overall quick_merge_sort still requires random-access iterators because of std::nth_element.

Dropping std::not_fn also means that the code doesn't use any C++17 feature anymore and works out-of-the-box with a C++14 compiler. Sure, it could be made to work with even C++03, but that's not the point :)

Avoid a gratuitous albeit small pessimization when sorting an std::deque

The following line will likely perform a few extra operations if we try to sort an std::deque with quickmergesort:

auto pivot = first + 2 * (size / 3) - 2;

Since operator+ is left-associative, the statement above is equivalent to the following one:

auto pivot = (first + 2 * (size / 3)) - 2;

When first is an std::deque iterator, it is first incremented by first + 2 (size / 3), then decremented by 2. Considering that operator+ and operator- are not trivial for std::deque iterators, computing 2 (size / 3) - 2 then incrementing first is likely more efficient. We can thus avoid this small albeit gratuitous pessimization by transforming the statement as follows:

auto pivot = first + (2 * (size / 3) - 2);

In this specific algorithm, it should not make a noticeable difference, but some algorithms can become dramatically faster with std::deque (and other random-access containers with not-trivial logic) when making sure that we are avoiding such pessimizations.

Code Snippets

template<
    typename InputIterator,
    typename OutputIterator,
    typename Compare = std::less<>
>
auto half_inplace_merge(InputIterator first1, InputIterator last1,
                        InputIterator first2, InputIterator last2,
                        OutputIterator result, Compare compare={})
    -> void
{
    // ...
}

template<
    typename ForwardIterator,
    typename Compare = std::less<>
>
auto buffered_inplace_merge(ForwardIterator first, ForwardIterator middle,
                            ForwardIterator last, ForwardIterator buffer,
                            Compare compare={})
    -> void
{
    auto buffer_end = std::swap_ranges(first, middle, buffer);
    half_inplace_merge(buffer, buffer_end,
                       middle, last,
                       first, compare);
}

auto pivot = first + 2 * (size / 3) - 2;

auto pivot = (first + 2 * (size / 3)) - 2;

auto pivot = first + (2 * (size / 3) - 2);

Context

StackExchange Code Review Q#149443, answer score: 4

Revisions (0)

No revisions yet.