patterncppMinor
QuickMergesort — The power of internal buffering
Viewed 0 times
thepowerbufferinginternalquickmergesort
Problem
Amongst the lesser-known unstable sorting algorithms is the QuickXsort family of algorithms, as described in QuickXsort: Efficient Sorting with n log n - 1.399n +o(n) Comparisons on Average by Stefan Edelkamp and Armin Weiß. This family of sorting algorithms tries to exploit the technique of internal buffering that we will describe below before presenting our algorithm.
Internal buffering & QuickXsort
Internal buffering is a technique which allows to use algorithms that generally need additional memory without actually having to use any additional memory: when applying an algorithm to a sequence of elements, one can decide that the algorithm will first operate on a part of the sequence, and use the other part as a « buffer » with which it will swap elements instead of storing them in additional memory and retrieving them later. Then it can operate on the remaining part of the sequence until completion. Algorithms such as block sort use internal buffering to avoid allocating extra memory.
QuickXsort is a family of algorithms designed to take advantage of quicksort and internal buffering as follows:
Quicksort partitions are an excellent choice here: once a sequence is partitioned, sorting both halves yields a sorted sequence, so any sorting algorithm can be used, and using one of the unsorted partitions as an internal buffer isn't a problem since it isn't supposed to be sorted yet.
QuickMergesort
QuickMergesort applies a partition just like qu
Internal buffering & QuickXsort
Internal buffering is a technique which allows to use algorithms that generally need additional memory without actually having to use any additional memory: when applying an algorithm to a sequence of elements, one can decide that the algorithm will first operate on a part of the sequence, and use the other part as a « buffer » with which it will swap elements instead of storing them in additional memory and retrieving them later. Then it can operate on the remaining part of the sequence until completion. Algorithms such as block sort use internal buffering to avoid allocating extra memory.
QuickXsort is a family of algorithms designed to take advantage of quicksort and internal buffering as follows:
- Choose a pivot and partition the original sequence, leaving two partitions A and B.
- If possible, apply another sorting algorithm that generally requires additional memory (such as mergesort) on the biggest partition, and use the second partition as an internal buffer: considering that A is the biggest partition, it can swap elements with B instead of allocating temporary memory to store the values and retrieve them back in A.
- Apply the technique again on the remaining partition until the original sequence is sorted.
Quicksort partitions are an excellent choice here: once a sequence is partitioned, sorting both halves yields a sorted sequence, so any sorting algorithm can be used, and using one of the unsorted partitions as an internal buffer isn't a problem since it isn't supposed to be sorted yet.
QuickMergesort
QuickMergesort applies a partition just like qu
Solution
I realized that at least two things could be simplified since asking the question.
The function
However, in our case, the merge is always internal, and swaps elements in the same collection, so the iterators are always the same. We can thus simplify the signature of
It doesn't seem to incur any additional performance cost or benefit, but it makes the code simpler and cleaner.
The left partition is always smaller
In libc++'s implementation of
However, in our case, the sequence to mergesort is always split in two parts, the first one always being the smaller one. Moreover, an additional trick is used to shrink the left partition, making it even smaller, without having any effect on the size of the right partition. In the end, when
Note that dropping the reverse iterators lowered the iterator requirements of the function to forward iterators, even though the overall
Dropping
Avoid a gratuitous albeit small pessimization when sorting an
The following line will likely perform a few extra operations if we try to sort an
Since
When
In this specific algorithm, it should not make a noticeable difference, but some algorithms can become dramatically faster with
half_inplace_merge takes only one kind of iteratorsThe function
half_inplace_merge was borrowed from libc++'s implementation of std::inplace_merge. However, despite its name, std::inplace_merge might allocate additional memory to perform an out-of-place merge when possible, so when half_inplace_merge is called, it's sometimes with different kinds of iterators.However, in our case, the merge is always internal, and swaps elements in the same collection, so the iterators are always the same. We can thus simplify the signature of
half_inplace_merge as follows:template
>
auto half_inplace_merge(InputIterator first1, InputIterator last1,
InputIterator first2, InputIterator last2,
OutputIterator result, Compare compare={})
-> void
{
// ...
}It doesn't seem to incur any additional performance cost or benefit, but it makes the code simpler and cleaner.
The left partition is always smaller
In libc++'s implementation of
std::inplace_merge, the left and right partitions to merge in a sequence could have any size, so the check to know which of the partitions was smaller was necessary to always make the best of the algorithm.However, in our case, the sequence to mergesort is always split in two parts, the first one always being the smaller one. Moreover, an additional trick is used to shrink the left partition, making it even smaller, without having any effect on the size of the right partition. In the end, when
buffered_inplace_merge is called, the left partition is always smaller than the right one, and makes the check useless. The function can be simplified as follows:template
>
auto buffered_inplace_merge(ForwardIterator first, ForwardIterator middle,
ForwardIterator last, ForwardIterator buffer,
Compare compare={})
-> void
{
auto buffer_end = std::swap_ranges(first, middle, buffer);
half_inplace_merge(buffer, buffer_end,
middle, last,
first, compare);
}Note that dropping the reverse iterators lowered the iterator requirements of the function to forward iterators, even though the overall
quick_merge_sort still requires random-access iterators because of std::nth_element.Dropping
std::not_fn also means that the code doesn't use any C++17 feature anymore and works out-of-the-box with a C++14 compiler. Sure, it could be made to work with even C++03, but that's not the point :)Avoid a gratuitous albeit small pessimization when sorting an
std::dequeThe following line will likely perform a few extra operations if we try to sort an
std::deque with quickmergesort:auto pivot = first + 2 * (size / 3) - 2;Since
operator+ is left-associative, the statement above is equivalent to the following one:auto pivot = (first + 2 * (size / 3)) - 2;When
first is an std::deque iterator, it is first incremented by first + 2 (size / 3), then decremented by 2. Considering that operator+ and operator- are not trivial for std::deque iterators, computing 2 (size / 3) - 2 then incrementing first is likely more efficient. We can thus avoid this small albeit gratuitous pessimization by transforming the statement as follows:auto pivot = first + (2 * (size / 3) - 2);In this specific algorithm, it should not make a noticeable difference, but some algorithms can become dramatically faster with
std::deque (and other random-access containers with not-trivial logic) when making sure that we are avoiding such pessimizations.Code Snippets
template<
typename InputIterator,
typename OutputIterator,
typename Compare = std::less<>
>
auto half_inplace_merge(InputIterator first1, InputIterator last1,
InputIterator first2, InputIterator last2,
OutputIterator result, Compare compare={})
-> void
{
// ...
}template<
typename ForwardIterator,
typename Compare = std::less<>
>
auto buffered_inplace_merge(ForwardIterator first, ForwardIterator middle,
ForwardIterator last, ForwardIterator buffer,
Compare compare={})
-> void
{
auto buffer_end = std::swap_ranges(first, middle, buffer);
half_inplace_merge(buffer, buffer_end,
middle, last,
first, compare);
}auto pivot = first + 2 * (size / 3) - 2;auto pivot = (first + 2 * (size / 3)) - 2;auto pivot = first + (2 * (size / 3) - 2);Context
StackExchange Code Review Q#149443, answer score: 4
Revisions (0)
No revisions yet.