snippetcppModerate
Ford-Johnson merge-insertion sort
Viewed 0 times
johnsoninsertionmergefordsort
Problem
The Ford-Johnson algorithm, also known as merge-insertion sort (the name was probably given by Knuth) is an in-place sorting algorithm designed to perform as few comparisons as possible to sort a collection. Unfortunately, the algorithm needs some specific data structures to keep track of the order of the elements and is generally too slow to be practical. Anyway, it is an interesting algorithm from a computer science point of view; while not performing an optimal number of comparisons, it's still a reference and one of the best known comparison sorts when it comes to reducing the number of comparisons.
The algorithm is not the simplest in the world and I couldn't find a proper explanation online, so I will try to explain it as I can based on the descriptions in The Art of Computer Programming, Volume 3 by Donald Knuth.
Making the best of binary search
To perform a minimal number of comparisons, we need to take into account the following observation about binary search: the maximal number of comparisons needed to perform a binary search on a sorted sequence is the same when the number of elements is \$2^n\$ and when it is \$2^{n+1}-1\$. For example, looking for an element in a sorted sequence of \$8\$ or \$15\$ elements requires the same number of comparisons.
Many insertion-based sorting algorithms perform binary searches to find where to insert elements, but most of them don't take that property of binary search into account.
The merge-insertion sort
The Ford-Johnson merge-insertion sort is a three-step algorithm, let \$n\$ be the number of elements to sort:
-
Split the collection in \$n/2\$ pairs of \$2\$ elements and order these elements pairwise. If the number of elements is odd, the last element of the collection isn't paired with any element.
-
Recursively sort the pairs of elements by their highest value. If the number of elements is odd, the last element is not considered a highest value and is left at the end of the collection. Consider that the hi
The algorithm is not the simplest in the world and I couldn't find a proper explanation online, so I will try to explain it as I can based on the descriptions in The Art of Computer Programming, Volume 3 by Donald Knuth.
Making the best of binary search
To perform a minimal number of comparisons, we need to take into account the following observation about binary search: the maximal number of comparisons needed to perform a binary search on a sorted sequence is the same when the number of elements is \$2^n\$ and when it is \$2^{n+1}-1\$. For example, looking for an element in a sorted sequence of \$8\$ or \$15\$ elements requires the same number of comparisons.
Many insertion-based sorting algorithms perform binary searches to find where to insert elements, but most of them don't take that property of binary search into account.
The merge-insertion sort
The Ford-Johnson merge-insertion sort is a three-step algorithm, let \$n\$ be the number of elements to sort:
-
Split the collection in \$n/2\$ pairs of \$2\$ elements and order these elements pairwise. If the number of elements is odd, the last element of the collection isn't paired with any element.
-
Recursively sort the pairs of elements by their highest value. If the number of elements is odd, the last element is not considered a highest value and is left at the end of the collection. Consider that the hi
Solution
First of all, there is an error
Not a fatal error though since the algorithm still sorts the collection properly, but it actually means that the algorithm doesn't perform as few comparisons as it should. The line
We don't need to remove elements from
Instead of erasing elements from
Use the original iterators
Since we only have to add a Jacobsthal diff number to find the next element of
We can insert the remaining elements in any order
At first I thought that the remaining
Smaller things
-
-
The recursion is needlessly complicated:
-
There is still a small error in
Putting it all together
Once we stick all these remarks together,
```
template
auto merge_insertion_sort_impl(RandomAccessIterator first, RandomAccessIterator last,
Compare compare)
{
// Cache all the differences between a Jacobsthal number and its
// predecessor that fit in 64 bits, starting with the difference
// between the Jacobsthal numbers 4 and 3 (the previous ones are
// unneeded)
static constexpr std::uint_fast64_t jacobsthal_diff[] = {
2u, 2u, 6u, 10u, 22u, 42u, 86u, 170u, 342u, 682u, 1366u,
2730u, 5462u, 10922u, 21846u, 43690u, 87382u, 174762u, 349526u, 699050u,
1398102u, 2796202u, 5592406u, 11184810u, 22369622u, 44739242u, 89478486u,
178956970u, 357913942u, 715827882u, 1431655766u, 2863311530u, 5726623062u,
11453246122u, 22906492246u, 45812984490u, 91625968982u, 183251937962u,
366503875926u, 733007751850u, 1466015503702u, 2932031007402u, 5864062014806u,
11728124029610u, 23456248059222u, 46912496118442u, 93824992236886u, 187649984473770u,
375299968947542u, 750599937895082u, 1501199875790165u, 3002399751580331u,
6004799503160661u, 12009599006321322u, 24019198012642644u, 48038396025285288u,
96076792050570576u, 192153584101141152u, 384307168202282304u, 768614336404564608u,
1537228672809129216u, 3074457345618258432u, 6148914691236516864u
};
using std::iter_swap;
auto size = std::distance(first, last);
if (size chain = { first, std::next(first) };
// Upper bounds for the insertion of pend
Not a fatal error though since the algorithm still sorts the collection properly, but it actually means that the algorithm doesn't perform as few comparisons as it should. The line
std::advance(it, dist); advances the iterator one step too far, so the binary insertion is sometimes done in a main chain too big compared to what it should be (more than \$2^n-1\$ elements). The obvious solution is to advance the iterator by dist - 1 instead of dist; however, removing 1 from every element in jacobsthal_diff is also a solution.We don't need to remove elements from
pendInstead of erasing elements from
pend once they have been inserted into chain, we can instead track the first used iterator in pend corresponding to a Jacobsthal diff, add the next Jacobsthal diff to find the next such iterator, and decrease that iterator until it encounters the previous remembered Jacobsthal diff iterator. Not having to remove the elements from pend means that we don't need to store nodes in a container supporting fast deletion from the beginning. Basically, since we don't remove anything, we can switch to an std::vector for pend.Use the original iterators
Since we only have to add a Jacobsthal diff number to find the next element of
pend, it means that we can perform the same operation on the original collection to find the iterator to insert into the main chain (every element with an even index in [first, last) is a pend iterator). It means that we can drop this information from pend and only store an std::vector::value_type> instead of an std::vector. The maximal size of the vector should be (size + 1) / 2 - 1 so we can directly reserve that amount of elements.We can insert the remaining elements in any order
At first I thought that the remaining
pend elements had to be inserted in reverse order (the elements left when the farthest pend element whose index corresponds to a Jacobsthal number has been inserted). However, it appears that we can insert them in any order thanks to the properties of binary search. Therefore, inserting them in ascending order shoud probably ease the CPU's task.Smaller things
-
iter_swap isn't right: the overload of iter_swap for group_iterator is designed to swap several group_iterator of different types, which isn't quite right. Not only does it look like it can cause problems, but apparently it can also cause ADL problems: in a more complex case, the compiler found the unqualified call to iter_swap ambiguous. The solution was to make iter_swap work only with group_iterators of the same type:template
auto iter_swap(group_iterator lhs, group_iterator rhs)
-> void
{
std::swap_ranges(lhs.base(), lhs.base() + lhs.size(), rhs.base());
}-
The recursion is needlessly complicated:
merge_insertion_sort_impl calls merge_insertion_sort which calls... merge_insertion_sort_impl without adding anything significant while it introduces yet another indirection. While it is likely to get elided by the compiler, making a direct recursive call of merge_insertion_sort_impl makes things easier for everyone.-
There is still a small error in
operator>= for group_iterator: the parenthesis after lhs.base are missing, which is likely to cause a compilation error if the function is ever called.Putting it all together
Once we stick all these remarks together,
merge_insertion_sort_impl looks like this:```
template
auto merge_insertion_sort_impl(RandomAccessIterator first, RandomAccessIterator last,
Compare compare)
{
// Cache all the differences between a Jacobsthal number and its
// predecessor that fit in 64 bits, starting with the difference
// between the Jacobsthal numbers 4 and 3 (the previous ones are
// unneeded)
static constexpr std::uint_fast64_t jacobsthal_diff[] = {
2u, 2u, 6u, 10u, 22u, 42u, 86u, 170u, 342u, 682u, 1366u,
2730u, 5462u, 10922u, 21846u, 43690u, 87382u, 174762u, 349526u, 699050u,
1398102u, 2796202u, 5592406u, 11184810u, 22369622u, 44739242u, 89478486u,
178956970u, 357913942u, 715827882u, 1431655766u, 2863311530u, 5726623062u,
11453246122u, 22906492246u, 45812984490u, 91625968982u, 183251937962u,
366503875926u, 733007751850u, 1466015503702u, 2932031007402u, 5864062014806u,
11728124029610u, 23456248059222u, 46912496118442u, 93824992236886u, 187649984473770u,
375299968947542u, 750599937895082u, 1501199875790165u, 3002399751580331u,
6004799503160661u, 12009599006321322u, 24019198012642644u, 48038396025285288u,
96076792050570576u, 192153584101141152u, 384307168202282304u, 768614336404564608u,
1537228672809129216u, 3074457345618258432u, 6148914691236516864u
};
using std::iter_swap;
auto size = std::distance(first, last);
if (size chain = { first, std::next(first) };
// Upper bounds for the insertion of pend
Code Snippets
template<typename Iterator>
auto iter_swap(group_iterator<Iterator> lhs, group_iterator<Iterator> rhs)
-> void
{
std::swap_ranges(lhs.base(), lhs.base() + lhs.size(), rhs.base());
}template<
typename RandomAccessIterator,
typename Compare
>
auto merge_insertion_sort_impl(RandomAccessIterator first, RandomAccessIterator last,
Compare compare)
{
// Cache all the differences between a Jacobsthal number and its
// predecessor that fit in 64 bits, starting with the difference
// between the Jacobsthal numbers 4 and 3 (the previous ones are
// unneeded)
static constexpr std::uint_fast64_t jacobsthal_diff[] = {
2u, 2u, 6u, 10u, 22u, 42u, 86u, 170u, 342u, 682u, 1366u,
2730u, 5462u, 10922u, 21846u, 43690u, 87382u, 174762u, 349526u, 699050u,
1398102u, 2796202u, 5592406u, 11184810u, 22369622u, 44739242u, 89478486u,
178956970u, 357913942u, 715827882u, 1431655766u, 2863311530u, 5726623062u,
11453246122u, 22906492246u, 45812984490u, 91625968982u, 183251937962u,
366503875926u, 733007751850u, 1466015503702u, 2932031007402u, 5864062014806u,
11728124029610u, 23456248059222u, 46912496118442u, 93824992236886u, 187649984473770u,
375299968947542u, 750599937895082u, 1501199875790165u, 3002399751580331u,
6004799503160661u, 12009599006321322u, 24019198012642644u, 48038396025285288u,
96076792050570576u, 192153584101141152u, 384307168202282304u, 768614336404564608u,
1537228672809129216u, 3074457345618258432u, 6148914691236516864u
};
using std::iter_swap;
auto size = std::distance(first, last);
if (size < 2) return;
// Whether there is a stray element not in a pair
// at the end of the chain
bool has_stray = (size % 2 != 0);
////////////////////////////////////////////////////////////
// Group elements by pairs
auto end = has_stray ? std::prev(last) : last;
for (auto it = first ; it != end ; it += 2)
{
if (compare(it[1], it[0]))
{
iter_swap(it, it + 1);
}
}
////////////////////////////////////////////////////////////
// Recursively sort the pairs by max
merge_insertion_sort_impl(
make_group_iterator(first, 2),
make_group_iterator(end, 2),
compare
);
////////////////////////////////////////////////////////////
// Separate main chain and pend elements
// The first pend element is always part of the main chain,
// so we can safely initialize the list with the first two
// elements of the sequence
std::list<RandomAccessIterator> chain = { first, std::next(first) };
// Upper bounds for the insertion of pend elements
std::vector<typename std::list<RandomAccessIterator>::iterator> pend;
pend.reserve((size + 1) / 2 - 1);
for (auto it = first + 2 ; it != end ; it += 2)
{
auto tmp = chain.insert(std::end(chain), std::next(it));
pend.push_back(tmp);
}
// Add the last element to pend if it exists; when it
// exists, it always has to be inserted in the full chain,
// so giving it chain.end() as end insertionContext
StackExchange Code Review Q#116367, answer score: 12
Revisions (0)
No revisions yet.