HiveBrain v1.2.0
Get Started
← Back to all entries
snippetcppModerate

Ford-Johnson merge-insertion sort

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
johnsoninsertionmergefordsort

Problem

The Ford-Johnson algorithm, also known as merge-insertion sort (the name was probably given by Knuth) is an in-place sorting algorithm designed to perform as few comparisons as possible to sort a collection. Unfortunately, the algorithm needs some specific data structures to keep track of the order of the elements and is generally too slow to be practical. Anyway, it is an interesting algorithm from a computer science point of view; while not performing an optimal number of comparisons, it's still a reference and one of the best known comparison sorts when it comes to reducing the number of comparisons.

The algorithm is not the simplest in the world and I couldn't find a proper explanation online, so I will try to explain it as I can based on the descriptions in The Art of Computer Programming, Volume 3 by Donald Knuth.
Making the best of binary search

To perform a minimal number of comparisons, we need to take into account the following observation about binary search: the maximal number of comparisons needed to perform a binary search on a sorted sequence is the same when the number of elements is \$2^n\$ and when it is \$2^{n+1}-1\$. For example, looking for an element in a sorted sequence of \$8\$ or \$15\$ elements requires the same number of comparisons.

Many insertion-based sorting algorithms perform binary searches to find where to insert elements, but most of them don't take that property of binary search into account.
The merge-insertion sort

The Ford-Johnson merge-insertion sort is a three-step algorithm, let \$n\$ be the number of elements to sort:

-
Split the collection in \$n/2\$ pairs of \$2\$ elements and order these elements pairwise. If the number of elements is odd, the last element of the collection isn't paired with any element.

-
Recursively sort the pairs of elements by their highest value. If the number of elements is odd, the last element is not considered a highest value and is left at the end of the collection. Consider that the hi

Solution

First of all, there is an error

Not a fatal error though since the algorithm still sorts the collection properly, but it actually means that the algorithm doesn't perform as few comparisons as it should. The line std::advance(it, dist); advances the iterator one step too far, so the binary insertion is sometimes done in a main chain too big compared to what it should be (more than \$2^n-1\$ elements). The obvious solution is to advance the iterator by dist - 1 instead of dist; however, removing 1 from every element in jacobsthal_diff is also a solution.

We don't need to remove elements from pend

Instead of erasing elements from pend once they have been inserted into chain, we can instead track the first used iterator in pend corresponding to a Jacobsthal diff, add the next Jacobsthal diff to find the next such iterator, and decrease that iterator until it encounters the previous remembered Jacobsthal diff iterator. Not having to remove the elements from pend means that we don't need to store nodes in a container supporting fast deletion from the beginning. Basically, since we don't remove anything, we can switch to an std::vector for pend.

Use the original iterators

Since we only have to add a Jacobsthal diff number to find the next element of pend, it means that we can perform the same operation on the original collection to find the iterator to insert into the main chain (every element with an even index in [first, last) is a pend iterator). It means that we can drop this information from pend and only store an std::vector::value_type> instead of an std::vector. The maximal size of the vector should be (size + 1) / 2 - 1 so we can directly reserve that amount of elements.

We can insert the remaining elements in any order

At first I thought that the remaining pend elements had to be inserted in reverse order (the elements left when the farthest pend element whose index corresponds to a Jacobsthal number has been inserted). However, it appears that we can insert them in any order thanks to the properties of binary search. Therefore, inserting them in ascending order shoud probably ease the CPU's task.

Smaller things

-
iter_swap isn't right: the overload of iter_swap for group_iterator is designed to swap several group_iterator of different types, which isn't quite right. Not only does it look like it can cause problems, but apparently it can also cause ADL problems: in a more complex case, the compiler found the unqualified call to iter_swap ambiguous. The solution was to make iter_swap work only with group_iterators of the same type:

template
auto iter_swap(group_iterator lhs, group_iterator rhs)
    -> void
{
    std::swap_ranges(lhs.base(), lhs.base() + lhs.size(), rhs.base());
}


-
The recursion is needlessly complicated: merge_insertion_sort_impl calls merge_insertion_sort which calls... merge_insertion_sort_impl without adding anything significant while it introduces yet another indirection. While it is likely to get elided by the compiler, making a direct recursive call of merge_insertion_sort_impl makes things easier for everyone.

-
There is still a small error in operator>= for group_iterator: the parenthesis after lhs.base are missing, which is likely to cause a compilation error if the function is ever called.

Putting it all together

Once we stick all these remarks together, merge_insertion_sort_impl looks like this:

```
template
auto merge_insertion_sort_impl(RandomAccessIterator first, RandomAccessIterator last,
Compare compare)
{
// Cache all the differences between a Jacobsthal number and its
// predecessor that fit in 64 bits, starting with the difference
// between the Jacobsthal numbers 4 and 3 (the previous ones are
// unneeded)
static constexpr std::uint_fast64_t jacobsthal_diff[] = {
2u, 2u, 6u, 10u, 22u, 42u, 86u, 170u, 342u, 682u, 1366u,
2730u, 5462u, 10922u, 21846u, 43690u, 87382u, 174762u, 349526u, 699050u,
1398102u, 2796202u, 5592406u, 11184810u, 22369622u, 44739242u, 89478486u,
178956970u, 357913942u, 715827882u, 1431655766u, 2863311530u, 5726623062u,
11453246122u, 22906492246u, 45812984490u, 91625968982u, 183251937962u,
366503875926u, 733007751850u, 1466015503702u, 2932031007402u, 5864062014806u,
11728124029610u, 23456248059222u, 46912496118442u, 93824992236886u, 187649984473770u,
375299968947542u, 750599937895082u, 1501199875790165u, 3002399751580331u,
6004799503160661u, 12009599006321322u, 24019198012642644u, 48038396025285288u,
96076792050570576u, 192153584101141152u, 384307168202282304u, 768614336404564608u,
1537228672809129216u, 3074457345618258432u, 6148914691236516864u
};

using std::iter_swap;

auto size = std::distance(first, last);
if (size chain = { first, std::next(first) };

// Upper bounds for the insertion of pend

Code Snippets

template<typename Iterator>
auto iter_swap(group_iterator<Iterator> lhs, group_iterator<Iterator> rhs)
    -> void
{
    std::swap_ranges(lhs.base(), lhs.base() + lhs.size(), rhs.base());
}
template<
    typename RandomAccessIterator,
    typename Compare
>
auto merge_insertion_sort_impl(RandomAccessIterator first, RandomAccessIterator last,
                               Compare compare)
{
    // Cache all the differences between a Jacobsthal number and its
    // predecessor that fit in 64 bits, starting with the difference
    // between the Jacobsthal numbers 4 and 3 (the previous ones are
    // unneeded)
    static constexpr std::uint_fast64_t jacobsthal_diff[] = {
        2u, 2u, 6u, 10u, 22u, 42u, 86u, 170u, 342u, 682u, 1366u,
        2730u, 5462u, 10922u, 21846u, 43690u, 87382u, 174762u, 349526u, 699050u,
        1398102u, 2796202u, 5592406u, 11184810u, 22369622u, 44739242u, 89478486u,
        178956970u, 357913942u, 715827882u, 1431655766u, 2863311530u, 5726623062u,
        11453246122u, 22906492246u, 45812984490u, 91625968982u, 183251937962u,
        366503875926u, 733007751850u, 1466015503702u, 2932031007402u, 5864062014806u,
        11728124029610u, 23456248059222u, 46912496118442u, 93824992236886u, 187649984473770u,
        375299968947542u, 750599937895082u, 1501199875790165u, 3002399751580331u,
        6004799503160661u, 12009599006321322u, 24019198012642644u, 48038396025285288u,
        96076792050570576u, 192153584101141152u, 384307168202282304u, 768614336404564608u,
        1537228672809129216u, 3074457345618258432u, 6148914691236516864u
    };

    using std::iter_swap;

    auto size = std::distance(first, last);
    if (size < 2) return;

    // Whether there is a stray element not in a pair
    // at the end of the chain
    bool has_stray = (size % 2 != 0);

    ////////////////////////////////////////////////////////////
    // Group elements by pairs

    auto end = has_stray ? std::prev(last) : last;
    for (auto it = first ; it != end ; it += 2)
    {
        if (compare(it[1], it[0]))
        {
            iter_swap(it, it + 1);
        }
    }

    ////////////////////////////////////////////////////////////
    // Recursively sort the pairs by max

    merge_insertion_sort_impl(
        make_group_iterator(first, 2),
        make_group_iterator(end, 2),
        compare
    );

    ////////////////////////////////////////////////////////////
    // Separate main chain and pend elements

    // The first pend element is always part of the main chain,
    // so we can safely initialize the list with the first two
    // elements of the sequence
    std::list<RandomAccessIterator> chain = { first, std::next(first) };

    // Upper bounds for the insertion of pend elements
    std::vector<typename std::list<RandomAccessIterator>::iterator> pend;
    pend.reserve((size + 1) / 2 - 1);

    for (auto it = first + 2 ; it != end ; it += 2)
    {
        auto tmp = chain.insert(std::end(chain), std::next(it));
        pend.push_back(tmp);
    }

    // Add the last element to pend if it exists; when it
    // exists, it always has to be inserted in the full chain,
    // so giving it chain.end() as end insertion

Context

StackExchange Code Review Q#116367, answer score: 12

Revisions (0)

No revisions yet.