HiveBrain v1.2.0
Get Started
← Back to all entries
snippetcppModerate

Merge sort algorithm implementation using C++

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
implementationmergealgorithmusingsort

Problem

I'm trying to learn proper C++ and algorithms at the same time.

I particularly feel weird about my iterators usage in merge function. Is this a good way to handle it? I've modeled the signature after the STL std::merge method in ``.

I've tagged it as C++11 as well, because ideally I would've liked to make use of the new features like move semantics and rvalues, but it seems that I'm misunderstanding how they're supposed to work - no matter where I try to use them, my running time actually drops...

#include 

typedef std::vector::iterator vec_it;
void merge(vec_it left, vec_it left_end, vec_it right, vec_it right_end, vec_it numbers)
{
    while (left != left_end) {
        if (*left & numbers)
{
    if (numbers.size() ::size_type middle = numbers.size() / 2;
    std::vector left(numbers.begin(), numbers.begin() + middle);
    std::vector right(numbers.begin() + middle, numbers.end());

    merge_sort(left);
    merge_sort(right);

    merge(left.begin(), left.end(), right.begin(), right.end(), numbers.begin());
}

Solution

Looks good.

Couple of things I would do differently (not that your way is wrong).

Rather than pass references to the containers around I would pass iterators into the containers. This allows your sort algorithm to be container agnostic:

void merge_sort(std::vector& numbers)
{}

// My version looks like this

template                 // Notice the template
void merge_sort(I begin, I end)      // Just means I don't care what type
{}                                   // of iterator is used.


Same applies to merge().

Rather than creating sub arrays in merge_sort() I would do it inside merge(). With your current implementation you have a space complexity of \$O(N^2)\$. If you do it inside the merge you just need to allocate enough space to merge the current two ranges which is at most \$O(2N)\$ => \$O(N)\$.

Where you have:

std::vector::size_type middle = numbers.size() / 2;
std::vector left(numbers.begin(), numbers.begin() + middle);
std::vector right(numbers.begin() + middle, numbers.end());

merge_sort(left);
merge_sort(right);

// My version looks like this:

std::size_t mid      = length/2;
I           midPoint = std::next(begin, mid);

// Merge in place.
mergeSort(begin, midPoint);
mergeSort(midPoint, end);


I could not work out how to merge without using a temporary (and I don;t have my copy of Knuth here). So my version of merge() merges the two sorted sub vectors into a temporary then copies back over the original.

Looking at your merge code it's slightly hard to follow (but I groked it). I personally prefer a simpler version.

// In this loop:
//    l:     current position in left  sub-array
//    r:     current position in right sub-array
//    i:     current position into merged array.
//    Note because we are merging in-place.
//    begin/midPoint/end are iterators to the input arrays that
//    split it into two parts.
while(l < midPoint && r < end)
{
    if (*l < *r)
    {    *i = *l;
         ++i;
         ++l;
    }
    else
    {    *i = *r;
         ++i;
         ++r;
    }
}
// One of the ranges is empty at this point.
// So only one of the loops will execute.
while(l < midPoint)
{   *i  = *l;
    ++i;
    ++l;
}
while(r < end)
{   *i  = *r;
    ++i;
    ++r;
}


A slight variation on this that I use; Where you use if () {} else {} I prefer to use the Condition Operator => Test ? : . I also have done this a few times and can safely compress ++ operations onto the same lines (Note I don't compress all of them; this is just personal preferences as I think it makes it easier to read this way). Which leaves me with:

while(l < midPoint && r < end)
{
    *i = std::move((*l < *r) ? *l++ : *r++);
    ++i;
}
while(l < midPoint)
{   *i  = std::move(*l++);
    ++i;
}
while(r < end)
{   *i  = std::move(*r++);
    ++i;
}


Notice: I use std::move() here. This is because my sort works on generic containers (not just integer containers). So I may be sorting an array of strings.

Final result is:

#include 
#include 
#include 
#include 

template
void doMerge(I begin, I midPoint, I end)
{
    typename std::vector::value_type> TmpVec;

    TmpVec  tmp(std::make_move_iterator(begin), std::make_move_iterator(end));

    TmpVec::iterator   beginAlt   = std::begin(tmp);
    TmpVec::iterator   endAlt     = std::end(tmp);
    TmpVec::iterator   midAlt     = std::next(beginAlt, std::distance(begin, midPoint));

    TmpVec::iterator   l   = beginAlt
    TmpVec::iterator   r   = midAlt;
    I                  i   = begin;

    while(l 
void mergeSort(I begin, I end)
{
    std::size_t length  = std::distance(begin, end);
    if (length     data  {{ 5,12,45,2,67,8}};
    mergeSort(std::begin(data), std::end(data));

    std::copy(std::begin(data), std::end(data), std::ostream_iterator(std::cout, ", "));
    std::cout << "\n";
}

Code Snippets

void merge_sort(std::vector<int>& numbers)
{}

// My version looks like this

template<typename I>                 // Notice the template
void merge_sort(I begin, I end)      // Just means I don't care what type
{}                                   // of iterator is used.
std::vector<int>::size_type middle = numbers.size() / 2;
std::vector<int> left(numbers.begin(), numbers.begin() + middle);
std::vector<int> right(numbers.begin() + middle, numbers.end());

merge_sort(left);
merge_sort(right);

// My version looks like this:

std::size_t mid      = length/2;
I           midPoint = std::next(begin, mid);

// Merge in place.
mergeSort(begin, midPoint);
mergeSort(midPoint, end);
// In this loop:
//    l:     current position in left  sub-array
//    r:     current position in right sub-array
//    i:     current position into merged array.
//    Note because we are merging in-place.
//    begin/midPoint/end are iterators to the input arrays that
//    split it into two parts.
while(l < midPoint && r < end)
{
    if (*l < *r)
    {    *i = *l;
         ++i;
         ++l;
    }
    else
    {    *i = *r;
         ++i;
         ++r;
    }
}
// One of the ranges is empty at this point.
// So only one of the loops will execute.
while(l < midPoint)
{   *i  = *l;
    ++i;
    ++l;
}
while(r < end)
{   *i  = *r;
    ++i;
    ++r;
}
while(l < midPoint && r < end)
{
    *i = std::move((*l < *r) ? *l++ : *r++);
    ++i;
}
while(l < midPoint)
{   *i  = std::move(*l++);
    ++i;
}
while(r < end)
{   *i  = std::move(*r++);
    ++i;
}
#include <vector>
#include <iostream>
#include <algorithm>
#include <iterator>

template<typename I>
void doMerge(I begin, I midPoint, I end)
{
    typename std::vector<typename std::iterator_traits<I>::value_type> TmpVec;

    TmpVec  tmp(std::make_move_iterator(begin), std::make_move_iterator(end));

    TmpVec::iterator   beginAlt   = std::begin(tmp);
    TmpVec::iterator   endAlt     = std::end(tmp);
    TmpVec::iterator   midAlt     = std::next(beginAlt, std::distance(begin, midPoint));


    TmpVec::iterator   l   = beginAlt
    TmpVec::iterator   r   = midAlt;
    I                  i   = begin;

    while(l < midAlt && r < endAlt)
    {
        *i = std::move((*l < *r) ? *l++ : *r++);
        ++i;
    }
    while(l < midAlt)
    {   *i  = std::move(*l++);
        ++i;
    }
    while(r < endAlt)
    {   *i  = std::move(*r++);
        ++i;
    }
}
template<typename I>
void mergeSort(I begin, I end)
{
    std::size_t length  = std::distance(begin, end);
    if (length <= 1)
    {   return;
    }

    std::size_t mid      = length/2;
    I           midPoint = std::next(begin, mid);

    mergeSort(begin, midPoint);
    mergeSort(midPoint, end);

    doMerge(begin, midPoint, end);
}


int main()
{
    std::vector<int>    data  {{ 5,12,45,2,67,8}};
    mergeSort(std::begin(data), std::end(data));

    std::copy(std::begin(data), std::end(data), std::ostream_iterator<int>(std::cout, ", "));
    std::cout << "\n";
}

Context

StackExchange Code Review Q#49459, answer score: 13

Revisions (0)

No revisions yet.