snippetcppMinor

Parallel Merge Sort

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

sortparallelmerge

Problem

Parallism is achieved by using a simple call to tbb::parallel_invoke(). However I can't break a speedup factor of x2 (after ~3-4 parallel HW threads), although 8 parallel HW threads are used. Advice on this is highly appreciated.

template
void insertion_sort(In b, In e) { // pretty standard insertion sort for smaller pieces
    for(auto i = b; i != e; ++i) {
        auto key = *i;
        auto j = i - 1;

        for(; j != b - 1 && *j > key; --j)
            *(j + 1) = *j;

        *(j + 1) = key;
    }
}

template 
void merge(In b, In m, In e) { // this is the merger, doing the important job
  std::vector left(b, m);
  std::vector right(m, e);

  // guards for the two ranges
  left.push_back(std::numeric_limits::max());
  right.push_back(std::numeric_limits::max());

  auto itl = std::begin(left);
  auto itr = std::begin(right);

  while (b != e) {
    if(*itl 
void merge_sort(In b, In e) { // serial merge_sort, used for pieces smaller  10) {
            merge_sort(b, m);
            merge_sort(m, e);
            merge(b, m, e);
        }
        //switch to insertion sort for pieces 
void merge_sort_parallel(In b, In e) { // merge_sort parallel 
  if(b != e - 1) {
    auto dis = std::distance(b, e);
    In m = b + dis / 2;
        if(dis > 500) {
            tbb::parallel_invoke([&] () { merge_sort_parallel(b, m); },
                                 [&] () { merge_sort_parallel(m, e); });
        }
        else {
            merge_sort(b, m);
            merge_sort(m, e);
        }
        merge(b, m, e);
  }
}

If something is not understandable, please ask.

Solution

It's due to the algorithm itself. After all the stuff you do in parallel, you call merge in serial.

merge(b, m, e);

No matter how fast you sort the fragments, there's one big serial merge at the end that only uses one core. In addition, halves get merged on only 2 cores, quarters get merged on only 4 cores, etc.

See this question on Stack Overflow for more detail:

https://stackoverflow.com/questions/3969813/which-parallel-sorting-algorithm-has-the-best-average-case-performance

Code Snippets

merge(b, m, e);

Context

StackExchange Code Review Q#7684, answer score: 6

Revisions (0)

No revisions yet.