snippetcppMinor
Parallel Merge Sort
Viewed 0 times
sortparallelmerge
Problem
Parallism is achieved by using a simple call to
If something is not understandable, please ask.
tbb::parallel_invoke(). However I can't break a speedup factor of x2 (after ~3-4 parallel HW threads), although 8 parallel HW threads are used. Advice on this is highly appreciated.template
void insertion_sort(In b, In e) { // pretty standard insertion sort for smaller pieces
for(auto i = b; i != e; ++i) {
auto key = *i;
auto j = i - 1;
for(; j != b - 1 && *j > key; --j)
*(j + 1) = *j;
*(j + 1) = key;
}
}
template
void merge(In b, In m, In e) { // this is the merger, doing the important job
std::vector left(b, m);
std::vector right(m, e);
// guards for the two ranges
left.push_back(std::numeric_limits::max());
right.push_back(std::numeric_limits::max());
auto itl = std::begin(left);
auto itr = std::begin(right);
while (b != e) {
if(*itl
void merge_sort(In b, In e) { // serial merge_sort, used for pieces smaller 10) {
merge_sort(b, m);
merge_sort(m, e);
merge(b, m, e);
}
//switch to insertion sort for pieces
void merge_sort_parallel(In b, In e) { // merge_sort parallel
if(b != e - 1) {
auto dis = std::distance(b, e);
In m = b + dis / 2;
if(dis > 500) {
tbb::parallel_invoke([&] () { merge_sort_parallel(b, m); },
[&] () { merge_sort_parallel(m, e); });
}
else {
merge_sort(b, m);
merge_sort(m, e);
}
merge(b, m, e);
}
}If something is not understandable, please ask.
Solution
It's due to the algorithm itself. After all the stuff you do in parallel, you call merge in serial.
No matter how fast you sort the fragments, there's one big serial merge at the end that only uses one core. In addition, halves get merged on only 2 cores, quarters get merged on only 4 cores, etc.
See this question on Stack Overflow for more detail:
https://stackoverflow.com/questions/3969813/which-parallel-sorting-algorithm-has-the-best-average-case-performance
merge(b, m, e);No matter how fast you sort the fragments, there's one big serial merge at the end that only uses one core. In addition, halves get merged on only 2 cores, quarters get merged on only 4 cores, etc.
See this question on Stack Overflow for more detail:
https://stackoverflow.com/questions/3969813/which-parallel-sorting-algorithm-has-the-best-average-case-performance
Code Snippets
merge(b, m, e);Context
StackExchange Code Review Q#7684, answer score: 6
Revisions (0)
No revisions yet.