patterncppModerate
MergeSort performance
Viewed 0 times
performancemergesortstackoverflow
Problem
Do you see any inefficient parts with the following code? On internet I see 30 ms for 100.000 integers but I can sort 100.000 integers in 300ms. The data is random. I am using late 2013 macbook pro so I don't expect a 10x slowdown from CPU, but who knows.
void mergesort2(int* list, int begin, int end,int *tmplist=0){
bool allocated = false;
if (end-begin middle) {
memcpy(tmplist+tmp, list+second, sizeof(int)*(end-begin-tmp+1));
}
else if (second > end) {
memcpy(tmplist+tmp, list+first, sizeof(int)*(end-begin-tmp+1));
}
else if (list[first] <= list[second]) {
tmplist[tmp] = list[first++];
}else{
tmplist[tmp] = list[second++];
}
}
memcpy(list+begin, tmplist, (end-begin+1)*sizeof(int));
if (allocated) {
delete tmplist;
}
}Solution
A C++ merge sort in the style of a Standard Library algorithm could be written as
Let's look at how this improves upon your design:
template::value_type>>
void merge_sort(BiDirIt first, BiDirIt last, Compare cmp = Compare())
{
auto const N = std::distance(first, last);
if (N < 2) return;
auto middle = std::next(first, N / 2);
merge_sort(first, middle, cmp);
merge_sort(middle, last, cmp);
std::inplace_merge(first, middle, last, cmp);
}Let's look at how this improves upon your design:
- generic interface: it takes two iterators and a comparison function of arbitrary types, instead of two indices into a pointer to an array of
intand the implicit but fixed
- in-place: behind the covers it calls std::inplace_merge
that can allocate extra memory but it does it so that users don't have to supply this, instead of yourtmplistparameter.
- half-open intervals: it will split the input range [first, last)
into[first, middle)and[middle, last)which automically takes care of easy to make "off-by-one" errors in code likemiddle + 1(in your code it is correct, but you still have to think about it, half-open intervals are easier to get right).
- named algorithms: it uses std::inplace_merge
as a ready-to-use component instead of a somewhat long and certainly tricky final step in your algorithm. Not only makes this yourmerge_sortmore compact, theinplace_mergeitself is a resuable component in its own right!
- heavily optimized: Standard Library experts are very likely to know about the state of the art in writing performance-optimized algorithms. A factor of 10 of performance compared to your code seems a lot, but if you forget some hidden copies, use a hand-written loop that has O(N^2)
complexity instead ofO(N log N)` you can quickly get there.
Code Snippets
template<class BiDirIt, class Compare = std::less<typename std::iterator_traits<BiDirIt>::value_type>>
void merge_sort(BiDirIt first, BiDirIt last, Compare cmp = Compare())
{
auto const N = std::distance(first, last);
if (N < 2) return;
auto middle = std::next(first, N / 2);
merge_sort(first, middle, cmp);
merge_sort(middle, last, cmp);
std::inplace_merge(first, middle, last, cmp);
}Context
StackExchange Code Review Q#43138, answer score: 14
Revisions (0)
No revisions yet.