HiveBrain v1.2.0
Get Started
← Back to all entries
snippetcppMinor

Multi-threaded sort

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
sortthreadedmulti

Problem

I wanted to write a multi-threaded sort, but unfortunately I don't know much about threads, especially in C++11. I managed to make something work, but I would be highly surprised if it was the best way to do it.

template
void ___sort(T *data, int len, int nbThreads){
    if(nbThreads());
    }
    else{

        std::future b = std::async(std::launch::async,___sort, data, len/2, nbThreads-2);
        std::future a = std::async(std::launch::async,___sort, &data[len/2], len/2, nbThreads-2);

        a.wait();
        b.wait();

        std::inplace_merge (data,&data[len/2],&data[len],std::less());
    }
}


Some of the tests I did sorting integers:

size is the number of ints sorted, and the time is in seconds.

size : 2097152
time with 1 thread : 4.961
time with 2 thread : 3.191
time with 4 thread : 2.377

size : 4194304
time with 1 thread : 10.002
time with 2 thread : 6.214
time with 4 thread : 4.689

size : 8388608
time with 1 thread : 19.975
time with 2 thread : 12.332
time with 4 thread : 9.29

size : 16777216
time with 1 thread : 38.712
time with 2 thread : 25.257
time with 4 thread : 18.706


Also, I tried using std::qsort instead of std::sort, and the results were at least twice as long as that. Any reasons why?

Solution

Something like this is probably more efficient:

template
void parallel_sort(T* data, int len, int grainsize)
{
    // Use grainsize instead of thread count so that we don't e.g.
    // spawn 4 threads just to sort 8 elements.
    if(len ());
    }
    else
    {
        auto future = std::async(parallel_sort, data, len/2, grainsize);

        // No need to spawn another thread just to block the calling 
        // thread which would do nothing.
        parallel_sort(data + len/2, len - len/2, grainsize);

        future.wait();

        std::inplace_merge(data, data + len/2, data + len, std::less());
    }
}


You could set grain size to something like, std::max(256, len/nb_threads).

Code Snippets

template<class T>
void parallel_sort(T* data, int len, int grainsize)
{
    // Use grainsize instead of thread count so that we don't e.g.
    // spawn 4 threads just to sort 8 elements.
    if(len < grainsize)
    {
        std::sort(data, data + len, std::less<T>());
    }
    else
    {
        auto future = std::async(parallel_sort<T>, data, len/2, grainsize);

        // No need to spawn another thread just to block the calling 
        // thread which would do nothing.
        parallel_sort(data + len/2, len - len/2, grainsize);

        future.wait();

        std::inplace_merge(data, data + len/2, data + len, std::less<T>());
    }
}

Context

StackExchange Code Review Q#22744, answer score: 8

Revisions (0)

No revisions yet.