HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Parallelizing an algorithm with OpenMP using a dynamic work queue

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
parallelizingwithopenmpalgorithmusingworkdynamicqueue

Problem

I'm looking for comments on the design, correctness and performance (not so much style) of a dynamic work queue for OpenMP worker threads.

I have an algorithm that can be thought of in terms of some number of work items (how many is not known beforehand), each of which is executed and may then spawn two child work items. I start with a root work item and continue until there are no more tasks, then terminate. Tasks that are not ancestor and descendant are independent (and therefore can be run in parallel).

I want to use OpenMP to parallelize the work by having threads pop work from a queue. The difficult bit is:

  • waiting for work to become available in an efficient way



  • terminating when there are no work items left, and new ones can't be created any more, but not before



I'm most concerned about the correctness of the termination condition and of my use of the OpenMP functions.

In the version below, the work items are dummies, but the idea is that you could drop in arbitrary work (as long as mutually non-ancestral work is independent).

```
#include
#include
#include

#include "omp.h"

using namespace std;

// Dummy nlogn algorithm work item
class Work {
public:
Work() : n() {}
Work(int n) : n(n) {}

// Returns true iff child work created
bool go(Work& childA, Work& childB) {
if (n > 1) {
childA = Work(n - 1);
childB = Work(n - 1);
return true;
}
return false;
}

int n;
};

int main() {
Work rootWorkItem(20);

int numWaiting = 0;
std::mutex m;
std::condition_variable cv;
std::stack workStack;

workStack.push(rootWorkItem);

#pragma omp parallel
while (true) {
std::unique_lock lock(m);

if (workStack.empty()) {
// Register as waiting for work
numWaiting++;

// If everybody is now waiting, it means that no one has more work to push
if (numWaiting >= omp_get_num_threads()) {

Solution

Firstly, I know you said you're not so interested in a style review, but:

using namespace std;


Apart from being harmful, this isn't even bringing any benefit. We can just drop it with no changes to the rest of the code.

And if OpenMP is correctly installed, the header should be on the include path (not in your source code location):

#include 


The only thing we use from ` is omp_get_num_threads(); we can avoid needing that if we change the sense of the numWaiting variable, so we get to compare against zero instead:

unsigned int num_working = 0;


#pragma omp parallel
{
#pragma omp single
    ++num_working;

    while (true) {
        std::unique_lock lock(m);

        if (workStack.empty()) {
            --num_working;

            if (num_working == 0) {
                cv.notify_all();
                break;
            }

            while (true) {
                cv.wait(lock);
                if (num_working == 0 || !workStack.empty()) break;
            }

            if (num_working == 0) break;

            ++num_working;
        }


Instead of the
#pragma omp single, we could make num_working be a std::atomic instead.

When we add two items to the work queue, we only wake up one worker thread. We need to wake up a thread for each child, potentially:

workStack.push(childA);
        cv.notify_one();
        workStack.push(childB);
        cv.notify_one();


Possibly a style issue, but the
while (true) loop looks better as a do ... while`:

do {
                cv.wait(lock);
            } while (num_working > 0 && workStack.empty());

Code Snippets

using namespace std;
#include <omp.h>
unsigned int num_working = 0;
#pragma omp parallel
{
#pragma omp single
    ++num_working;

    while (true) {
        std::unique_lock<std::mutex> lock(m);

        if (workStack.empty()) {
            --num_working;

            if (num_working == 0) {
                cv.notify_all();
                break;
            }

            while (true) {
                cv.wait(lock);
                if (num_working == 0 || !workStack.empty()) break;
            }

            if (num_working == 0) break;

            ++num_working;
        }
workStack.push(childA);
        cv.notify_one();
        workStack.push(childB);
        cv.notify_one();

Context

StackExchange Code Review Q#118982, answer score: 2

Revisions (0)

No revisions yet.