HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Task scheduler with dependencies

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
withschedulertaskdependencies

Problem

I have need for a task scheduler determined by a directed graph. The tasks are held in a std::vector, while the dependency graph is held in a adjacency_list (bidirectionalS so that I have access to the in_degree() function). A single dispatch will start the tasks. Are there obvious improvements here?

To set up a task dependency graph as below, use the driver provided.

scheduler_driver.hpp:

#ifndef __SCHEDULER_DRIVER_HPP__
#define __SCHEDULER_DRIVER_HPP__

#include 
#include 
#include 
#include 
#include 

#include "scheduler.h"

#endif


scheduler_driver.cpp:

#include "scheduler_driver.hpp"

enum task_nodes
  {
    task_0,
    task_1,
    task_2,
    task_3,
    task_4,
    task_5,
    task_6,
    task_7,
    task_8,
    task_9,
    N
  };

int basic_task(int a, int d)
{
  std::chrono::milliseconds sleepDuration(d);
  std::this_thread::sleep_for(sleepDuration);
  std::cout  F;

  Graph deps(N);
  boost::add_edge(task_0, task_1, deps);
  boost::add_edge(task_0, task_2, deps);
  boost::add_edge(task_0, task_3, deps);
  boost::add_edge(task_1, task_4, deps);
  boost::add_edge(task_1, task_5, deps);
  boost::add_edge(task_1, task_6, deps);
  boost::add_edge(task_2, task_7, deps);
  boost::add_edge(task_2, task_8, deps);
  boost::add_edge(task_2, task_9, deps);

  std::vector tasks = 
    {
      std::bind(basic_task, 0, 1000),
      std::bind(basic_task, 1, 1000),
      std::bind(basic_task, 2, 1000),
      std::bind(basic_task, 3, 1000),
      std::bind(basic_task, 4, 1000),
      std::bind(basic_task, 5, 1000),
      std::bind(basic_task, 6, 1000),
      std::bind(basic_task, 7, 1000),
      std::bind(basic_task, 8, 1000),
      std::bind(basic_task, 9, 1000)
    };

  scheduler *s = new scheduler(std::move(deps), std::move(tasks));
  s->doit();

  return 0;
}


scheduler.h:

```
#ifndef __SCHEDULER2_H__
#define __SCHEDULER2_H__

#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#in

Solution

Here are some things that may help you improve your code.

Be careful with signed and unsigned

In the doit routine and various others, the code compares an int i to a size_t num_tasks, but size_t is unsigned and int is signed. Instead, declare both variables as size_t types. Also, see the next suggestion.

Be careful with namespace assumptions

The C++ standard says that size_t is defined in the std namespace. It might also be defined as ::size_t (and often is) but it's not guaranteed. Better would be to use std::size_t explicitly.

Avoid using namespace in headers

The scheduler.h header includes this line:

using namespace boost;


This may seem convenient, but it means that any code that includes this header has now brought everything from the boost namespace into the global namespace which is a recipe for annoying name clashes later. If, for example, we used both your header and code like this:

#include "scheduler_driver.hpp"
#include 
#include 
void clash() {
    using namespace std;
    array, 5> m;
}


it would fail to compile because of the name clash on array. In this artificial example, it's easy to spot, but in real code it's a headache that will have a future programmer cursing your name. It's easy to fix -- just remove that line and add the boost:: namespace prefix explicitly where needed.

Avoid data races

The basic_task function accesses std::cout but that's a single resource that might simultaneously be used by other threads. One fix for this is to use a std::mutex like this:

static std::mutex cout_mutex;

// wherever cout is used:
some_function() {
    std::lock_guard lock(cout_mutex);
    std::cout << "Now we can do this safely!\n";
}


Note that std::lock_guard is intended to be used via RAII so that the lock is obtained when the object is created and released when it's destroyed, making it easy to avoid the bug of forgetting to unlock the mutex.

I know this is just test code, but if the actual function called within your code uses shared resources, you should apply the same technique.

Consider exceptions

If valid() is not true when std::future::get() is called, the behavior is undefined. One way that can happen is if the future throws an exception before the computation of the result is complete. That's not going to happen with this toy code, but if you're going to apply this scheduler to real tasks, that should be taken into account.

Consider an alternative representation

The graph-based mechanism you current have seems to work, but I wonder if it might not be made simpler. In particular, one thing that comes to mind is the use of a std::priority_queue in which the priority is simply the depth of that node in the dependency tree. The difference is that although it would mean that none of the leaf tasks would get executed until all of the higher priority tasks were executed, the priority queue has the benefit of constant lookup time. Might not be a big difference with this code, (yours is linear) but it might be worth considering if you have a very large number of tasks.

"Don't throw away your future"

OK, that sounds like it should be fatherly advice on life, but it's actually a lot more minor than that. Within the task_thread() routine, the value of the future.get() is assigned to val but never used.

Code Snippets

using namespace boost;
#include "scheduler_driver.hpp"
#include <array>
#include <boost/array.hpp>
void clash() {
    using namespace std;
    array<array<int, 5>, 5> m;
}
static std::mutex cout_mutex;

// wherever cout is used:
some_function() {
    std::lock_guard<std::mutex> lock(cout_mutex);
    std::cout << "Now we can do this safely!\n";
}

Context

StackExchange Code Review Q#122176, answer score: 4

Revisions (0)

No revisions yet.