HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Searching through a huge set with threads

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
searchingwiththreadshugethroughset

Problem

I wrote a class that finds values in a huge search volume. I have a pretty strong feeling that my code is not good and would like to get your opinion on it. My experience with C++11 threading is very limited and I think I use the wrong features for my task.

#include 
#include 
#include 
#include 

class Searcher
{
public:
    std::future Find(int x) {
        m_searches.emplace_back();
        m_searches.rbegin()->first = x;
        return m_searches.rbegin()->second.get_future();
    }

    void Search()
    {
        // very large search data
        static const std::vector haystack = { 4, 5, 1, 3, 9, 8, 7, 6, 11, 42, 13 };

        std::vector foundIndices;

        // find values
        for (auto h : haystack) {
            for (size_t i = 0; i >> m_searches;
};

int main()
{
    Searcher s;
    auto s1 = s.Find(1);
    auto s2 = s.Find(2);
    auto s3 = s.Find(3);

    s.Search();

    std::cout << s1.get() << std::endl;
    std::cout << s2.get() << std::endl;
    std::cout << s3.get() << std::endl;

    return 0;
}


There should be one search running for many Finds, because the set that is searched is very large. It could however run concurrently to split the search up to multiple threads. For example with two threads: The first thread searches the first half, the second thread searches the second half.

The futures are not really guaranteed to be valid, because the user has to call the Search function. Would it be possible to start the search, once the first .get was called?

My "algorithm" to fulfil the promises looks very ugly and complicated, but I can't think of a better way.

Solution

There are definitely some problems if you want to make this threaded. Firstly, access to m_searches is not protected in any way. Since this is a shared bit of memory, it would need to be protected by something (say a mutex) before you modify it.

Generally in situations like these, it's far better to let std::async do a lot of the hard work for you. Also, try and eliminate shared mutable state as much as possible, because it makes life a lot harder. Firstly, let's write a simple search function that'll return an index if we find a match:

const std::size_t not_found = -1;

template 
std::size_t find_index(Iterator begin, Iterator end, 
                       const T& value, std::size_t start)
{
    std::size_t s = start;
    Iterator it = begin;

    while(it != end) {
        if(*it == value) {
            return s;
        }
        ++s;
        ++it;
    }

    return not_found;
}


Ok, this is just our normal linear search, but it allows us to start from anywhere in our search range since we're only passing in a few iterators.

Next, we define a vector to search over:

std::vector v { 4, 5, 1, 3, 9, 8, 7, 6, 11, 42, 13 };
typedef typename std::vector::iterator iter_t;


Create our vector of future to hold results:

std::vector> futures;


And then populate it:

futures.emplace_back(
    std::async(std::launch::async, find_index, 
               v.begin(), v.begin() + 6, 8, 0));
futures.emplace_back(
    std::async(std::launch::async, find_index, 
               v.begin() + 6, v.end(), 8, 6));


Note the std::launch::async flag specifies we want to create a new thread to run our function in. I've hardcoded the search start and end places here, but you can obviously add some logic to split things up evenly if you'd like. Also, note that we need to explicitly define the template types here, since they won't be deduced.

Now we'll set up a vector to store our results in, and get the results back:

std::vector results;
for(auto& future : futures) {
    results.emplace_back(future.get());
}


Finally, print out anything that was found:

std::for_each(results.begin(), results.end(),
              [](std::size_t index)
              { if(index != not_found) std::cout << index << "\n"; });
}


The whole thing can be found here.

Hopefully this gives you an idea of a different way of approaching your problem.

Code Snippets

const std::size_t not_found = -1;

template <typename Iterator, typename T>
std::size_t find_index(Iterator begin, Iterator end, 
                       const T& value, std::size_t start)
{
    std::size_t s = start;
    Iterator it = begin;

    while(it != end) {
        if(*it == value) {
            return s;
        }
        ++s;
        ++it;
    }

    return not_found;
}
std::vector<int> v { 4, 5, 1, 3, 9, 8, 7, 6, 11, 42, 13 };
typedef typename std::vector<int>::iterator iter_t;
std::vector<std::future<std::size_t>> futures;
futures.emplace_back(
    std::async(std::launch::async, find_index<iter_t, int>, 
               v.begin(), v.begin() + 6, 8, 0));
futures.emplace_back(
    std::async(std::launch::async, find_index<iter_t, int>, 
               v.begin() + 6, v.end(), 8, 6));
std::vector<std::size_t> results;
for(auto& future : futures) {
    results.emplace_back(future.get());
}

Context

StackExchange Code Review Q#29964, answer score: 3

Revisions (0)

No revisions yet.