
Parallelization of number factors using OpenMP


Problem

For a simple try at parallelization on my own outside of school, I've created a number factors calculator. I hope to eventually come up with something more creative.

Since I don't have access to parallel computers at this time, I'm using OpenMP provided by my compiler (gcc 4.8.1) and running it on my laptop (Intel Core i3-2330M). I'm using a maximum of four threads, which was determined from a call to omp_get_max_threads().

I've conducted four runs, each with four billion values and from one to four threads:

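#include <cstdint>
#include <ctime>
#include <iomanip>
#include <iostream>
#include <map>
#include <omp.h>

void displayCompTime(std::clock_t start, std::clock_t end, std::int64_t integer, int threads)
{
    double elapsed = static_cast<double>(end - start) / CLOCKS_PER_SEC;

    // Prints e.g. "4000000000 values and 1 thread(s): 67.7330s"
    std::cout << integer << " values and " << threads << " thread(s): "
              << std::fixed << std::setprecision(4) << elapsed << "s\n";
}

void calcFactors(std::int64_t integer, int threads)
{
    // Maps each divisor i to its cofactor integer / i
    std::map<std::int64_t, std::int64_t> factors;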
    std::int64_t i;

    #pragma omp parallel for num_threads(threads) default(none) \
        shared(factors, integer), private(i)
    for (i = 2; i <= integer; i++)
    {
        if (integer % i == 0)
        {
            factors[i] = integer / i;
        }
    }
}

int main()
{
    const std::int64_t integer = 4000000000;
    const int runs = 4;

    for (int i = 0; i < runs; i++)
    {
        std::clock_t start = std::clock();
        int threads = i + 1;
        calcFactors(integer, threads);
        std::clock_t end = std::clock();
        displayCompTime(start, end, integer, threads);
    }
}


Output:

4000000000 values and 1 thread(s): 67.7330s
4000000000 values and 2 thread(s): 40.7640s
4000000000 values and 3 thread(s): 32.5630s
4000000000 values and 4 thread(s): 29.7640s


Based on these results, this code doesn't appear to scale very well. I don't know if using a non-default static schedule would give faster times, and anything else would just incur additional overhead. Fortunately, I didn't need to include atomic or critical.
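One caveat on that last point: factors is shared, and std::map::operator[] inserts a node whenever a new key appears, so unsynchronized writes from several threads are a data race. A minimal sketch of one possible guard, assuming a hypothetical variant named calcFactorsGuarded that returns the map (neither the name nor the return value is from the original code):

```cpp
#include <cstdint>
#include <map>

// Hypothetical variant of calcFactors() that returns the map so the
// result can be inspected by the caller.
std::map<std::int64_t, std::int64_t> calcFactorsGuarded(std::int64_t integer, int threads)
{
    std::map<std::int64_t, std::int64_t> factors;

    #pragma omp parallel for num_threads(threads) shared(factors)
    for (std::int64_t i = 2; i <= integer; i++)
    {
        if (integer % i == 0)
        {
            // Only one thread at a time may modify the shared map.
            #pragma omp critical
            factors[i] = integer / i;
        }
    }

    return factors;
}
```

Divisors are rare compared with the number of candidates tested, so the critical section should be entered only a handful of times and is unlikely to affect the timings much.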

Would avoiding a lot of division help? I didn't try for anything else yet as this is only a start. I also wanted to see how well my laptop could handle

Solution

OpenMP's rules make things a little difficult here. For instance, I can't concisely set i to either 2 or 3 depending on integer's parity, because OpenMP requires the loop counter to be initialized in the for statement itself, though it can still be declared beforehand. I would otherwise have to put a ternary in the loop statement, which would look ugly. It would save only one iteration, which is unlikely to make a noticeable difference.
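A possible way around the ternary, if I read the canonical loop form correctly: the start value and the step only need to be loop-invariant expressions, so both can be computed before the loop. The sketch below uses a hypothetical helper, countDivisors, that merely counts divisors with a reduction so that it stays short. For odd input it also steps by 2, since an odd number has no even divisors.

```cpp
#include <cstdint>

// Hypothetical helper: counts the divisors of integer in [2, integer].
std::int64_t countDivisors(std::int64_t integer, int threads)
{
    // Decide the start and step once, outside the parallel loop.
    const bool odd = (integer % 2 != 0);
    const std::int64_t first = odd ? 3 : 2;
    const std::int64_t step = odd ? 2 : 1;
    std::int64_t count = 0;

    // Each thread keeps a private count; OpenMP sums them at the end.
    #pragma omp parallel for num_threads(threads) reduction(+:count)
    for (std::int64_t i = first; i <= integer; i += step)
    {
        if (integer % i == 0)
        {
            count++;
        }
    }

    return count;
}
```

For odd inputs this halves the number of candidates rather than saving a single iteration; for even inputs it behaves like the original loop.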

As such, I might as well declare i within the loop statement, which makes it private automatically, and then remove the private clause from the directive:

shared(factors, integer)
for (std::int64_t i = 2; i <= integer; i++)


Performance-wise, I got a small boost by constructing factors outside of the timed section and passing it to calcFactors(). However long it takes to construct a std::map, that work is part of the serial code, so it limits the runtime no matter how many threads are used.
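The idea can be sketched as follows, with two assumptions of mine: a hypothetical calcFactors() overload that fills a map owned by the caller (kept serial here so the sketch stays focused on timing), and a hypothetical wallSeconds() helper built on std::chrono::steady_clock. I use a wall clock because std::clock() reports processor time, which on some platforms is summed over all threads and can hide any speedup; omp_get_wtime() would be the OpenMP-provided alternative.

```cpp
#include <chrono>
#include <cstdint>
#include <map>

// Hypothetical variant of calcFactors(): the caller owns the map, so its
// construction is not part of the measured work.
void calcFactors(std::int64_t integer, std::map<std::int64_t, std::int64_t>& factors)
{
    for (std::int64_t i = 2; i <= integer; i++)
    {
        if (integer % i == 0)
        {
            factors[i] = integer / i;
        }
    }
}

// Hypothetical helper: runs work() once and returns the elapsed
// wall-clock time in seconds.
template <typename Work>
double wallSeconds(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

A call might look like `std::map<std::int64_t, std::int64_t> factors; double s = wallSeconds([&] { calcFactors(integer, factors); });`, which keeps the map's construction outside the measurement.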

With the same division and modulus operations, I would likely not get any significant performance boost. Regardless of the thread count, the number of modulus and division operations still varies, which may explain the poor scalability. I also cannot test with higher thread counts, because my machine exposes only four threads.
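One way to reduce the modulus operations themselves, offered as a sketch rather than as part of the original code: divisors come in pairs (i, integer / i), so only candidates up to the square root need testing. For 4000000000 that is roughly 63,000 iterations instead of four billion. The function name factorPairs is hypothetical, and the sketch assumes integer is far enough below the std::int64_t maximum that squaring the limit cannot overflow.

```cpp
#include <cmath>
#include <cstdint>
#include <map>

// Hypothetical helper: builds the same divisor -> cofactor map as the
// original loop, but tests only candidates up to sqrt(integer).
std::map<std::int64_t, std::int64_t> factorPairs(std::int64_t integer, int threads)
{
    std::map<std::int64_t, std::int64_t> factors;
    if (integer < 2)
    {
        return factors;
    }

    // Integer square root; the two loops correct any floating-point rounding.
    std::int64_t limit = static_cast<std::int64_t>(std::sqrt(static_cast<double>(integer)));
    while (limit * limit > integer) --limit;
    while ((limit + 1) * (limit + 1) <= integer) ++limit;

    #pragma omp parallel for num_threads(threads) shared(factors)
    for (std::int64_t i = 2; i <= limit; i++)
    {
        if (integer % i == 0)
        {
            const std::int64_t cofactor = integer / i;

            // Record both members of the pair; the map is shared, so guard it.
            #pragma omp critical
            {
                factors[i] = cofactor;
                factors[cofactor] = i;
            }
        }
    }

    // The original loop also records integer itself, paired with 1.
    factors[integer] = 1;
    return factors;
}
```

With so few iterations left, the loop may become too short for threading to pay off on inputs of this size, so it is worth timing both the serial and the parallel form.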

Context

StackExchange Code Review Q#73868, answer score: 8
