Parallelization of number factors using OpenMP
Problem
For a simple try at parallelization on my own outside of school, I've created a number factors calculator. I hope to eventually come up with something more creative.
Since I don't have access to parallel computers at this time, I'm using OpenMP provided by my compiler (gcc 4.8.1) and running it on my laptop (Intel Core i3-2330M). I'm using a maximum of four threads, which was determined from a call to omp_get_max_threads().
I've conducted four runs, each with four billion values and from one to four threads:
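#include <cstdint>
#include <ctime>
#include <iomanip>
#include <iostream>
#include <map>
#include <omp.h>

// Print the elapsed time in the format shown under "Output".
void displayCompTime(std::clock_t start, std::clock_t end, std::int64_t integer, int threads)
{
    double elapsed = static_cast<double>(end - start) / CLOCKS_PER_SEC;
    std::cout << integer << " values and " << threads << " thread(s): "
              << std::fixed << std::setprecision(4) << elapsed << "s\n";
}

// Record every i in [2, integer] that divides integer, mapped to integer / i.
void calcFactors(std::int64_t integer, int threads)
{
    // Assumption: the key/value types are not visible in the post,
    // which only states that factors is a std::map.
    std::map<std::int64_t, std::int64_t> factors;
    std::int64_t i;

    #pragma omp parallel for num_threads(threads) default(none) \
        shared(factors, integer), private(i)
    for (i = 2; i <= integer; i++)
    {
        if (integer % i == 0)
        {
            factors[i] = integer / i;
        }
    }
}
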
int main()
{
    const std::int64_t integer = 4000000000;
    const int runs = 4;

    // One timed run per thread count, from 1 to 4.
    for (int i = 0; i < runs; i++)
    {
        std::clock_t start = std::clock();
        int threads = i + 1;
        calcFactors(integer, threads);
        std::clock_t end = std::clock();
        displayCompTime(start, end, integer, threads);
    }
}
Output:
4000000000 values and 1 thread(s): 67.7330s
4000000000 values and 2 thread(s): 40.7640s
4000000000 values and 3 thread(s): 32.5630s
4000000000 values and 4 thread(s): 29.7640s
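To put numbers on the scaling, the timings above can be turned into speedup and efficiency figures. The following is only a small sketch: the helper names speedup, efficiency and printScaling are made up for illustration and are not part of the original program. Applied to the four timings, it gives speedups of roughly 1.66x, 2.08x and 2.28x for two, three and four threads.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <iostream>
#include <vector>

// Speedup relative to the single-threaded run: S(p) = T(1) / T(p).
double speedup(double serialSeconds, double parallelSeconds)
{
    assert(serialSeconds > 0.0 && parallelSeconds > 0.0);
    return serialSeconds / parallelSeconds;
}

// Parallel efficiency: E(p) = S(p) / p. A value of 1.0 would be ideal scaling.
double efficiency(double serialSeconds, double parallelSeconds, int threads)
{
    assert(threads >= 1);
    return speedup(serialSeconds, parallelSeconds) / threads;
}

// seconds[k] is the measured time with k + 1 threads.
void printScaling(const std::vector<double>& seconds)
{
    assert(!seconds.empty());
    for (std::size_t k = 0; k < seconds.size(); k++)
    {
        int threads = static_cast<int>(k) + 1;
        std::cout << threads << " thread(s): speedup "
                  << std::fixed << std::setprecision(2)
                  << speedup(seconds[0], seconds[k]) << ", efficiency "
                  << efficiency(seconds[0], seconds[k], threads) << '\n';
    }
}
```

Calling printScaling({67.733, 40.764, 32.563, 29.764}) prints one line per thread count.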
Based on these results, this code doesn't appear to scale very well. I don't know if using a non-default static schedule would give faster times, and anything else would just incur additional overhead. Fortunately, I didn't need to include atomic or critical.
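One caveat on the atomic/critical point: std::map does not guarantee safe unsynchronized writes from several threads, so factors[i] = ... inside the parallel loop is a data race even when a run appears to complete normally. A minimal sketch of one way around it, assuming each thread collects its hits in a private vector and merges them once at the end, is shown below. The function name collectFactors and the choice to return the map are illustrative assumptions, not the original design.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical variant of calcFactors() that never writes to the shared map
// from two threads at once.
std::map<std::int64_t, std::int64_t> collectFactors(std::int64_t integer, int threads)
{
    assert(integer >= 2 && threads >= 1);
    std::map<std::int64_t, std::int64_t> factors;

    #pragma omp parallel num_threads(threads) default(none) shared(factors, integer)
    {
        // Declared inside the region, so each thread gets its own buffer
        // and the hot loop needs no locking.
        std::vector<std::pair<std::int64_t, std::int64_t>> local;

        #pragma omp for nowait
        for (std::int64_t i = 2; i <= integer; i++)
        {
            if (integer % i == 0)
            {
                local.emplace_back(i, integer / i);
            }
        }

        // Only the merge is serialized, and it runs once per thread.
        #pragma omp critical
        {
            factors.insert(local.begin(), local.end());
        }
    }
    return factors;
}
```

Since factors are rare compared with the number of iterations, the critical section should add very little overhead; that expectation has not been measured here.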
Would avoiding a lot of division help? I didn't try for anything else yet as this is only a start. I also wanted to see how well my laptop could handle
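On the division question, one common way to cut the work is to stop at the square root: every divisor i no larger than sqrt(integer) pairs with the cofactor integer / i, so both can be recorded in one step. For four billion that is roughly 63,000 iterations instead of four billion. The sketch below is an assumption about how this could look, not the original code; factorPairs is a made-up name, and the critical section is there only because two threads may otherwise write to the map at the same time.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>

// Hypothetical variant that produces the same map as the full loop
// (keys 2..integer that divide integer) while testing only i <= sqrt(integer).
std::map<std::int64_t, std::int64_t> factorPairs(std::int64_t integer, int threads)
{
    assert(integer >= 2 && threads >= 1);

    // floor(sqrt(integer)), corrected in case the floating-point result is off by one.
    std::int64_t limit = static_cast<std::int64_t>(std::sqrt(static_cast<double>(integer)));
    while (limit * limit > integer) limit--;
    while ((limit + 1) * (limit + 1) <= integer) limit++;

    std::map<std::int64_t, std::int64_t> factors;
    factors[integer] = 1;  // the full loop also records integer itself

    #pragma omp parallel for num_threads(threads) default(none) shared(factors, integer, limit)
    for (std::int64_t i = 2; i <= limit; i++)
    {
        if (integer % i == 0)
        {
            std::int64_t cofactor = integer / i;

            // Hits are rare, so serializing the two inserts costs little.
            #pragma omp critical
            {
                factors[i] = cofactor;
                factors[cofactor] = i;
            }
        }
    }
    return factors;
}
```

With so few iterations left, the loop may be too short for extra threads to pay off, so this trades the parallelization exercise for a much smaller problem.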
Solution
I see that OpenMP's rules will make things a little difficult here. For instance, I won't be able to concisely set i to either 2 or 3, depending on integer's parity. This is because OpenMP requires the loop counter to be initialized in the for statement itself, though it can still be declared beforehand. I would otherwise have to put a ternary within the loop statement, which would look ugly. It would save only one iteration, which is unlikely to make a noticeable difference.
As such, I might as well declare and initialize i within the loop statement and then remove the private clause from the #pragma directive:
    #pragma omp parallel for num_threads(threads) default(none) \
        shared(factors, integer)
    for (std::int64_t i = 2; i <= integer; i++)
Performance-wise, I was able to get a small boost by initializing factors outside of the timed section and passing it to calcFactors(). Regardless of the time it usually takes to initialize an std::map, my runtime will always be limited by that as it's part of the serial code.
With the same division and modulus operations, I would likely not get any significant performance boost. Regardless of the thread count, the number of modulus and division operations still varies, which may explain the poor scalability over time. I also cannot test with higher thread counts due to my machine's limited number of available threads.
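A sketch of the "initialize factors outside of the timed section" idea follows. It makes two assumptions that are not in the original code: calcFactors() takes the map by reference, and the timer is std::chrono::steady_clock rather than std::clock(). The reason for the second is that std::clock() reports processor time, which on POSIX systems is summed over all threads and can therefore hide a parallel speedup. The names FactorMap, wallSeconds and timedRun are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>

using FactorMap = std::map<std::int64_t, std::int64_t>;

// Assumed signature: the caller owns the map, so building it is not timed.
void calcFactors(std::int64_t integer, int threads, FactorMap& factors)
{
    assert(integer >= 2 && threads >= 1);

    #pragma omp parallel for num_threads(threads) default(none) shared(factors, integer)
    for (std::int64_t i = 2; i <= integer; i++)
    {
        if (integer % i == 0)
        {
            // Serialize the rare map writes to avoid a data race.
            #pragma omp critical
            {
                factors[i] = integer / i;
            }
        }
    }
}

// Hypothetical helper: runs work() once and returns elapsed wall-clock seconds.
template <typename Work>
double wallSeconds(Work work)
{
    auto start = std::chrono::steady_clock::now();
    work();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

// Times only the factor search; the map is created by the caller beforehand.
double timedRun(std::int64_t integer, int threads, FactorMap& factors)
{
    return wallSeconds([&] { calcFactors(integer, threads, factors); });
}
```

In main(), this would mean creating a FactorMap before each run and passing the value returned by timedRun() to the display function in place of the std::clock() difference.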
Context
StackExchange Code Review Q#73868, answer score: 8