patternjavaMinor
Summing random numbers in parallel
Viewed 0 times
summingrandomnumbersparallel
Problem
I am playing around with parallelization in Java and am trying to see how to squeeze out more performance on my multi-core box. My machine has 6 physical cores, 12 with hyperthreading.
However, with this code, I only see performance improvements up to about 3-4 threads, after which the improvements top off and then decline. I would expect performance to decline beyond, let's say, 6-8 threads, but not before. What, if anything, could I do to improve the code, especially with respect to improving speed?
```
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
public class AddStuff
{
private static final int FIFTY_MILLION = 50000000;
public static void main(String[] args)
{
final int numRecords = FIFTY_MILLION;
final int numIterations = 100;
final int maxThreads = 20;
double[] numbers = new double[numRecords];
Random r = new Random(1);
for (int i = 0; i map = new HashMap();
for (int i = 0; i < numThreads; i++)
{
final Calculator c = new Calculator(numbers, i, numThreads);
Thread t = new Thread(new Runnable() {
@Override
public void run() {
c.calculate();
}
});
map.put(t, c);
t.start();
}
for (Thread t : map.keySet())
{
try {
t.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
double total = 0;
for (Calculator c : map.values())
total += c.total;
return total;
}
private static class Calculator
{
private final double[] numbers;
private final int start;
private final int step;
private volatile double total;
public Calculator(double[] numbers, int start, int step)
{
this.numbers = numbers;
this.start = start;
this.step = step;
}
void calculate()
{
double myTotal = 0;
However, with this code, I only see performance improvements up to about 3-4 threads, after which the improvements top off and then decline. I would expect performance to decline beyond, let's say, 6-8 threads, but not before. What, if anything, could I do to improve the code, especially with respect to improving speed?
```
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
public class AddStuff
{
private static final int FIFTY_MILLION = 50000000;
public static void main(String[] args)
{
final int numRecords = FIFTY_MILLION;
final int numIterations = 100;
final int maxThreads = 20;
double[] numbers = new double[numRecords];
Random r = new Random(1);
for (int i = 0; i map = new HashMap();
for (int i = 0; i < numThreads; i++)
{
final Calculator c = new Calculator(numbers, i, numThreads);
Thread t = new Thread(new Runnable() {
@Override
public void run() {
c.calculate();
}
});
map.put(t, c);
t.start();
}
for (Thread t : map.keySet())
{
try {
t.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
double total = 0;
for (Calculator c : map.values())
total += c.total;
return total;
}
private static class Calculator
{
private final double[] numbers;
private final int start;
private final int step;
private volatile double total;
public Calculator(double[] numbers, int start, int step)
{
this.numbers = numbers;
this.start = start;
this.step = step;
}
void calculate()
{
double myTotal = 0;
Solution
Bad order for cache
Your calculation routine is adding up every
Modified code
I made the following changes to your code:
First, I modified your
Then I modified the
The results
These are the results on my machine before the change (4 cores, no hyperthreads, only 10 million doubles instead of 50 million):
And these are the results after the cache friendly modifications:
Your calculation routine is adding up every
nth number, where n is the number of threads. This is very bad for cache utilization. If you split the list so that each thread had a contiguous chunk instead, that would be a lot better for cache utilization and make everything faster.Modified code
I made the following changes to your code:
First, I modified your
Calculator class to take a start and end argument instead of a start and step argument:private static class Calculator
{
private final double[] numbers;
private final int start;
private final int end;
private double total;
public Calculator(double[] numbers, int start, int end)
{
this.numbers = numbers;
this.start = start;
this.end = end;
}
void calculate()
{
double myTotal = 0;
for (int i = start; i < end; i++)
myTotal += numbers[i];
total = myTotal;
}
}Then I modified the
computeTotal() function to split the numbers into evenly sized contiguous chunks:private static double computeTotal(double[] numbers, int numThreads)
{
Map map = new HashMap();
int chunkSize = (numbers.length / numThreads);
for (int i = 0; i < numThreads; i++)
{
int start = i * chunkSize;
int end = (i == numThreads - 1) ? numbers.length :
(i+1) * chunkSize;
final Calculator c = new Calculator(numbers, start, end);The results
These are the results on my machine before the change (4 cores, no hyperthreads, only 10 million doubles instead of 50 million):
Running aggregation of 10000000 times 100 iterations...
Threads: 1, Time: 1.2901s, Total: 499890086.941070
Threads: 2, Time: 0.7689s, Total: 499890086.941129
Threads: 3, Time: 0.6693s, Total: 499890086.941116
Threads: 4, Time: 0.6586s, Total: 499890086.941095
Threads: 5, Time: 1.1863s, Total: 499890086.941099
Threads: 6, Time: 1.1560s, Total: 499890086.941105
Threads: 7, Time: 1.1590s, Total: 499890086.941105
Threads: 8, Time: 1.1804s, Total: 499890086.941100
Threads: 9, Time: 1.6860s, Total: 499890086.941106
Threads: 10, Time: 1.7471s, Total: 499890086.941105
And these are the results after the cache friendly modifications:
Running aggregation of 10000000 times 100 iterations...
Threads: 1, Time: 1.2738s, Total: 499890086.941070
Threads: 2, Time: 0.6965s, Total: 499890086.941110
Threads: 3, Time: 0.4985s, Total: 499890086.941099
Threads: 4, Time: 0.4814s, Total: 499890086.941099
Threads: 5, Time: 0.6425s, Total: 499890086.941099
Threads: 6, Time: 0.5468s, Total: 499890086.941105
Threads: 7, Time: 0.5108s, Total: 499890086.941106
Threads: 8, Time: 0.5244s, Total: 499890086.941099
Threads: 9, Time: 0.5870s, Total: 499890086.941105
Threads: 10, Time: 0.5293s, Total: 499890086.941099
Code Snippets
private static class Calculator
{
private final double[] numbers;
private final int start;
private final int end;
private double total;
public Calculator(double[] numbers, int start, int end)
{
this.numbers = numbers;
this.start = start;
this.end = end;
}
void calculate()
{
double myTotal = 0;
for (int i = start; i < end; i++)
myTotal += numbers[i];
total = myTotal;
}
}private static double computeTotal(double[] numbers, int numThreads)
{
Map<Thread, Calculator> map = new HashMap<Thread, Calculator>();
int chunkSize = (numbers.length / numThreads);
for (int i = 0; i < numThreads; i++)
{
int start = i * chunkSize;
int end = (i == numThreads - 1) ? numbers.length :
(i+1) * chunkSize;
final Calculator c = new Calculator(numbers, start, end);Context
StackExchange Code Review Q#109010, answer score: 5
Revisions (0)
No revisions yet.