HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Summing random numbers in parallel

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
summingrandomnumbersparallel

Problem

I am playing around with parallelization in Java and am trying to see how to squeeze out more performance on my multi-core box. My machine has 6 physical cores, 12 with hyperthreading.

However, with this code, I only see performance improvements up to about 3-4 threads, after which the improvements top off and then decline. I would expect performance to decline beyond, let's say, 6-8 threads, but not before. What, if anything, could I do to improve the code, especially with respect to improving speed?

```
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class AddStuff
{
private static final int FIFTY_MILLION = 50000000;

public static void main(String[] args)
{
final int numRecords = FIFTY_MILLION;
final int numIterations = 100;
final int maxThreads = 20;

double[] numbers = new double[numRecords];
Random r = new Random(1);

for (int i = 0; i map = new HashMap();

for (int i = 0; i < numThreads; i++)
{
final Calculator c = new Calculator(numbers, i, numThreads);

Thread t = new Thread(new Runnable() {
@Override
public void run() {
c.calculate();
}
});

map.put(t, c);
t.start();
}

for (Thread t : map.keySet())
{
try {
t.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}

double total = 0;
for (Calculator c : map.values())
total += c.total;

return total;
}

private static class Calculator
{
private final double[] numbers;
private final int start;
private final int step;
private volatile double total;

public Calculator(double[] numbers, int start, int step)
{
this.numbers = numbers;
this.start = start;
this.step = step;
}

void calculate()
{
double myTotal = 0;

Solution

Bad order for cache

Your calculation routine is adding up every nth number, where n is the number of threads. This is very bad for cache utilization. If you split the list so that each thread had a contiguous chunk instead, that would be a lot better for cache utilization and make everything faster.

Modified code

I made the following changes to your code:

First, I modified your Calculator class to take a start and end argument instead of a start and step argument:

private static class Calculator
{
    private final double[] numbers;
    private final int start;
    private final int end;
    private double total;

    public Calculator(double[] numbers, int start, int end)
    {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }

    void calculate()
    {
        double myTotal = 0;
        for (int i = start; i < end; i++)
            myTotal += numbers[i];

        total = myTotal;
    }
}


Then I modified the computeTotal() function to split the numbers into evenly sized contiguous chunks:

private static double computeTotal(double[] numbers, int numThreads)
{
    Map map = new HashMap();
    int chunkSize = (numbers.length / numThreads);

    for (int i = 0; i < numThreads; i++)
    {
        int start = i * chunkSize;
        int end   = (i == numThreads - 1) ? numbers.length :
                                            (i+1) * chunkSize;
        final Calculator c = new Calculator(numbers, start, end);


The results

These are the results on my machine before the change (4 cores, no hyperthreads, only 10 million doubles instead of 50 million):

Running aggregation of 10000000 times 100 iterations...
Threads: 1, Time: 1.2901s, Total: 499890086.941070
Threads: 2, Time: 0.7689s, Total: 499890086.941129
Threads: 3, Time: 0.6693s, Total: 499890086.941116
Threads: 4, Time: 0.6586s, Total: 499890086.941095
Threads: 5, Time: 1.1863s, Total: 499890086.941099
Threads: 6, Time: 1.1560s, Total: 499890086.941105
Threads: 7, Time: 1.1590s, Total: 499890086.941105
Threads: 8, Time: 1.1804s, Total: 499890086.941100
Threads: 9, Time: 1.6860s, Total: 499890086.941106
Threads: 10, Time: 1.7471s, Total: 499890086.941105


And these are the results after the cache friendly modifications:

Running aggregation of 10000000 times 100 iterations...
Threads: 1, Time: 1.2738s, Total: 499890086.941070
Threads: 2, Time: 0.6965s, Total: 499890086.941110
Threads: 3, Time: 0.4985s, Total: 499890086.941099
Threads: 4, Time: 0.4814s, Total: 499890086.941099
Threads: 5, Time: 0.6425s, Total: 499890086.941099
Threads: 6, Time: 0.5468s, Total: 499890086.941105
Threads: 7, Time: 0.5108s, Total: 499890086.941106
Threads: 8, Time: 0.5244s, Total: 499890086.941099
Threads: 9, Time: 0.5870s, Total: 499890086.941105
Threads: 10, Time: 0.5293s, Total: 499890086.941099

Code Snippets

private static class Calculator
{
    private final double[] numbers;
    private final int start;
    private final int end;
    private double total;

    public Calculator(double[] numbers, int start, int end)
    {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }

    void calculate()
    {
        double myTotal = 0;
        for (int i = start; i < end; i++)
            myTotal += numbers[i];

        total = myTotal;
    }
}
private static double computeTotal(double[] numbers, int numThreads)
{
    Map<Thread, Calculator> map = new HashMap<Thread, Calculator>();
    int chunkSize = (numbers.length / numThreads);

    for (int i = 0; i < numThreads; i++)
    {
        int start = i * chunkSize;
        int end   = (i == numThreads - 1) ? numbers.length :
                                            (i+1) * chunkSize;
        final Calculator c = new Calculator(numbers, start, end);

Context

StackExchange Code Review Q#109010, answer score: 5

Revisions (0)

No revisions yet.