patterncppMinor
Lock-free cache oblivious n-body algorithm
Viewed 0 times
freeobliviousalgorithmcachebodylock
Problem
I'm currently looking at, from a rather high level, the parallelization of the gravity calculation in an N-body simulation for approximating a solution to the N-body problem.
The simple form of the algorithm looks something like this:
This simple version has an obvious parallelization over the outer loop:
However, you're doing twice as many units of work as necessary. It's the case that gravity acting on body i from body j is the same as the gravity acting on body j from body i, but with the opposite sign.
You can calculate gravity pairwise instead:
This is the same exact calculation but you're making use of the fact that the force is symmetric, but with a sign flip.
But now parallelizing this loop becomes much harder. But if you treat this problem geometrically, then you can realize that you can actually divide up the work in a special way to never require locks.
First off, you can calculate pairwise forces for the first n / 2 bodies. This is inherently serial. Then, you can calculate the pairwise forces between the bodies [0, n / 2] and [n / 2, n]. Lastly you calculate the pairwise forces for the last n / 2 bodies.
The algorithm then looks like:
```
void BlockGravity(Body *bodies, int i1, int i2, int j1, int j2)
{
for (int i = i1; i < i2; i++)
for (int j = j1; j < j2; j++)
bodies[i].Pairwise(bodies[j]);
}
void TriangleGravity(Body *bodies, int i1, int i2)
{
for (int i = i1; i < i2; i++)
for (int j = i1 + 1; j < i2; j++)
bo
The simple form of the algorithm looks something like this:
for (int i = 0; i Pos;
double r2 = Dot(dr, dr);
double ir3 = 1 / (r2 * sqrt(r2));
this->Acc += (other.Mass * ir3) * dr;
}This simple version has an obvious parallelization over the outer loop:
#pragma omp parallel for
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
if (i != j)
bodies[i].ApplyGravityFrom(bodies[j]);However, you're doing twice as many units of work as necessary. It's the case that gravity acting on body i from body j is the same as the gravity acting on body j from body i, but with the opposite sign.
You can calculate gravity pairwise instead:
for (int i = 0; i Pos;
double r2 = Dot(dr, dr);
double ir3 = 1 / (r2 * sqrt(r2));
this->Acc += (other.Mass * ir3) * dr;
other.Acc -= (this->Mass * ir3) * dr;
}This is the same exact calculation but you're making use of the fact that the force is symmetric, but with a sign flip.
But now parallelizing this loop becomes much harder. But if you treat this problem geometrically, then you can realize that you can actually divide up the work in a special way to never require locks.
First off, you can calculate pairwise forces for the first n / 2 bodies. This is inherently serial. Then, you can calculate the pairwise forces between the bodies [0, n / 2] and [n / 2, n]. Lastly you calculate the pairwise forces for the last n / 2 bodies.
The algorithm then looks like:
```
void BlockGravity(Body *bodies, int i1, int i2, int j1, int j2)
{
for (int i = i1; i < i2; i++)
for (int j = j1; j < j2; j++)
bodies[i].Pairwise(bodies[j]);
}
void TriangleGravity(Body *bodies, int i1, int i2)
{
for (int i = i1; i < i2; i++)
for (int j = i1 + 1; j < i2; j++)
bo
Solution
You seem to be making the problem more complex:
I would take a step back to your original algorithm:
Your problem (as you stated) that the parallelism is uneven (the lower outer loops is doing more work than the higher values of the outer loop).
But why not combine the two loops.
Each loop is completely independent.
All you have to do is calculate how the functions src() and dst() work.
I would take a step back to your original algorithm:
#pragma omp parallel for
for (int i = 0; i < n; i++)
for (int j = i + 1; j < n; j++)
bodies[i].PairwiseGravity(bodies[j]);Your problem (as you stated) that the parallelism is uneven (the lower outer loops is doing more work than the higher values of the outer loop).
But why not combine the two loops.
#pragma omp parallel for
for (int i = 0; i < ((n*(n-1))/2; i++)
bodies[src(i,n)].PairwiseGravity(bodies[dst(i,n)]);Each loop is completely independent.
All you have to do is calculate how the functions src() and dst() work.
Code Snippets
#pragma omp parallel for
for (int i = 0; i < n; i++)
for (int j = i + 1; j < n; j++)
bodies[i].PairwiseGravity(bodies[j]);#pragma omp parallel for
for (int i = 0; i < ((n*(n-1))/2; i++)
bodies[src(i,n)].PairwiseGravity(bodies[dst(i,n)]);Context
StackExchange Code Review Q#7578, answer score: 2
Revisions (0)
No revisions yet.