patternjavaMinor
Optimization of Barnes-Hut Multithread Insertion algorithm
Viewed 0 times
insertionhutoptimizationalgorithmmultithreadbarnes
Problem
I'm currently working on optimizing a Barnes-Hut implementation (connected to a previous post I did) and I need help to optimize it further.
After some testing, the main problem appears to be in the Insertion algorithm, which takes roughly 3-10 times as long as the second longest part of the program.
The basis of the program are a number of threads that each act as a worker that does the following when building the tree:
the getIndex function can be found in the BarnesHut class which handles the start of the program and the insertion algorithm. It's synchronized to work as a
Now, the insertion algorithm looks like this:
```
//entryLock and quadLock are two atomic mutex locks
//childs[][] are the leaves to the tree structur that makes up a Barnes-Hut
public void insertThreaded(BHBody body, int threadID){
entryLock.lock();
numberOfParticles++;
if(numberOfParticles == 1){
existingParticle = body;
entryLock.unlock();
}
else if(numberOfParticles == 2){
quadLock.lock();
entryLock.unlock();
int[] position = getQuadForBody(existingParticle);
if(childs[position[0]][position[1]] == null){
startQuad(position[0],position[1]);
}
quadLock.unlock();
childs[ position[0] ][ position[1] ].insertThreaded(existingParticle, threadID);
quadLock.lock();
position = getQuadForBody(body);
if(childs[position[0]][position[1]] == null){
startQuad(position[0],position[1]);
}
quadLock.unlock();
childs[ position[0] ][ position[1] ].insertThreaded(body, threadID);
}
else if(numberOfPar
After some testing, the main problem appears to be in the Insertion algorithm, which takes roughly 3-10 times as long as the second longest part of the program.
The basis of the program are a number of threads that each act as a worker that does the following when building the tree:
//particle[] is a list that is set in the constructor
public void run(){
boolean run = true;
while(run){
i = barnesHut.getIndex();
if(i < particle.length){ tree.insertThreaded(particle[i], thread); }
else{ run = false; }
}
}the getIndex function can be found in the BarnesHut class which handles the start of the program and the insertion algorithm. It's synchronized to work as a
public synchronized int getIndex(){
return index++;
}Now, the insertion algorithm looks like this:
```
//entryLock and quadLock are two atomic mutex locks
//childs[][] are the leaves to the tree structur that makes up a Barnes-Hut
public void insertThreaded(BHBody body, int threadID){
entryLock.lock();
numberOfParticles++;
if(numberOfParticles == 1){
existingParticle = body;
entryLock.unlock();
}
else if(numberOfParticles == 2){
quadLock.lock();
entryLock.unlock();
int[] position = getQuadForBody(existingParticle);
if(childs[position[0]][position[1]] == null){
startQuad(position[0],position[1]);
}
quadLock.unlock();
childs[ position[0] ][ position[1] ].insertThreaded(existingParticle, threadID);
quadLock.lock();
position = getQuadForBody(body);
if(childs[position[0]][position[1]] == null){
startQuad(position[0],position[1]);
}
quadLock.unlock();
childs[ position[0] ][ position[1] ].insertThreaded(body, threadID);
}
else if(numberOfPar
Solution
You should know, what's the bottleneck. Do you have two locks altogether? This doesn't sound good, as each part of the tree can be worked independently on. This means you could run on hundreds of CPUs efficiently.
The operations you do under lock are rather simple, which means that the overhead of locking can be significant. Can't you separate the three into a few parts and assign one CPU to each part?
Now some boring general comments:
Where is
That's what
Use better spacing,
The operations you do under lock are rather simple, which means that the overhead of locking can be significant. Can't you separate the three into a few parts and assign one CPU to each part?
Now some boring general comments:
boolean run = true;
while(run){
i = barnesHut.getIndex();
if(i < particle.length){ tree.insertThreaded(particle[i], thread); }
else{ run = false; }
}Where is
i defined? Anyway, unobfuscate towhile (true) {
int i = barnesHut.getIndex();
if (i >= particle.length) break;
tree.insertThreaded(particle[i], thread);
}public synchronized int getIndex(){
return index++;
}That's what
AtomicInt#incrementAndGet() is for.Use better spacing,
if (i < particle.length) { instead of if(i < particle.length){.Code Snippets
boolean run = true;
while(run){
i = barnesHut.getIndex();
if(i < particle.length){ tree.insertThreaded(particle[i], thread); }
else{ run = false; }
}while (true) {
int i = barnesHut.getIndex();
if (i >= particle.length) break;
tree.insertThreaded(particle[i], thread);
}public synchronized int getIndex(){
return index++;
}Context
StackExchange Code Review Q#62536, answer score: 2
Revisions (0)
No revisions yet.