patterncppMinor

Speed concerns of octree implementation

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

codereview cpp stackoverflow c++algorithm tree

implementationspeedoctreeconcerns

Problem

For couple of days now I am trying to speed up my Force-Directed graph implementation. So far I've implemented Barnes-Hut algorithm that's using octree to decrease number of computations. I've tested it multiple times and number of force related computations is indeed drastically decreased. Below is the plot of computations to number of nodes without Barns-Hut (blue line) and with (red line):

Even though now it should be a lot faster, the truth is that in the matter of speed (time) the upgrade is only few percent.

One part I suppose might be causing this is tree creation and elements in tree placing. Because elements are constantly moving I need to recreate tree each loop until some stop condition is reached. But if I will spend lot of time creating tree I will lost there time I've gained on force computation increase. That's my thinking at least. This is how I am adding elements in my main file loop:

```
void AddTreeElements(Octree tree, glm::vec3 boundries, Graph& graph)
{
for(auto& node:graph.NodeVector())
{
node.parent_group = nullptr;
if(node.pos[0] boundries[0][0] &&
node.pos[1] > boundries[4][1] && node.pos[1] boundries[3][2])
{
tree->AddObject(&node.second);
continue;
}

if(node.pos[0] boundries[1][0])
{
boundries[1][0] = node.pos[0]+1.0f;
boundries[2][0] = node.pos[0]+1.0f;
boundries[5][0] = node.pos[0]+1.0f;
boundries[6][0] = node.pos[0]+1.0f;
}

if(node.pos[1] boundries[0][1])
{
boundries[0][1] = node.pos[1]+1.0f;
boundries[1][1] = node.pos[1]+1.0f;
boundries[2][1] = node.pos[1]+1.0f;
boundries[3][1] = node.pos[1]+1.0f;
}

if(node.pos[2] boundries[0][2])
{
boundries[0][2] = node.pos[2]+1.0f;
boundries[1][2] = node.pos[2]+1.0f;
boundries[4][2] = node.pos[2]+1.0f;

Solution

My first reaction is your dynamic memory allocation:

trees[i] = new Quadtree(this, child_bounds, level+1, maxLevel);

You are better off popping those objects from a pre-made stack the size of your maximum expected Quadtree object count. Allocating memory is very slow.

My second reaction is your use of std::vector. It may allocate/deallocate big blocks of memory when you push (and pop?). Your usage pattern resembles that of std::list which may serve you better, at least getting close to constant insertion/deletion properties. As for performance the std lib is the fastest there is at doing what it does, but you may want more control over the memory allocation/deallocation part. Do what you wish.

If your parameters are guaranteed to be non-NULL you might want to send them as references instead of pointers. It may or may not improve performance. The compiler may have it easier optimizing your code. And besides you rarely manage cases where pointers may be NULL.

Your contains function should be able to share calculations and results between nodes as the nodes share bounding planes with each other. As it is right now you test 6 planes in best case and 40 planes worst case. By sharing planes you reduce that to a constant 9 for all cases. Probably no big deal.

Always pre-increment loop counters. This will probably have no noticeable effect with decent compiler optimization but it is good coding practice.

A bit off-topic maybe but you may want to look at other platforms to implement the algorithm on such as OpenCL, even if targeting CPU only environments.

Code Snippets

trees[i] = new Quadtree(this, child_bounds, level+1, maxLevel);

Context

StackExchange Code Review Q#127693, answer score: 2

Revisions (0)

No revisions yet.