patterncppMinor

Quadtree design

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

codereview cpp stackoverflow c++performance opengl tree

quadtreedesignstackoverflow

Problem

I have an application which is used for displaying and modifying huge volumes of point cloud data from lidar files (up to few gigabytes each, sometimes loaded in simultaneously). In the app the user is able to view a 2D image of loaded points (from the top) and select a profile to view in another window (from the side). Again this involves millions of points and they are displayed using OpenGL.

To handle the data there is also a quadtree library, which works, but is extremely slow. It has been used for some time, but recently the lidar point format changed and the LidarPoint object needed a number of attributes (class members) added, which cause it to grow in size in turn affecting the performance to almost unusable level (think 5 minutes to load a single 2GB file).

The quadtree currently consist of pointers to PointBucket objects which are simply arrays of LidarPoint objects with specified capacity and defined boundaries (for spatial queries). If the bucket capacity is exceeded it splits into four buckets. There is also kind of a caching system in place which causes point buckets to get dumped to disk when the point data is taking too much memory. These are then loaded back into memory if needed. Finally every PointBucket contains subbuckets/resolution levels which hold every n-th point of the original data and are used when displaying the data depending on the zoom level. That is because displaying few million points at once, while that level of detail is not necessary, is just extremely slow.

Here is the current (and slow) insert() method:

```
// Insert in QuadTree
bool QuadtreeNode::insert(LidarPoint newPoint)
{
// if the point dosen't belong in this subset of the tree return false
if (newPoint.getX() maxX_ ||
newPoint.getY() maxY_)
{
return false;
}
else
{
// if the node has overflowed and is a leaf
if ((numberOfPoints_ + 1) > capacity_ && leaf_ == true)
{
splitNode();

// inser

Solution

As you have not provided any profiling data I can only speculate in what I see in the code you have provided.
Guesstimating expected performance

First of lets consider some numbers, you say millions of points so I'm going to assume 10 million points.

You say 2GB file then for 10 million points, that's around 200 bytes per point sounds reasonable.

Lets assume a typical HDD with read speed somewhere around 80MB/s, then reading 2GB of data should take 2000/80 = 25 seconds or so.

Lets assume a modern 3GHz CPU with average instructions per cycle (IPC) of 5 (which is a bit pessimistic). That means that you can do around 3000*5/10 = 1500 instructions per point, per second per core. I would reasonably expect to be able to build the tree (ignoring the delay to read from disk) in a few, maybe tens of seconds.

All in all to read the entire thing into memory and build a quadtree, I would expect around a minute. You say 5 minutes which is a bit slower than what should be possible.
Looking at the code

I see no major blunders that would be an obvious culprit but I do see a LOT of branching on data.

One reason CPUs can have such high IPC is that they do something called "Speculative Execution". When the CPU encounters a branch (if-else) it will take an educated guess using sophisticated algorithms to try to predict which branch will be taken and start executing that branch before it has actually computed the branching condition. Pretty neat stuff. As long as this branch prediction goes well (and it does so quite frequently) you gain a lot of performance, but if the CPU on the other hand miss-predicted the branch, it has to back out of the work it already did and then go back and execute the correct branch.

This is called, "branch prediction failure" and is very well illustrated in this SO question.

In that question the OP has an array of random characters and sums all values larger than 128 (i.e. 50% of the values). Then the op does additional work and sorts the array first, and then runs the exactly same code. Intuitively you would expect it to be slower because you have done an additional sort before hand. But this is not the case, doing the extra work of the sorting actually makes the entire code 5x faster. This 5x speedup is entirely due to avoiding branch miss-predictions as the data is now sorted and the branch predictor in the CPU can correctly predict almost all branches while the random array is impossible to predict.

The 5x speedup factor uncannily matches my guesstimate above (1min vs 5min) and your code appears to be branching on random data (as the points are not sorted geometrically, whatever that means).

I believe that you can gain some significant performance by rethinking how you branch in your insert method.

For example here:

// if the point dosen't belong in this subset of the tree return false
if (newPoint.getX()  maxX_ || 
    newPoint.getY()  maxY_)
{ 
   return false;
}
else
{
   // if the node has overflowed and is a leaf
   if ((numberOfPoints_ + 1) > capacity_ && leaf_ == true)
   {
      splitNode();

      // insert the new point that caused the overflow
      if (a_->insert(newPoint))
      {
         return true;
      }
      if (b_->insert(newPoint))
      {
         return true;
      }
      if (c_->insert(newPoint))
      {
         return true;
      }
      if (d_->insert(newPoint))
      {
         return true;
      }
      throw OutOfBoundsException("failed to insert new point into any \
                                  of the four child nodes, big problem");
   }

First you check if you're inside the node, then you repeat the same check for each of the children, this is a lot of branching. Unnecessary branching.

Assume for a second that the root contains the point (check this only once for each point) then one of the children must contain the point. The distinction is instead of asking:

"Does any one of my children contain the point"

you are asking

"Which one of my children contains the point, I know it's one of you".

Which is a big difference, let me show you with some pseudo code:

bool QuadTree::insert(Point p){
     if(p  max){
         return false; // Point outside tree
     } 
     return root->insert();
} 

bool QuadTreeNode::insert(Point p){
    if( leaf ){
       return addPointAndPossiblySplitThisNode(p);
    }
   
    if (p.x insert();
        }
        return upperLeftChild->insert();
    }
    if(p.y insert();
    }
    return upperRightChild->insert();
}

See how many fewer branches this code has? Note that this code breaks boundary hits towards the right and the top.

This code does at most 3 branches per node, and the branching conditions are trivial which makes a miss-predicted branch cost less (as the delay until you compute the branching condition is shorter).

On just the code I quoted, you do at most 6 branches. I realise I did not implement the node splitting part and allocation of the buckets but I did also not take that int

Code Snippets

// if the point dosen't belong in this subset of the tree return false
if (newPoint.getX() < minX_ || newPoint.getX() > maxX_ || 
    newPoint.getY() < minY_ || newPoint.getY() > maxY_)
{ 
   return false;
}
else
{
   // if the node has overflowed and is a leaf
   if ((numberOfPoints_ + 1) > capacity_ && leaf_ == true)
   {
      splitNode();

      // insert the new point that caused the overflow
      if (a_->insert(newPoint))
      {
         return true;
      }
      if (b_->insert(newPoint))
      {
         return true;
      }
      if (c_->insert(newPoint))
      {
         return true;
      }
      if (d_->insert(newPoint))
      {
         return true;
      }
      throw OutOfBoundsException("failed to insert new point into any \
                                  of the four child nodes, big problem");
   }

bool QuadTree::insert(Point p){
     if(p < min || p > max){
         return false; // Point outside tree
     } 
     return root->insert();
} 

bool QuadTreeNode::insert(Point p){
    if( leaf ){
       return addPointAndPossiblySplitThisNode(p);
    }
   
    if (p.x < midpoint.x){
        if(p.y < midpoint.y){
            return lowerLeftChild->insert();
        }
        return upperLeftChild->insert();
    }
    if(p.y < midpoint.y){
        return lowerRightChild->insert();
    }
    return upperRightChild->insert();
}

Context

StackExchange Code Review Q#10181, answer score: 3

Revisions (0)

No revisions yet.