HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Effective use of branch prediction in terrain generator

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
predictionbranchgeneratoreffectiveuseterrain

Problem

I am trying to learn how branch optimization works.

For an experiment, I have a recursive fractal terrain generation and I have moved all if statements to binary tables (don't know the words, will comment in code).

Is doing what I have done in the code better than using if's, it runs about 15% faster.

Is there a better way of doing this so that the generation is faster?

Again, I am just trying to learn how I would make this code more efficient, I have no real need for it to be though.

TerrainChunkGenerator.h

#include 
#include 
#include 
#include 
#include 
#include "TerrainChunk.h"
class TerrainChunkGenerator{
private:
    TerrainChunk* chunk_;
    unsigned width_, length_, lod_;
    unsigned num_bytes_, num_vertices_, num_triangles_;
    // Recursive Generator variables
    float x0_, x1_;
    float y0_, y1_;
    float mx_, my_;
    float h_;
    float avg_d_;
    float avg_s0_, avg_s1_, avg_s2_, avg_s3_;
    float divisor_;

    // These are the binary things I was talking about
    std::function sq_step_funcs_mx_y0_[2];
    std::function sq_step_funcs_mx_y1_[2];
    std::function sq_step_funcs_x1_my_[2];
    std::function sq_step_funcs_x0_my_[2];

    std::function quadrant_functions_[4];
    std::map> points_; // height map
    float* mesh_; // triangles woven from vertices_
public:
    TerrainChunkGenerator();
    TerrainChunkGenerator(unsigned width, unsigned length, unsigned lod = 6u);
    void Generate();
    TerrainChunk* chunk();
private:
    void GenPoints();
    void Recurse(unsigned count, unsigned quad, float x0, float y0, float x1, float y1);
    float Random();
};


In the cpp the Generate func is called to start generation. In the Generate func there is a GenMesh func which calls Recurse and that is where the start of the lambda calling and the stand ins for if statements exists.

TerrainChunkGenerator.cpp

```
#include "TerrainChunkGenerator.h"

TerrainChunkGenerator::TerrainChunkGenerator(){}
TerrainChunkGenerator::Te

Solution

What you are doing is called a Branch (Jump) Table

It is a well known technique for avoiding costly branching instructions when one wants to execute a block of code depending on the value of one variable. Most respectable compilers will compile a dense switch on a integer argument into a jump table if some conditions are met and it will also add a guard before the jump table to catch the default block and avoiding jumping and executing random data.

Your method of doing it manually saves the execution time of the guard (which is a branch) to check if the argument is in range. For example if quad in

quadrant_functions_[quad]();


is >3 you will get undefined behavior while the switch approach with a default statement would just execute the default block. You know internally this can't happen so it's OK and slightly faster.

If speed is the name of the game, then I find your approach acceptable provided that it's documented.

You may also be interested in Lookup Tables (LUT).

Addendum

Modern desktop CPUs will try to predict which branch is going to be taken by some unspecified method, and will typically speculatively execute along that path (some may execute along both paths). If at a later point it turns out that the actual branch taken was a different one the CPU will "rewind" to the branch point and start over on the right track.

A branch instruction can be very costly if it is consecutively miss predicted causing the CPU to have to drop any issued instructions effectively breaking the highly efficient pipe-lining going on which is the key to obtaining high Instructions Per Cycle (IPC). However a branch that is easily predicted (i.e. almost always true, or almost always false) will have a small overhead as the prediction will likely be correct and execution continues.

Interesting case in point: Why is processing a sorted array faster than an unsorted array?

Of course, avoiding the branch all together saves an instruction or two on top, see predicated instructions. For example cmov: Why is a conditional move not vulnerable for Branch Prediction Failure?.

Code Snippets

quadrant_functions_[quad]();

Context

StackExchange Code Review Q#78547, answer score: 6

Revisions (0)

No revisions yet.