patterncppMinor
Effective use of branch prediction in terrain generator
Viewed 0 times
predictionbranchgeneratoreffectiveuseterrain
Problem
I am trying to learn how branch optimization works.
For an experiment, I have a recursive fractal terrain generation and I have moved all
Is doing what I have done in the code better than using
Is there a better way of doing this so that the generation is faster?
Again, I am just trying to learn how I would make this code more efficient, I have no real need for it to be though.
TerrainChunkGenerator.h
In the
TerrainChunkGenerator.cpp
```
#include "TerrainChunkGenerator.h"
TerrainChunkGenerator::TerrainChunkGenerator(){}
TerrainChunkGenerator::Te
For an experiment, I have a recursive fractal terrain generation and I have moved all
if statements to binary tables (don't know the words, will comment in code).Is doing what I have done in the code better than using
if's, it runs about 15% faster.Is there a better way of doing this so that the generation is faster?
Again, I am just trying to learn how I would make this code more efficient, I have no real need for it to be though.
TerrainChunkGenerator.h
#include
#include
#include
#include
#include
#include "TerrainChunk.h"
class TerrainChunkGenerator{
private:
TerrainChunk* chunk_;
unsigned width_, length_, lod_;
unsigned num_bytes_, num_vertices_, num_triangles_;
// Recursive Generator variables
float x0_, x1_;
float y0_, y1_;
float mx_, my_;
float h_;
float avg_d_;
float avg_s0_, avg_s1_, avg_s2_, avg_s3_;
float divisor_;
// These are the binary things I was talking about
std::function sq_step_funcs_mx_y0_[2];
std::function sq_step_funcs_mx_y1_[2];
std::function sq_step_funcs_x1_my_[2];
std::function sq_step_funcs_x0_my_[2];
std::function quadrant_functions_[4];
std::map> points_; // height map
float* mesh_; // triangles woven from vertices_
public:
TerrainChunkGenerator();
TerrainChunkGenerator(unsigned width, unsigned length, unsigned lod = 6u);
void Generate();
TerrainChunk* chunk();
private:
void GenPoints();
void Recurse(unsigned count, unsigned quad, float x0, float y0, float x1, float y1);
float Random();
};In the
cpp the Generate func is called to start generation. In the Generate func there is a GenMesh func which calls Recurse and that is where the start of the lambda calling and the stand ins for if statements exists.TerrainChunkGenerator.cpp
```
#include "TerrainChunkGenerator.h"
TerrainChunkGenerator::TerrainChunkGenerator(){}
TerrainChunkGenerator::Te
Solution
What you are doing is called a Branch (Jump) Table
It is a well known technique for avoiding costly branching instructions when one wants to execute a block of code depending on the value of one variable. Most respectable compilers will compile a dense
Your method of doing it manually saves the execution time of the guard (which is a branch) to check if the argument is in range. For example if
is
If speed is the name of the game, then I find your approach acceptable provided that it's documented.
You may also be interested in Lookup Tables (LUT).
Addendum
Modern desktop CPUs will try to predict which branch is going to be taken by some unspecified method, and will typically speculatively execute along that path (some may execute along both paths). If at a later point it turns out that the actual branch taken was a different one the CPU will "rewind" to the branch point and start over on the right track.
A branch instruction can be very costly if it is consecutively miss predicted causing the CPU to have to drop any issued instructions effectively breaking the highly efficient pipe-lining going on which is the key to obtaining high Instructions Per Cycle (IPC). However a branch that is easily predicted (i.e. almost always true, or almost always false) will have a small overhead as the prediction will likely be correct and execution continues.
Interesting case in point: Why is processing a sorted array faster than an unsorted array?
Of course, avoiding the branch all together saves an instruction or two on top, see predicated instructions. For example cmov: Why is a conditional move not vulnerable for Branch Prediction Failure?.
It is a well known technique for avoiding costly branching instructions when one wants to execute a block of code depending on the value of one variable. Most respectable compilers will compile a dense
switch on a integer argument into a jump table if some conditions are met and it will also add a guard before the jump table to catch the default block and avoiding jumping and executing random data.Your method of doing it manually saves the execution time of the guard (which is a branch) to check if the argument is in range. For example if
quad inquadrant_functions_[quad]();is
>3 you will get undefined behavior while the switch approach with a default statement would just execute the default block. You know internally this can't happen so it's OK and slightly faster.If speed is the name of the game, then I find your approach acceptable provided that it's documented.
You may also be interested in Lookup Tables (LUT).
Addendum
Modern desktop CPUs will try to predict which branch is going to be taken by some unspecified method, and will typically speculatively execute along that path (some may execute along both paths). If at a later point it turns out that the actual branch taken was a different one the CPU will "rewind" to the branch point and start over on the right track.
A branch instruction can be very costly if it is consecutively miss predicted causing the CPU to have to drop any issued instructions effectively breaking the highly efficient pipe-lining going on which is the key to obtaining high Instructions Per Cycle (IPC). However a branch that is easily predicted (i.e. almost always true, or almost always false) will have a small overhead as the prediction will likely be correct and execution continues.
Interesting case in point: Why is processing a sorted array faster than an unsorted array?
Of course, avoiding the branch all together saves an instruction or two on top, see predicated instructions. For example cmov: Why is a conditional move not vulnerable for Branch Prediction Failure?.
Code Snippets
quadrant_functions_[quad]();Context
StackExchange Code Review Q#78547, answer score: 6
Revisions (0)
No revisions yet.