HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Optimize the Buddhabrot

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
buddhabrottheoptimize

Problem

I am currently working on my own implementation of the Buddhabrot. So far I am using the std::thread-class from C++11 to concurrently work through the following iteration:

void iterate(float *res){
    //generate starting point
    std::default_random_engine generator;
    std::uniform_real_distribution distribution(-1.5,1.5);

    double ReC,ImC;
    double ReS,ImS,ReS_;
    unsigned int steps;

    unsigned int visitedPos[maxCalcIter];

    unsigned int succSamples(0);

    //iterate over it
    while(succSamples  minCalcIter)&&(ReS*ReS + ImS*ImS > 4)){
            succSamples++;
            for (int j = steps; j--;){
                //std::cout << visitedPos[j] << std::endl;
                res[visitedPos[j]]++;
            }
        }
    }
}


These are my declarations and main():

#include 
#include 
#include 
#include 
#include 

#define outputSize 20000

#define samplesPerThread 5000
#define numThreads 16

#define maxCalcIter 60000
#define minCalcIter 40000

//iterate function

int main(){
    float *outImg = new float[outputSize*outputSize];

    std::vector threads;
    for (int i = numThreads; i--; ){
        threads.push_back(std::thread(iterate,outImg));
    }
    for (auto& t : threads){
        t.join();
    }

    return 0;
}


I am basically working in every thread so long that I generated enough trajectories of sufficient length which in expectation takes the same time in every thread.

But I really have the feeling that this function might me very unoptimized since its code is so very readable. Can anybody come up with some fancy optimizations? When it comes to compiling I just use:


g++ -O4 -std=c++11 -I/usr/include/OpenEXR/ -L/usr/lib64/ -lHalf -lIlmImf -lm buddha_cpu.cpp -o buddha_cpu

Any hints on crunching some more numbers/sec would be really appreciated. Also, any links to further literature are totally welcome.

Solution

Instead of talking about the algorithm, I will review what can be done better even if I don't understand the algorithm. Actually, I will review what can be done better so that I can understand the algorithm:

-
First of all, don't use macros to define your constants. Since you are using C++11, you can make them constexpr variables instead and use them as array indices too if needed:

constexpr unsigned int outputSize = 20000;
constexpr unsigned int samplesPerThread = 5000;
constexpr unsigned int numThreads = 16;

constexpr unsigned int maxCalcIter = 60000;
constexpr unsigned int minCalcIter = 40000;


-
You are using std::sqrt but your forgot to include the header `. While it has probably been included by some other header, it is not guaranteed by the standard thus not including it may cause your program to fail to compile with some implementations.

-
By the way, try to always
std::-qualify the functions from the standard library. Unless you are explicitly using argument-dependent lookup of course.

-
This line is not really readable:

for (unsigned int j = maxCalcIter; (ReS*ReS + ImS*ImS < 4)&&(j--); ){


It would be better as:

for (unsigned int j = maxCalcIter; (ReS*ReS + ImS*ImS  0); --j){


You may have heard that iterating from max to zero is faster, but in most of the cases, this is at best a micro-optimization. The "natural" ordering is from 0 to max and that's what readers will expect. Generally speaking, choosing a good algorithm and good data structures will help you more than micro-optimizing. Worse: if micro-optimizations are done too early when they are not needed, it will hinder readability.

Considering that
step is incremented for each iteration of the loop, you can put the incrementation in the for too:

for (unsigned int j = 0 ;
     (ReS*ReS + ImS*ImS < 4) && (j < maxCalcIter) ;
     ++j, ++steps)
{ /* ... */ }


-
By the way, with modern caches and processors, iterating forward may be significantly faster than iterating backwards if your iterator is used to access memory. Therefore, I expect the last loop of your function to be faster (and more readable) if you iterate forward instead of backwards since the index
j is actually used to access memory.

-
Now, let's see your
main function too: if you don't write any return statement in main, the compiler will automagically add return 0;. Unless you also intend to return error codes, not writing return 0; may be a way of documenting that your program cannot return error codes.

-
You are allocating memory manually in the following line:

float *outImg = new float[outputSize*outputSize];


You should avoid manually allocating memory since it can lead to memory leaks if you forget to delete it later (hint: your forgot). You can always use an
std::vector to create a dynamic array and you can access its underlying memory thanks to the method data().

Even better: you know the size of the allocated memory at compile-time. Therefore, you can use an
std::array instead. This class allocates its memory on the stack, which means that iteration and some other operations may be faster. Avoid using one if your array is so big that it could blow the stack though (well, in your case, I guess that it would blow the stack).

-
You are using
threads.push_back(std::thread(/ ... /)) to create new threads and add them to threads. You can use emplace_back` to create them directly into the vector instead. Your code will even be shorter:

threads.emplace_back(/* ... */);

Code Snippets

constexpr unsigned int outputSize = 20000;
constexpr unsigned int samplesPerThread = 5000;
constexpr unsigned int numThreads = 16;

constexpr unsigned int maxCalcIter = 60000;
constexpr unsigned int minCalcIter = 40000;
for (unsigned int j = maxCalcIter; (ReS*ReS + ImS*ImS < 4)&&(j--); ){
for (unsigned int j = maxCalcIter; (ReS*ReS + ImS*ImS < 4) && (j > 0); --j){
for (unsigned int j = 0 ;
     (ReS*ReS + ImS*ImS < 4) && (j < maxCalcIter) ;
     ++j, ++steps)
{ /* ... */ }
float *outImg = new float[outputSize*outputSize];

Context

StackExchange Code Review Q#73980, answer score: 7

Revisions (0)

No revisions yet.