
Not sure if I missed some obvious CPU brakes

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·


codereview cpp stackoverflow c++optimization performance


Problem

I have three functions (out of a few dozen) that account for 80% of the CPU time, so I wonder whether there are opportunities for optimization that I am missing.

Function 1: Extracting a bilinearly interpolated point from a width * height float image:

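```
#include <cmath>

float ExtractBilinear(float* image, int w, int h, float x, float y)
{
    // Integer corners of the cell that contains (x, y).
    int x0 = (int)floor(x);
    int y0 = (int)floor(y);
    int x1 = x0 + 1;
    int y1 = y0 + 1;
    float c00, c01, c11, c10;
    c00 = c01 = c11 = c10 = 0.0f;

    // No neighbour lies inside the image. This guard was garbled in the
    // source; the bounds are reconstructed from the branches below.
    if(x0 < -1 || x0 >= w || y0 < -1 || y0 >= h)
    {
        return 0.0f;
    }

    // Left column (x0): samples outside the image count as 0.
    if(x0<0)
    {
        c00 = 0.0f;
        c01 = 0.0f;
    }
    else
    {
        if(y0<0)
        {
            c00 = 0.0f;
        }
        else
        {
            c00 = image[y0*w + x0];
        }
        if(y1 < h)
        {
            c01 = image[y1*w + x0];
        }
        else
        {
            c01 = 0.0f;
        }
    }

    // Right column (x1): same treatment.
    if(x1 < w)
    {
        if(y0<0)
        {
            c10 = 0.0f;
        }
        else
        {
            c10 = image[y0*w + x1];
        }
        if(y1 < h)
        {
            c11 = image[y1*w + x1];
        }
        else
        {
            c11 = 0.0f;
        }
    }
    else
    {
        c10 = 0.0f;
        c11 = 0.0f;
    }

    // Interpolate along x on both rows, then along y.
    float c0 = c10 * (x - x0) + c00 * (x1 - x);
    float c1 = c11 * (x - x0) + c01 * (x1 - x);
    return c1 * (y - y0) + c0 * (y1 - y);
}
```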

Function 2: Taking two float samples of data (one is image data, one is simulated data) and calculating the sum of squared differences error, with a scalar 'a' chosen to minimize the error as much as possible. sampleX is the size of the input, compX is the size of the output, and offX is deprecated and always 0 (I am keeping it in there because of legacy code).

```
float PatternMatcher::GetSADFloatRel(float* sample, float* compared, int sampleX, int compX, int offX)
{
    if (sampleX != compX)
    {
        return 50000.0f;
    }
    float result = 0;

    float* pTemp1 = sample;
    float* pTem
```

Solution

Precomputing values is useful whenever it is possible. As someone has already said, in your case it may be advisable to precompute things such as Width/2. But if Width is defined through a #define, your compiler should be able to do this optimization by itself.
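A minimal sketch of the precomputation idea, not taken from the question's code: `kWidth`, `kHalfWidth` and `weightedCenterOffset` are hypothetical names, and the width of 8 is chosen only to keep the example small. The half width is a compile-time constant, so nothing is divided inside the loop.

```cpp
#include <vector>

// Hypothetical image width, known at compile time (not from the question).
constexpr int kWidth = 8;
// Evaluated once by the compiler instead of once per loop iteration.
constexpr int kHalfWidth = kWidth / 2;

// Sums each pixel of one row, weighted by its signed horizontal distance
// from the image centre. Reads at most kWidth elements of the row.
float weightedCenterOffset(const std::vector<float>& row) {
    float sum = 0.0f;
    const int n = static_cast<int>(row.size());
    for (int x = 0; x < kWidth && x < n; ++x) {
        sum += row[x] * static_cast<float>(x - kHalfWidth);
    }
    return sum;
}
```

With optimizations enabled, a compiler will usually fold `Width/2` for a literal `#define` as well, so the explicit constant mostly helps when the width is a run-time value hoisted into a local `const` before the loop.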

But I believe the main problem in your code is a different one. You have a first loop between startMin and StartMax and then another nested loop from 0 to imageSize (which I expect to be a large number). This ends up being the most time-consuming part of your code.

Inside it you have two further loops, such as:
for(int ik = NKernel-1; ik>=0; ik--)
or
for(int ky=-half_MaxK; ky<=half_MaxK; ky++)

You may think that this is an "optimized" loop, but actually it isn't, because compilers are (generally) designed to be really good at optimizing loops that run from 0 to N.

Thus you can get a significant improvement: write your two inner loops in that form and then compute your index (ik and ky) from the loop counter by subtracting an offset.
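The rewrite described above could look roughly like this. It is a sketch under assumptions: the loop body (a weighted sum over a `kernel` vector) and the function names are hypothetical, since the original loop bodies are not shown here; only the `ky` and `half_MaxK` naming follows the loops quoted above.

```cpp
#include <vector>

// Loop in the style quoted above: ky runs from -halfMaxK to +halfMaxK.
// Assumes kernel.size() == 2 * halfMaxK + 1.
float sumSignedLoop(const std::vector<float>& kernel, int halfMaxK) {
    float sum = 0.0f;
    for (int ky = -halfMaxK; ky <= halfMaxK; ky++) {
        sum += kernel[ky + halfMaxK] * static_cast<float>(ky);
    }
    return sum;
}

// Same computation with a counter that runs from 0 to n - 1; the signed
// index ky is recovered by subtracting the offset halfMaxK.
float sumZeroBasedLoop(const std::vector<float>& kernel, int halfMaxK) {
    const int n = 2 * halfMaxK + 1;
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        const int ky = i - halfMaxK;
        sum += kernel[i] * static_cast<float>(ky);
    }
    return sum;
}
```

The descending loop can be handled the same way with `ik = NKernel - 1 - i`. Whether either rewrite actually pays off depends on the compiler and its flags, so it is worth profiling both versions.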

Context

StackExchange Code Review Q#23485, answer score: 4
