patterncppModerate

Speed optimization for transparent gradient blend algorithm

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

blendgradienttransparentoptimizationalgorithmforspeed

Problem

The following algorithm works correctly for the purposes that I need, which is to set the top side of an image with a transparent gradient blend. Any tips to optimize it for speed?

void ImageUtil::sectionAlphaGradient(uint32_t* pixelBuffer, const int width, const int gradientHeight)
{
    unsigned int pixelOffsetY, pixelIndex;
    uint8_t A, R, G, B;

    const unsigned short OPAQUE = 255;

    const uint32_t BACKCOLOR = 0x00000000;
    const uint8_t backR = (BACKCOLOR & 0x00FF0000) >> 16;
    const uint8_t backG = (BACKCOLOR & 0x0000FF00) >> 8;
    const uint8_t backB = (BACKCOLOR & 0x000000FF);

    for (unsigned int y = 0; y > 16;
            G = (pixelValue & 0x0000FF00) >> 8;
            B = (pixelValue & 0x000000FF);

            R = (R * A + backR * (OPAQUE - A)) / OPAQUE;
            G = (G * A + backG * (OPAQUE - A)) / OPAQUE;
            B = (B * A + backB * (OPAQUE - A)) / OPAQUE;

            pixelBuffer[pixelIndex] = (uint32_t)((A << 24) | ((R & 0xFF) << 16) | ((G & 0xFF) << 8) | (B & 0xFF));
        }
    }
}

Solution

There are a number of things you could do to make this faster and better.

Use a pointer rather than array references

The code currently computes a pixelIndex and uses it as in pixelValue = pixelBuffer[pixelIndex], but what's actually happening is that each pixel is visited in order. The code can be sped up quite a bit with just this one change.

pixelValue = *pixelBuffer;
// .. do processing
++pixelBuffer;   // advance to the next pixel

Avoid shifting

Right now, the code masks off the pixel value for each color and then shifts it, does calculations and then shifts it back into place. Those shifts are not needed. Instead of this:

const uint8_t backR = (BACKCOLOR & 0x00FF0000) >> 16;
R = (pixelValue & 0x00FF0000) >> 16;
pixelBuffer[pixelIndex] = (uint32_t)((A << 24) | ((R & 0xFF) << 16) 
    | ((G & 0xFF) << 8) | (B & 0xFF));

The code could instead do this:

const uint32_t maskR = 0x00FF0000;
const uint32_t backR = BACKCOLOR & maskR;
R = *pixelBuffer & maskR;
*pixelBuffer++ = (uint32_t)((A << 24) | (R & maskR) 
    | (G & maskG) | (B & maskB));

Use same-sized constants

Using a mix of uint8_t, uint32_t and unsigned short constants makes the compiler work much harder with all of the implicit transformations. Better would be to use all the same size, such as uint32_t, the same as your pixel data. The code generated by the compiler may be simpler, smaller and faster as a result.

Move loop invariants out of the loop

Compilers can often perform this optimization for you, but it's good to give them a hint. This code computes this each time through the inner loop:

R = (R * A + backR * (OPAQUE - A)) / OPAQUE;

But A, backR and OPAQUE don't change within that loop, so better would be to calculate backR * (OPAQUE - A) as part of the outer loop.

Use named constants

This isn't a speed issue, but instead of using the constant 0x00FF0000 multiple places, it's better from a maintenance standpoint to turn that into a named constant maskR to avoid errors. Alternatively, one could create a function or macro that does the masking.

Be careful with signed vs. unsigned

The loop counters x and y are declared as int values, and they're compared to width and gradientHeight, but are you ever really going to want negative numbers for any of those? It seems to me that they'd be better as unsigned.

Is BACKCOLOR really 0?

More of the math could be eliminated if BACKCOLOR is really always supposed to be zero. I assumed instead, that it might be set to some other color, and so I left it in place.

Declare variables as late as possible

Rather than using the old C-style of declaring all variables at the top of a function, use the modern C++-style and declare variables as late as possible. Doing so can sometimes help the compiler figure out register allocation, resulting in faster, smaller code.

Putting it all together

When all of these are applied, the resulting code is shorter, cleaner and faster:

#define maskR(x) (x & 0x00FF0000)
#define maskG(x) (x & 0x0000FF00)
#define maskB(x) (x & 0x000000FF)

void sectionAlphaGradient2(uint32_t* pixelBuffer, 
    const unsigned width, const unsigned gradientHeight)
{
    const uint32_t OPAQUE = 255;
    const uint32_t BACKCOLOR = 0x00000000;
    const uint32_t backR = maskR(BACKCOLOR);
    const uint32_t backG = maskG(BACKCOLOR);
    const uint32_t backB = maskB(BACKCOLOR);

    for (unsigned y = 0; y < gradientHeight; y++)
    {
        uint32_t A = (uint8_t)((OPAQUE * y) / gradientHeight);
        uint32_t shiftedA = A << 24;
        uint32_t altR = backR *(OPAQUE - A);
        uint32_t altG = backG *(OPAQUE - A);
        uint32_t altB = backB *(OPAQUE - A);
        for (unsigned int x = 0; x < width; x++)
        {
            uint32_t R = (maskR(*pixelBuffer) * A + altR) / OPAQUE;
            uint32_t G = (maskG(*pixelBuffer) * A + altG) / OPAQUE;
            uint32_t B = (maskB(*pixelBuffer) * A + altB) / OPAQUE;
            *pixelBuffer++ = shiftedA | maskR(R) | maskG(G) | maskB(B);
        }
    }
}

Results

On my machine, a 64-bit Linux box running g++ 4.8.3 and using test code with an 8000 by 6000 pixel buffer, I get the following results:

original: 240977 microseconds
improved: 125714 microseconds

Code Snippets

pixelValue = *pixelBuffer;
// .. do processing
++pixelBuffer;   // advance to the next pixel

const uint8_t backR = (BACKCOLOR & 0x00FF0000) >> 16;
R = (pixelValue & 0x00FF0000) >> 16;
pixelBuffer[pixelIndex] = (uint32_t)((A << 24) | ((R & 0xFF) << 16) 
    | ((G & 0xFF) << 8) | (B & 0xFF));

const uint32_t maskR = 0x00FF0000;
const uint32_t backR = BACKCOLOR & maskR;
R = *pixelBuffer & maskR;
*pixelBuffer++ = (uint32_t)((A << 24) | (R & maskR) 
    | (G & maskG) | (B & maskB));

R = (R * A + backR * (OPAQUE - A)) / OPAQUE;

#define maskR(x) (x & 0x00FF0000)
#define maskG(x) (x & 0x0000FF00)
#define maskB(x) (x & 0x000000FF)

void sectionAlphaGradient2(uint32_t* pixelBuffer, 
    const unsigned width, const unsigned gradientHeight)
{
    const uint32_t OPAQUE = 255;
    const uint32_t BACKCOLOR = 0x00000000;
    const uint32_t backR = maskR(BACKCOLOR);
    const uint32_t backG = maskG(BACKCOLOR);
    const uint32_t backB = maskB(BACKCOLOR);

    for (unsigned y = 0; y < gradientHeight; y++)
    {
        uint32_t A = (uint8_t)((OPAQUE * y) / gradientHeight);
        uint32_t shiftedA = A << 24;
        uint32_t altR = backR *(OPAQUE - A);
        uint32_t altG = backG *(OPAQUE - A);
        uint32_t altB = backB *(OPAQUE - A);
        for (unsigned int x = 0; x < width; x++)
        {
            uint32_t R = (maskR(*pixelBuffer) * A + altR) / OPAQUE;
            uint32_t G = (maskG(*pixelBuffer) * A + altG) / OPAQUE;
            uint32_t B = (maskB(*pixelBuffer) * A + altB) / OPAQUE;
            *pixelBuffer++ = shiftedA | maskR(R) | maskG(G) | maskB(B);
        }
    }
}

Context

StackExchange Code Review Q#69826, answer score: 12

Revisions (0)

No revisions yet.