Bilinear resizing algorithm

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

image codereview c stackoverflow optimization performance

Problem

I have the following bilinear resizing algorithm, and even though I use fixed-point arithmetic it is still too slow for my requirements. In the past I tried caching some values in tables, but the improvement was only a few milliseconds, which was not noticeable.

Is there any other way I can improve the speed? As this is intended for ARM processors, I know I could write it with NEON instructions, but unfortunately I don't have the knowledge for such a task.

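```
void resize_bilinear(const unsigned int *input, unsigned int *output, const unsigned int sourceWidth, const unsigned int sourceHeight, const unsigned int targetWidth, const unsigned int targetHeight)
{
    unsigned int widthCoefficient, heightCoefficient, x, y;
    unsigned int pixel1, pixel2, pixel3, pixel4;
    unsigned int hc1, hc2, wc1, wc2, offsetX, offsetY;
    unsigned int offsetX1, offsetPixelY, offsetPixelY1;
    unsigned int r, g, b, a;

    /* Source step per target pixel in 16.16 fixed point (hStepFixed16b: name assumed by symmetry). */
    const unsigned int wStepFixed16b = ((sourceWidth - 1) << 16) / (targetWidth - 1);
    const unsigned int hStepFixed16b = ((sourceHeight - 1) << 16) / (targetHeight - 1);

    heightCoefficient = 0;

    for (y = 0; y < targetHeight; y++)
    {
        offsetY = (heightCoefficient >> 16);
        hc2 = (heightCoefficient >> 9) & (unsigned char)127;  /* 7-bit vertical weight */
        hc1 = 128 - hc2;

        widthCoefficient = 0;

        offsetPixelY = offsetY * sourceWidth;
        offsetPixelY1 = (offsetY + 1) * sourceWidth;

        for (x = 0; x < targetWidth; x++)
        {
            offsetX = (widthCoefficient >> 16);
            wc2 = (widthCoefficient >> 9) & (unsigned char)127;  /* 7-bit horizontal weight */
            wc1 = 128 - wc2;

            offsetX1 = offsetX + 1;

            pixel1 = *(input + (offsetPixelY + offsetX));
            pixel2 = *(input + (offsetPixelY1 + offsetX));
            pixel3 = *(input + (offsetPixelY + offsetX1));
            pixel4 = *(input + (offsetPixelY1 + offsetX1));

            a = ((((pixel1 >> 24) & 0xff) * hc1 + ((pixel2 >> 24) & 0xff) * hc2) * wc1 +
                (((pixel3 >> 24) & 0xff) * hc1 + ((pixel4 >> 24) & 0xff) * hc2) * wc2) >> 14;

            r = ((((pixel1 >> 16) & 0xff) * hc1 + ((pixel2 >> 16) & 0xff) * hc2) * wc1 +
                (((pixel3 >> 16) & 0xff) * hc1 + ((pixel4 >> 16) & 0xff) * hc2) * wc2) >> 14;

            g = ((((pixel1 >> 8) & 0xff) * hc1 + ((pixel2 >> 8) & 0xff) * hc2) * wc1 +
                (((pixel3 >> 8) & 0xff) * hc1 + ((pixel4 >> 8) & 0xff) * hc2) * wc2) >> 14;

            /* The source is cut off partway through the green channel; everything after that point is an assumed reconstruction following the pattern above. */
            b = (((pixel1 & 0xff) * hc1 + (pixel2 & 0xff) * hc2) * wc1 +
                ((pixel3 & 0xff) * hc1 + (pixel4 & 0xff) * hc2) * wc2) >> 14;

            *output++ = (a << 24) | (r << 16) | (g << 8) | b;

            widthCoefficient += wStepFixed16b;
        }

        heightCoefficient += hStepFixed16b;
    }
}
```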

Solution

You could hint that frequently used variables be kept in registers, like the following (the `register` keyword is only a hint, so the compiler may ignore it):

register const unsigned int targetWidth

Make sure memory accesses are cache-optimized, i.e. clumped together.
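For a row-major image, one way to keep accesses clumped together is to make `x` the inner loop, so consecutive iterations touch adjacent addresses. The question's code appears to iterate this way already. The sketch below only illustrates the principle; `sum_row_major` is a hypothetical function, not part of the original post.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sums all pixels of a row-major image.
   The inner loop walks one row left to right, so it reads
   consecutive addresses. */
static uint64_t sum_row_major(const uint32_t *pixels, size_t width, size_t height)
{
    uint64_t total = 0;
    for (size_t y = 0; y < height; y++) {
        const uint32_t *row = pixels + y * width; /* start of row y */
        for (size_t x = 0; x < width; x++)
            total += row[x];
    }
    return total;
}
```

Swapping the two loops gives the same sum but advances by `width` elements per step, which tends to be less cache-friendly on large images.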

Investigate whether signed and unsigned integer operations have different performance costs on your platform.

Investigate if look-up tables rather than computations gain you anything (but these can blow the caches, so be careful).
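One table that may be worth measuring: the horizontal source offset and blend weights depend only on `x`, so they could be computed once per image instead of once per pixel. The sketch below assumes the same 16.16 fixed-point stepping and 7-bit weights as the question's code. `ColumnEntry` and `build_column_table` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical table entry: source column and 7-bit blend weights
   for one target column. */
typedef struct {
    unsigned int offset;  /* left source column */
    unsigned int w_left;  /* weight of the left pixel, 1..128 */
    unsigned int w_right; /* weight of the right pixel, 128 - w_left */
} ColumnEntry;

/* Returns a malloc'd table with target_width entries, or NULL on
   allocation failure. The caller frees it. */
static ColumnEntry *build_column_table(unsigned int source_width,
                                       unsigned int target_width)
{
    assert(source_width > 0 && target_width > 1);

    ColumnEntry *table = malloc(target_width * sizeof *table);
    if (table == NULL)
        return NULL;

    /* Source step per target pixel in 16.16 fixed point. */
    const unsigned int step = ((source_width - 1) << 16) / (target_width - 1);
    unsigned int coeff = 0;

    for (unsigned int x = 0; x < target_width; x++) {
        table[x].offset  = coeff >> 16;          /* integer part */
        table[x].w_right = (coeff >> 9) & 127u;  /* top 7 fraction bits */
        table[x].w_left  = 128u - table[x].w_right;
        coeff += step;
    }
    return table;
}
```

The inner loop would then read `table[x]` instead of recomputing the shifts and masks. Whether that is faster than the arithmetic it replaces depends on the platform, since the table competes with pixel data for cache space.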

Even though you've already avoided floats I'm going to post this anyway:
If you can avoid double and float variables, use int. On most architectures, int would be the fastest type for computations because of the memory model. You can still achieve decent precision by scaling your units as you've been doing (i.e. use 1026 as an int instead of 1.026 as a double or float).
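As a rough illustration of the scaling idea, the sketch below blends two 8-bit channel values using a weight stored as an integer in 0..256 rather than as a float in 0..1. The names `lerp_u8`, `FRAC_BITS` and `FRAC_ONE` are made up for this example and do not come from the question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants: 8 fractional bits, so 256 stands for 1.0. */
#define FRAC_BITS 8
#define FRAC_ONE  (1u << FRAC_BITS)

/* Blends a and b. The weight t is in [0, FRAC_ONE], which corresponds
   to t / 256.0 in real terms. The result is truncated, not rounded. */
static uint8_t lerp_u8(uint8_t a, uint8_t b, uint32_t t)
{
    assert(t <= FRAC_ONE);
    /* a * (1 - t) + b * t in integers, then scaled back down. */
    return (uint8_t)((a * (FRAC_ONE - t) + b * t) >> FRAC_BITS);
}
```

For instance, `lerp_u8(100, 200, 128)` uses a weight of 128/256 = 0.5 and yields 150 without any floating-point operation.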

And, of course, do lots of profiling and measurements.
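A minimal timing harness could look like the sketch below, assuming the standard `clock()` function gives usable resolution on the target. `average_cpu_seconds` and `count_call` are hypothetical names. `count_call` is only a placeholder workload and would be replaced by a wrapper that calls the resize routine.

```c
#include <time.h>

/* Hypothetical harness: runs fn(arg) `runs` times and returns the
   average CPU time per run in seconds (0.0 if runs <= 0). */
static double average_cpu_seconds(void (*fn)(void *), void *arg, int runs)
{
    if (runs <= 0)
        return 0.0;

    clock_t start = clock();
    for (int i = 0; i < runs; i++)
        fn(arg);
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC / runs;
}

/* Placeholder workload: increments the counter that arg points to. */
static void count_call(void *arg)
{
    unsigned int *counter = arg;
    (*counter)++;
}
```

Averaging over many runs helps smooth out timer granularity. Timing each variant on the actual ARM device, with realistic image sizes, is more informative than timing on a desktop machine.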

Context

StackExchange Code Review Q#28645, answer score: 2
