patterncppMinor
Optimizing bilinear interpolation
Viewed 0 times
bilinearinterpolationoptimizing
Problem
I have spent countless hours trying to speed up my bilinear interpolation up. I even tried implementing an SSE version (a double version and a float version), but that was even slower than this version.
Does anyone have any ideas?
I wrote this under VS2010 and it is intended to be called from MATLAB as a MEX function (thus the minus one subtraction in interp2_mx because indexing in MATLAB is from 1:end as opposed to 0:end-1).
My function interp2 requires a T (intended float/double type) pointer to Z (mz by nz matrix) and interpolates the points in XI and YI into ZI (which are all the same size mi by ni).
```
template
__forceinline void interp2_mx(const T& x, const T& y,
const T* z,
const int32_t& n,
const int32_t& mm2,
const int32_t& nm2,
T& val,
const T& extrapval = T(0))
{
int64_t xp = (int64_t)(x) - 1; // adjust for MATLAB indexing
int64_t yp = (int64_t)(y) - 1;
if (xp nm2 || yp mm2)
{
val = extrapval;
}
else
{
const T line = z + yp n + xp;
T xf = x - (int64_t)(x); // get decimal portion
T yf = y - (int64_t)(y);
T x1mf = (T)1 - xf;
T y1mf = (T)1 - yf;
T v00 = x1mf y1mf (*(line));
T v01 = xf y1mf (*(line + 1));
T v10 = x1mf yf (*(line + n));
T v11 = xf yf (*(line + n + 1));
val = v00 + v01 + v10 + v11;
}
}
template
void interp2(const T* z,
const int32_t& mz, const int32_t& nz,
const T xi, const T yi,
const int32_t& mi, const int32_t& ni,
T* zi,
const T& extrapval = T(0))
{
const int32_t nzm2 = nz - 2;
const int32_t mzm2 = mz - 2;
#pragma omp parallel for
for (int m = 0; m < mi; ++m)
{
T line_zi = zi + m ni;
const T x = xi + m ni;
const T y = yi + m ni;
for (int n = 0; n < ni; ++n, ++
Does anyone have any ideas?
I wrote this under VS2010 and it is intended to be called from MATLAB as a MEX function (thus the minus one subtraction in interp2_mx because indexing in MATLAB is from 1:end as opposed to 0:end-1).
My function interp2 requires a T (intended float/double type) pointer to Z (mz by nz matrix) and interpolates the points in XI and YI into ZI (which are all the same size mi by ni).
```
template
__forceinline void interp2_mx(const T& x, const T& y,
const T* z,
const int32_t& n,
const int32_t& mm2,
const int32_t& nm2,
T& val,
const T& extrapval = T(0))
{
int64_t xp = (int64_t)(x) - 1; // adjust for MATLAB indexing
int64_t yp = (int64_t)(y) - 1;
if (xp nm2 || yp mm2)
{
val = extrapval;
}
else
{
const T line = z + yp n + xp;
T xf = x - (int64_t)(x); // get decimal portion
T yf = y - (int64_t)(y);
T x1mf = (T)1 - xf;
T y1mf = (T)1 - yf;
T v00 = x1mf y1mf (*(line));
T v01 = xf y1mf (*(line + 1));
T v10 = x1mf yf (*(line + n));
T v11 = xf yf (*(line + n + 1));
val = v00 + v01 + v10 + v11;
}
}
template
void interp2(const T* z,
const int32_t& mz, const int32_t& nz,
const T xi, const T yi,
const int32_t& mi, const int32_t& ni,
T* zi,
const T& extrapval = T(0))
{
const int32_t nzm2 = nz - 2;
const int32_t mzm2 = mz - 2;
#pragma omp parallel for
for (int m = 0; m < mi; ++m)
{
T line_zi = zi + m ni;
const T x = xi + m ni;
const T y = yi + m ni;
for (int n = 0; n < ni; ++n, ++
Solution
Profile It!
As suggested in the comments, you should profile the code to see where the slowdown is.
Algebra Is Your Friend
One small suggestion I can offer is some algebraic improvements.
A linear interpolation between
That's 2 adds and 2 multiplies. You can use some algebra to reduce that to:
which is 2 adds and 1 multiply. Doing that also removes the need to calculate
That takes you from 8 multiplies to 3, but from 9 adds to 10. So that's likely from 17 instructions to 13.
There are a few more optimizations you can do:
You're subtracting 1 from
You might also have luck by keeping track of a pointer to the next input pixel just incrementing it instead of calculating
Naming
It would be nice if you could pick more easily understandable names for your functions and variables. Why is this called
Using
Likewise,
Why do
Errors
Looking at your error handling, I'm not convinced it's sufficient. You check if
As suggested in the comments, you should profile the code to see where the slowdown is.
Algebra Is Your Friend
One small suggestion I can offer is some algebraic improvements.
A linear interpolation between
a and b can be formulated like this:result = a * interp + b * (1 - interp);That's 2 adds and 2 multiplies. You can use some algebra to reduce that to:
result = interp * (a - b) + b;which is 2 adds and 1 multiply. Doing that also removes the need to calculate
x1mf and y1mf. So your inner loop would become:T xf = x - (int64_t)(x); // get decimal portion
T yf = y - (int64_t)(y);
T a = *line;
T b = *(line + 1);
T top = xf * (a - b) + b;
T c = *(line + n);
T d = *(line + n + 1);
T bottom = xf * (c - d) + d;
val = yf * (top - bottom) + bottom;That takes you from 8 multiplies to 3, but from 9 adds to 10. So that's likely from 17 instructions to 13.
There are a few more optimizations you can do:
You're subtracting 1 from
x and y every time through the loop. You could calculate the values with the 1 already subtracted out in interp2() rather than in interp2_mx():for (int m = 0; m < mi; ++m)
{
T* line_zi = zi + m * ni;
const T* x = xi + m * ni - 1; // <- Do this here instead of on every loop iteration!
const T* y = yi + m * ni - 1; // <- Same hereYou might also have luck by keeping track of a pointer to the next input pixel just incrementing it instead of calculating
line every time through the loop, too.Naming
It would be nice if you could pick more easily understandable names for your functions and variables. Why is this called
interp2()? What is interp1()? You should name it bilinearInterp() or something obvious like that. And what does interp2_mx() mean?Using
x and y to represent horizontal and vertical offsets is fine. Those are well-established traditions. But you're naming the start of your image z which is really weird. z is usually reserved for depth, at least when used with x and y. That makes your code very confusing to read.Likewise,
mz and nz are confusing, too. And even worse in iterp2_mx(), mm2 and nm2 are not only confusing but easy to mix up! mz and nz should be something like imageWidth and imageHeight (or matrixWidth and matrixHeight, or inputWidth and inputHeight), and the other variables and arguments should be appropriately named as well.Why do
xi and yi have an i in their names? That makes them look like integers, but they're declared as type T, which you say will likely be float or double.Errors
Looking at your error handling, I'm not convinced it's sufficient. You check if
xp is less than 0 or greater than nm2, and the same for yp. But what happens when xp exactly equals nm2? line ends up pointing to the last pixel in a line. But then you calculate line + 1. That gets you the first element of the next row. And for yp, it means reading a row that doesn't exist when yp equals mm2.Code Snippets
result = a * interp + b * (1 - interp);result = interp * (a - b) + b;T xf = x - (int64_t)(x); // get decimal portion
T yf = y - (int64_t)(y);
T a = *line;
T b = *(line + 1);
T top = xf * (a - b) + b;
T c = *(line + n);
T d = *(line + n + 1);
T bottom = xf * (c - d) + d;
val = yf * (top - bottom) + bottom;for (int m = 0; m < mi; ++m)
{
T* line_zi = zi + m * ni;
const T* x = xi + m * ni - 1; // <- Do this here instead of on every loop iteration!
const T* y = yi + m * ni - 1; // <- Same hereContext
StackExchange Code Review Q#100744, answer score: 6
Revisions (0)
No revisions yet.