patterncMinor
Detect optimized
Viewed 0 times
optimizeddetectstackoverflow
Problem
I've been trying to optimize this piece of code:
So far I've tried "Blocking" and a few other things, but I can't seem to get it to run any faster.
Blocking resulted in:
Which ran at the same speed as the original program.
Any suggestions would be great.
void detect_optimized(int width, int height, int threshold)
{
int x, y, z;
int tmp;
for (y = 1; y 255)
tmp = 255;
if (tmp<threshold)
tmp = 0;
c[x][y][z] = 255-tmp;
}
return;
}So far I've tried "Blocking" and a few other things, but I can't seem to get it to run any faster.
Blocking resulted in:
for(yy = 1; yy255)
tmp = 255;
if (tmp<threshold)
tmp = 0;
c[x][y][z] = 255-tmp;
}}}}}Which ran at the same speed as the original program.
Any suggestions would be great.
mask_function cannot be changed, but here is its code:int mask_product(int m[3][3], byte bitmap[MAX_ROW][MAX_COL][NUM_COLORS], int x, int y, int z)
{
int tmp[9];
int i, sum;
// ADDED THIS LINE (sum = 0) TO FIX THE BUG
sum = 0;
tmp[0] = m[0][0]*bitmap[x-1][y-1][z];
tmp[1] = m[1][0]*bitmap[x][y-1][z];
tmp[2] = m[2][0]*bitmap[x+1][y-1][z];
tmp[3] = m[0][1]*bitmap[x-1][y][z];
tmp[4] = m[1][1]*bitmap[x][y][z];
tmp[5] = m[2][1]*bitmap[x+1][y][z];
tmp[6] = m[0][2]*bitmap[x-1][y+1][z];
tmp[7] = m[1][2]*bitmap[x][y+1][z];
tmp[8] = m[2][2]*bitmap[x+1][y+1][z];
for (i=0; i<9; i++)
sum = sum + tmp[i];
return sum;
}Solution
I am afraid I can't do much about the performance but here are some other tips:
-
Precalculating loop bounds to avoid repeated recalculation might give another speedup:
You could gain some speedup by replacing the inner loop over the colors by MMX functionality which allows to do the same calculation on all colors at once. However, I am no expert in this and it likely requires inline assembly.
- The
zloop bound is not checked againstNUM_COLORS(which I believe is the size of the array for this index).
zshould be renamed to something likecolorIndex.
- Using prefix increments (
++x) might give a speedup
-
Precalculating loop bounds to avoid repeated recalculation might give another speedup:
maxY = width - 1; and then for (y = 1; y
-
tmp should have a better name (what does it actually contain/represent?)
-
place the ifs into an own function (or macro) clamp (which is a widely used name and maybe has a standard implementation) and use it to make the code cleaner: tmp = clamp(tmp, 0, 255);
-
do not use global variables instead of function parameters, they introduce hard to debug side effects
- Although you cannot change the mask function I would recommend avoiding the
tmp` array and instead directly summing up into sum.You could gain some speedup by replacing the inner loop over the colors by MMX functionality which allows to do the same calculation on all colors at once. However, I am no expert in this and it likely requires inline assembly.
Context
StackExchange Code Review Q#47600, answer score: 5
Revisions (0)
No revisions yet.