patterncsharpMinor
Getting the dominant RGB color of a bitmap
Viewed 0 times
thecolorgettingdominantbitmaprgb
Problem
I'm currently running this function 60 times per second in order to get the dominant color on my screen in RGB values. It is using approximately 15% of my CPU on 30FPS and 25% of my CPU on 60FPS. Is there any way I can improve the efficiency of this loop or is there a better way to get the color altogether?
public Color getDominantColor(System.Drawing.Bitmap bmp) {
BitmapData srcData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.ReadOnly, System.Drawing.Imaging.PixelFormat.Format32bppArgb);
int stride = srcData.Stride;
IntPtr Scan0 = srcData.Scan0;
int[] totals = new int[] { 0, 0, 0 };
int width = bmp.Width;
int height = bmp.Height;
unsafe
{
byte* p = (byte*)(void*)Scan0;
for (int y = 0; y < height; y++) {
for (int x = 0; x < width; x++) {
for (int color = 0; color < 3; color++) {
int idx = (y * stride) + x * 4 + color;
totals[color] += p[idx];
}
}
}
}
int avgB = totals[0] / (width * height);
int avgG = totals[1] / (width * height);
int avgR = totals[2] / (width * height);
bmp.UnlockBits(srcData);
return Color.FromArgb(avgR, avgG, avgB);
}Solution
Five ideas, from easy to hard:
-
You can simplify your x/y loop to run in one dimension instead of two--
-
You can simplify your x/y loop to run in one dimension instead of two--
for ( i = 0; i
-
If you can, use a lower color depth (I don't think you need 24 bits of color depth if you're just computing an average). The smaller storage size will yield a smaller memory area to scan. You will have to shift bits around to do the math, but that sort of thing is faster than memory access.
-
You could try resizing or scaling the bitmap to something lower rez. The resize operation will interpolate color. In theory you could scale it to a 1x1 image and just read that one pixel. If you use GDI+ to perform the scale it could use hardware acceleration and be very fast.
-
Keep a copy of the last bitmap and its totals. Use REPE CMPSD (yes, this is assembly) to compare your new bitmap to the old one and find non-matching cells. Adjust totals and recompute average. This is probably a little harder than it sounds but the scan would be incredibly fast. This option would work better if most pixels are expected to stay the same from frame to frame.
-
Do the entire scan in assembly, four bytes at a time. DWord operations, believe it or not, are faster than byte operations, for a modern CPU. You can get the byte you need through bit shifting, which take very few clock cycles. Been a while for me, but would look something like this:
MOV ECX, ArrayLength ;ECX is our counter (= bytecount ÷ 4)
MOV EDX, Scan0 ;EDX is our data pointer
SUB BX, BX ;Set BX to 0 for later
Loop:
LODSL ;Load EAX from array, increment pointer
SHRL 8, EAX ;Dump the least eight bits
ADDB GreenTotal, AL ;Add least 8 bits to green total
ADCW GreenTotal+1,BX ;Carry the 1
SHRL 8, EAX ;Shift right 8 more bits
ADDB BlueTotal, AL ;Add least 8 bits to blue total
ADCW BlueTotal+1, BX ;Carry the 1
SHRL 8, EAX ;Shift right 8 more bits
ADDB RedTotal, AL ;Add least 8 bits to red total
ADCW RedTotal+1, BX ;Carry the 1
LOOPNZ Loop ;Decrement CX and keep going until it is zero
If the assembly is too much to take on, you can try to do the same in C++ and maybe the compiler will do a pretty good job of it. At the very least, we have gotten rid of all of your multiplication operations (which can take up 5-20x the number of clock cycles compared to a shift), two of your loops, and a whole bunch of if` conditionals (which would mess up your CPU's branch prediction). Also we will get nice big cache bursts regardless of the dword alignment of the byte buffer, because it is a single-dimensional contiguous blob.Code Snippets
MOV ECX, ArrayLength ;ECX is our counter (= bytecount ÷ 4)
MOV EDX, Scan0 ;EDX is our data pointer
SUB BX, BX ;Set BX to 0 for later
Loop:
LODSL ;Load EAX from array, increment pointer
SHRL 8, EAX ;Dump the least eight bits
ADDB GreenTotal, AL ;Add least 8 bits to green total
ADCW GreenTotal+1,BX ;Carry the 1
SHRL 8, EAX ;Shift right 8 more bits
ADDB BlueTotal, AL ;Add least 8 bits to blue total
ADCW BlueTotal+1, BX ;Carry the 1
SHRL 8, EAX ;Shift right 8 more bits
ADDB RedTotal, AL ;Add least 8 bits to red total
ADCW RedTotal+1, BX ;Carry the 1
LOOPNZ Loop ;Decrement CX and keep going until it is zeroContext
StackExchange Code Review Q#157667, answer score: 5
Revisions (0)
No revisions yet.