HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Getting the dominant RGB color of a bitmap

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thecolorgettingdominantbitmaprgb

Problem

I'm currently running this function 60 times per second in order to get the dominant color on my screen in RGB values. It is using approximately 15% of my CPU on 30FPS and 25% of my CPU on 60FPS. Is there any way I can improve the efficiency of this loop or is there a better way to get the color altogether?

public Color getDominantColor(System.Drawing.Bitmap bmp) {
            BitmapData srcData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.ReadOnly, System.Drawing.Imaging.PixelFormat.Format32bppArgb);

            int stride = srcData.Stride;

            IntPtr Scan0 = srcData.Scan0;

            int[] totals = new int[] { 0, 0, 0 };

            int width = bmp.Width;
            int height = bmp.Height;

            unsafe
            {
                byte* p = (byte*)(void*)Scan0;

                for (int y = 0; y < height; y++) {
                    for (int x = 0; x < width; x++) {
                        for (int color = 0; color < 3; color++) {
                            int idx = (y * stride) + x * 4 + color;
                            totals[color] += p[idx];
                        }
                    }
                }
            }

            int avgB = totals[0] / (width * height);
            int avgG = totals[1] / (width * height);
            int avgR = totals[2] / (width * height);

            bmp.UnlockBits(srcData);

            return Color.FromArgb(avgR, avgG, avgB);
        }

Solution

Five ideas, from easy to hard:

-
You can simplify your x/y loop to run in one dimension instead of two-- for ( i = 0; i

-
If you can, use a lower color depth (I don't think you need 24 bits of color depth if you're just computing an average). The smaller storage size will yield a smaller memory area to scan. You will have to shift bits around to do the math, but that sort of thing is faster than memory access.

-
You could try resizing or scaling the bitmap to something lower rez. The resize operation will interpolate color. In theory you could scale it to a 1x1 image and just read that one pixel. If you use GDI+ to perform the scale it could use hardware acceleration and be very fast.

-
Keep a copy of the last bitmap and its totals. Use
REPE CMPSD (yes, this is assembly) to compare your new bitmap to the old one and find non-matching cells. Adjust totals and recompute average. This is probably a little harder than it sounds but the scan would be incredibly fast. This option would work better if most pixels are expected to stay the same from frame to frame.

-
Do the entire scan in assembly, four bytes at a time. DWord operations, believe it or not, are faster than byte operations, for a modern CPU. You can get the byte you need through bit shifting, which take very few clock cycles. Been a while for me, but would look something like this:

MOV ECX, ArrayLength ;ECX is our counter (= bytecount ÷ 4)
    MOV EDX, Scan0       ;EDX is our data pointer
    SUB BX, BX           ;Set BX to 0 for later
Loop:
    LODSL                ;Load EAX from array, increment pointer
    SHRL 8, EAX          ;Dump the least eight bits
    ADDB GreenTotal, AL  ;Add least 8 bits to green total
    ADCW GreenTotal+1,BX ;Carry the 1
    SHRL 8, EAX          ;Shift right 8 more bits
    ADDB BlueTotal, AL   ;Add least 8 bits to blue total
    ADCW BlueTotal+1, BX ;Carry the 1
    SHRL 8, EAX          ;Shift right 8 more bits
    ADDB RedTotal, AL    ;Add least 8 bits to red total
    ADCW RedTotal+1, BX  ;Carry the 1
    LOOPNZ Loop          ;Decrement CX and keep going until it is zero


If the assembly is too much to take on, you can try to do the same in C++ and maybe the compiler will do a pretty good job of it. At the very least, we have gotten rid of all of your multiplication operations (which can take up 5-20x the number of clock cycles compared to a shift), two of your loops, and a whole bunch of
if` conditionals (which would mess up your CPU's branch prediction). Also we will get nice big cache bursts regardless of the dword alignment of the byte buffer, because it is a single-dimensional contiguous blob.

Code Snippets

MOV ECX, ArrayLength ;ECX is our counter (= bytecount ÷ 4)
    MOV EDX, Scan0       ;EDX is our data pointer
    SUB BX, BX           ;Set BX to 0 for later
Loop:
    LODSL                ;Load EAX from array, increment pointer
    SHRL 8, EAX          ;Dump the least eight bits
    ADDB GreenTotal, AL  ;Add least 8 bits to green total
    ADCW GreenTotal+1,BX ;Carry the 1
    SHRL 8, EAX          ;Shift right 8 more bits
    ADDB BlueTotal, AL   ;Add least 8 bits to blue total
    ADCW BlueTotal+1, BX ;Carry the 1
    SHRL 8, EAX          ;Shift right 8 more bits
    ADDB RedTotal, AL    ;Add least 8 bits to red total
    ADCW RedTotal+1, BX  ;Carry the 1
    LOOPNZ Loop          ;Decrement CX and keep going until it is zero

Context

StackExchange Code Review Q#157667, answer score: 5

Revisions (0)

No revisions yet.