
Fastest way to count non-zero pixels using Python and Pillow

Submitted by: @import:stackexchange-codereview

Problem

I have a Python script that creates a diff of two images using PIL. That part works fine. Now I need to find an efficient way to count all the non-black pixels (which represent parts of the two images that are different). The diff image is in RGB mode.

My initial cut was something like this:

return sum(x != (0, 0, 0) for x in diffimage.getdata())


Then I realized that the diffs were usually constrained to a portion of the image, so I used getbbox() to find the actual diff data:

bbox = diffimage.getbbox()
return sum(x != (0, 0, 0) for x in diffimage.crop(bbox).getdata()) if bbox else 0


This has the advantage of being very fast when the image is all black: getbbox() returns None in that case, so no pixels need to be counted.
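The behaviour of getbbox() described above can be checked on a synthetic image. This is a minimal sketch, assuming Pillow is installed; the image size and the painted patch are made-up values for illustration only.

```python
from PIL import Image

# An all-black RGB image has no non-zero region, so getbbox() returns None.
black = Image.new("RGB", (100, 80))
print(black.getbbox())  # None

# Paint a small, faintly blue patch (hypothetical diff region).
# The box is (left, upper, right, lower); right and lower are exclusive.
diff = black.copy()
diff.paste((0, 0, 7), (10, 20, 14, 23))  # 4 x 3 pixels

bbox = diff.getbbox()
print(bbox)                  # (10, 20, 14, 23)
print(diff.crop(bbox).size)  # (4, 3): only 12 pixels left to inspect
```

Cropping first means the per-pixel work scales with the size of the differing region rather than with the whole image.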

I still wasn't satisfied, so I decided to try using more of the built-in PIL methods to avoid the generator expression with the Python conditional that needed to be evaluated for each pixel. I came up with:

bbox = diffimage.getbbox()
if not bbox: return 0
return sum(diffimage.crop(bbox)
                    .point(lambda x: 255 if x else 0)
                    .convert("L")
                    .point(bool)
                    .getdata())


This is about five times faster than the previous version. The basic steps are:

  • Crop to the bounding box to avoid counting black pixels outside it.
  • Convert all non-zero values in each channel to 255. This guarantees that every non-black pixel still has a non-zero value after the later grayscale conversion. (Without this step, a pixel that differs in only one channel, and only by a small amount, could end up black in grayscale mode, because only a fraction of each channel's value makes its way to grayscale.) The function passed to point() isn't evaluated for each pixel; it is called only once for each possible pixel value to build a lookup table, so it's very fast.
  • Convert to grayscale.
  • Convert all non-zero pixels to 1 using bool.
  • Sum all the pixel values.
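The reasoning behind the second step can be illustrated with plain arithmetic. The sketch below uses the ITU-R 601-2 luma weights that the Pillow documentation gives for RGB to "L" conversion; the helper name luma and the use of floor division are assumptions for illustration, and Pillow's own rounding may differ slightly.

```python
def luma(r, g, b):
    """Approximate grayscale value using the ITU-R 601-2 luma weights.

    Hypothetical helper; floor division stands in for Pillow's rounding.
    """
    return (r * 299 + g * 587 + b * 114) // 1000

# A pixel that differs only slightly, and only in the blue channel.
faint = (0, 0, 1)
print(luma(*faint))  # 0: the pixel would be counted as black

# Saturating every non-zero channel first keeps the pixel visible.
saturated = tuple(255 if c else 0 for c in faint)
print(saturated)         # (0, 0, 255)
print(luma(*saturated))  # 29: non-zero, so the pixel is counted
```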



Can I do b

Solution

Here's your implementation using Pillow:

def count_nonblack_pil(img):
    """Return the number of pixels in img that are not black.
    img must be a PIL.Image object in mode RGB.

    """
    bbox = img.getbbox()
    if not bbox: return 0
    return sum(img.crop(bbox)
               .point(lambda x: 255 if x else 0)
               .convert("L")
               .point(bool)
               .getdata())
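The question notes that the function passed to point() is used to build a lookup table rather than being called per pixel. A small experiment can make that visible. This is a sketch assuming Pillow is installed; the recorder list and the image contents are invented for the demonstration.

```python
from PIL import Image

seen = []  # records every value the mapping function is called with

def to_full(value):
    """Map any non-zero channel value to 255, logging each call."""
    seen.append(value)
    return 255 if value else 0

# 200 x 200 = 40,000 pixels, each with a faint green component.
img = Image.new("RGB", (200, 200), (0, 3, 0))
out = img.point(to_full)

# The number of calls depends on the range of channel values,
# not on the number of pixels in the image.
print(len(seen) < img.width * img.height)  # True
print(out.getpixel((0, 0)))                # (0, 255, 0)
```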


And here's an implementation using Numpy:

def count_nonblack_np(img):
    """Return the number of pixels in img that are not black.
    img must be a Numpy array with colour values along the last axis.

    """
    return img.any(axis=-1).sum()
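To see what any(axis=-1) does here, a tiny hand-built array may help. This is a sketch assuming NumPy is installed; the array shape and pixel values are arbitrary.

```python
import numpy as np

# 4 rows x 5 columns, three colour channels along the last axis.
img = np.zeros((4, 5, 3), dtype=np.uint8)
img[0, 0] = (0, 0, 1)        # barely non-black
img[2, 3] = (255, 255, 255)  # white
img[3, 4] = (10, 0, 0)       # dark red

# any(axis=-1) collapses the channel axis: True where any channel is non-zero.
mask = img.any(axis=-1)
print(mask.shape)       # (4, 5)

# Summing a boolean array counts the True entries.
print(int(mask.sum()))  # 3
```

Because the test is "any channel non-zero", no saturation or grayscale step is needed in this version.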


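(We will need scipy.ndimage.imread to load the image. That function was deprecated in SciPy 1.0 and removed in SciPy 1.2; on current versions, numpy.asarray(Image.open(filename)) gives a comparable array.)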
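One way to obtain a suitable array without SciPy is to open the file with Pillow and pass the result to numpy.asarray. The sketch below assumes both Pillow and NumPy are installed, and uses an in-memory PNG in place of a real file so that it is self-contained; the image contents are made up.

```python
import io

import numpy as np
from PIL import Image

# Build a small test image and write it to an in-memory PNG
# (a stand-in for a real file on disk).
src = Image.new("RGB", (6, 4))    # width 6, height 4, all black
src.putpixel((1, 2), (0, 9, 0))   # one faint green pixel
buf = io.BytesIO()
src.save(buf, format="PNG")
buf.seek(0)

# Load it back and convert to an array of shape (height, width, 3).
with Image.open(buf) as im:
    arr = np.asarray(im.convert("RGB"))

print(arr.shape, arr.dtype)         # (4, 6, 3) uint8
print(int(arr.any(axis=-1).sum()))  # 1
```

The explicit convert("RGB") is a precaution so that palette or grayscale files also end up with colour values along the last axis.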

Here's a quick performance comparison using timeit.timeit:

>>> from PIL import Image
>>> import scipy.ndimage
>>> from timeit import timeit
>>> img1 = Image.open(filename)
>>> timeit(lambda:count_nonblack_pil(img1), number=10)
5.4229461060022
>>> img2 = scipy.ndimage.imread(filename)
>>> timeit(lambda:count_nonblack_np(img2), number=10)
2.3291947869875003


So Numpy is about 2.3 times as fast on my example.

Context

StackExchange Code Review Q#55902, answer score: 8
