Dynamic Colour Binning: Grouping Similar Colours in Images

Submitted by: @import:stackexchange-codereview

Problem

This is a piece of code that implements an image-processing algorithm I came up with. I call it Dynamic Colour Binning. It's a fairly academic exercise that was more about providing a learning experience than about producing something useful, but here is what it does:

Input: A set of images with a limited colour space. I designed this with (simple) geographical maps in mind, where regions are represented by easily distinguishable colours. However, those colours need not be entirely pure (due to JPEG artefacts, antialiasing and whatnot.) The point of the algorithm is to group close colours together.

Output: For each image, a set of colours, with very similar colours grouped together, and their pixel count.

Since that probably sounds a little abstract, here's an example. One might take all of the maps from the Wikipedia page on the Territorial Evolution of the US and produce from that a plot like this (where pixel count has been translated into surface area):

The Algorithm: Simply put, the algorithm first finds all the colours in the image, and sorts them by pixel count. Then, starting from the most abundant colour, it takes every other colour, and if that colour is within some distance from the reference colour in Lab Colour Space it is deleted from the list of colours and its pixel count is added to that of the reference colour.
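The description above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code under review: colours are assumed to be 8-bit RGB tuples, the function names srgb_to_lab and bin_colours are made up for this example, and the distance used is plain Euclidean distance in Lab space (CIE76) rather than the CIEDE2000 measure that appears in the code further down.

```python
import math

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE Lab (D65 white point)."""
    # Undo the sRGB gamma curve to get linear-light values in [0, 1]
    lin = []
    for c in rgb:
        c = c / 255.0
        lin.append(c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4)
    r, g, b = lin
    # Linear RGB -> XYZ, normalised by the D65 reference white
    x = (0.4124564 * r + 0.3575761 * g + 0.1804375 * b) / 0.95047
    y = (0.2126729 * r + 0.7151522 * g + 0.0721750 * b) / 1.0
    z = (0.0193339 * r + 0.1191920 * g + 0.9503041 * b) / 1.08883

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x), f(y), f(z)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def bin_colours(counts, radius):
    """Merge colours closer than `radius` into the most abundant neighbour."""
    # Most abundant colour first; each colour is converted to Lab only once
    remaining = [
        (colour, count, srgb_to_lab(colour))
        for colour, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    ]
    binned = {}
    while remaining:
        ref, total, ref_lab = remaining[0]
        survivors = []
        for colour, count, lab in remaining[1:]:
            if math.dist(ref_lab, lab) < radius:
                total += count  # absorb this colour into the reference colour
            else:
                survivors.append((colour, count, lab))
        binned[ref] = total
        remaining = survivors
    return binned

# Hypothetical pixel counts: two near-reds and two near-blues
counts = {(255, 0, 0): 100, (250, 5, 5): 10, (0, 0, 255): 50, (0, 0, 250): 5}
print(bin_colours(counts, radius=10))
```

With these made-up counts the two near-reds collapse into (255, 0, 0) and the two near-blues into (0, 0, 255), each keeping the summed pixel count. A radius of 0 leaves every colour in its own bin.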

Architecture: I created three classes, from top to bottom:

A SetOfMaps, which is essentially just a list of Maps, together with some labels and some functionality for making the final output uniform and storing it in a pandas DataFrame.

A Map object, which is an OpenCV image with its list of colours and an additional method that acts on the image itself (performs a 'cleaning' of the image that makes the colour groups uniform.)

A ColorList, which is meant to always be a member of a Map but which I separated out because it does all the heavy lifting of the algorithm.

Questions: I am of course happy to receive any kind of

Solution

I don't know enough about pandas and numpy, but I decided to take a look at some of the code anyway.

def bin_dataframe(self, radius):
    """
    This function looks at the Set's dataframe and checks whether there are
    columns that are closer together than _radius_ in colorspace. Such columns
    are then merged. 

    The algorithm is similar to the DCB algorithm itself, which is heavily commented
    in the ColorList class.
    """
    cols = list(self.dataframe)

    # Perform checking
    for col in cols:
      colbgr = literal_eval(col)
      color = sRGBColor(colbgr[0], colbgr[1], colbgr[2], is_upscaled=True)
      color_lab = convert_color(color, LabColor)

      for compcol in cols[cols.index(col)+1:]:
        compcolbgr = literal_eval(compcol)
        compcolor = sRGBColor(compcolbgr[0], compcolbgr[1], compcolbgr[2], is_upscaled=True)
        compcolor_lab = convert_color(compcolor, LabColor)
        delta = delta_e_cie2000(color_lab, compcolor_lab)
        if ( delta < radius ):
          self.dataframe[col].fillna(self.dataframe[compcol], inplace=True)
          del self.dataframe[compcol]
          cols.remove(compcol)

    # Clean up dataframe (sorting columns, setting NaN to 0)
    #self.dataframe.sort_index(inplace=True)
    self.dataframe.fillna(0, inplace=True)
    self.dataframe = self.dataframe.reindex_axis(sorted(self.dataframe.columns, key=lambda x: self.dataframe[x].sum(), reverse=True), axis=1)


These are nested loops, and given your algorithm there is probably not much you can do about that.

However, what I do notice is that you're doing a lot of duplicate work. You're creating slices of the list, which can be expensive in memory. You're using cols.index, which does a linear look-up every time. You're also recomputing the Lab conversion of each colour for every comparison (roughly len(cols)**2/2 times), which is expensive.

def bin_dataframe(self, radius):
    """
    This function looks at the Set's dataframe and checks whether there are
    columns that are closer together than _radius_ in colorspace. Such columns
    are then merged. 
    """
    def mklabcolor(color):
        parts = literal_eval(color)
        rgbcolor = sRGBColor(parts[0], parts[1], parts[2], is_upscaled=True)
        return convert_color(rgbcolor, LabColor)

    # Convert every column name to Lab once, up front
    cols = [(color, mklabcolor(color)) for color in self.dataframe]
    merged = set()  # columns already folded into an earlier one

    for idx, (col, color_lab) in enumerate(cols):
        if col in merged:
            continue
        for compcolor, compcolor_lab in cols[idx+1:]:
            if compcolor in merged:
                continue
            if delta_e_cie2000(color_lab, compcolor_lab) < radius:
                self.dataframe[col].fillna(self.dataframe[compcolor], inplace=True)
                del self.dataframe[compcolor]
                merged.add(compcolor)

    # Clean up dataframe (sorting columns, setting NaN to 0)
    #self.dataframe.sort_index(inplace=True)
    self.dataframe.fillna(0, inplace=True)
    self.dataframe = self.dataframe.reindex_axis(sorted(self.dataframe.columns, key=lambda x: self.dataframe[x].sum(), reverse=True), axis=1)


I left the last part (the cleanup) intact, as that's probably pandas/numpy specific, and I don't know enough of either yet.
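The pairing logic can also be looked at in isolation, without pandas or a colour library. The following is a rough sketch rather than part of the reviewed code: merge_close, convert and distance are hypothetical names, and the keys are assumed to be unique and hashable (as DataFrame column labels would be). It uses itertools.combinations to visit each pair once, so there are no list slices and no index look-ups, and it calls the conversion exactly once per key.

```python
from itertools import combinations

def merge_close(keys, convert, distance, radius):
    """Map each surviving key to the list of keys merged into it."""
    converted = [(key, convert(key)) for key in keys]  # one conversion per key
    groups = {key: [] for key in keys}  # keys still present are survivors
    # combinations() yields pairs in the same order as the nested loops
    for (ref, ref_val), (other, other_val) in combinations(converted, 2):
        if ref not in groups or other not in groups:
            continue  # one of the two has already been merged away
        if distance(ref_val, other_val) < radius:
            groups[ref].append(other)
            del groups[other]
    return groups

# Toy stand-ins for the colour conversion and the colour distance
calls = []

def to_float(key):
    calls.append(key)  # record how often the conversion runs
    return float(key)

result = merge_close([0, 1, 10, 11, 12, 30], to_float,
                     lambda a, b: abs(a - b), radius=2)
print(result)      # {0: [1], 10: [11], 12: [], 30: []}
print(len(calls))  # 6
```

In the DataFrame setting, each (survivor, merged) pair in the result would correspond to one fillna call followed by deleting the merged column.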

Context

StackExchange Code Review Q#128493, answer score: 2
