
Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition

Submitted by: @import:stackoverflow-api··

Problem

One of the most interesting projects I've worked on in the past couple of years was a project about image processing. The goal was to develop a system to be able to recognize Coca-Cola 'cans' (note that I'm stressing the word 'cans', you'll see why in a minute). You can see a sample below, with the can recognized in the green rectangle with scale and rotation.

Some constraints on the project:

  • The background could be very noisy.
  • The can could have any scale or rotation or even orientation (within reasonable limits).
  • The image could have some degree of fuzziness (contours might not be entirely straight).
  • There could be Coca-Cola bottles in the image, and the algorithm should only detect the can!
  • The brightness of the image could vary a lot (so you can't rely "too much" on color detection).
  • The can could be partly hidden on the sides or the middle and possibly partly hidden behind a bottle.
  • There could be no can at all in the image, in which case you had to find nothing and write a message saying so.


So you could end up with tricky things like this (which in this case had my algorithm totally fail):

I did this project a while ago, and had a lot of fun doing it, and I had a decent implementation. Here are some details about my implementation:

Language: Done in C++ using the OpenCV library.

Pre-processing: For the image pre-processing, i.e. transforming the image into a more raw form to give to the algorithm, I used 2 methods:

  • Changing color domain from RGB to HSV and filtering based on "red" hue, saturation above a certain threshold to avoid orange-like colors, and filtering of low value to avoid dark tones. The end result was a binary black and white image, where all white pixels would represent the pixels that match this threshold. Obviously there is still a lot of crap in the image, but this reduces the number of dimensions you have to work with.



  • Noise filtering using median filtering (taking the median pixel value of all neighbors and rep
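
The two pre-processing steps described above can be sketched in plain C++ without OpenCV, just to make the logic concrete. This is not the original implementation (which is not shown here): the names `RedThreshold`, `isRed`, `redMask` and `median3x3`, the threshold values, and the 3x3 window size are all assumptions chosen for illustration, and hue is expressed in degrees rather than in OpenCV's 0-179 range.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };
struct Hsv { float h, s, v; };  // h in degrees [0, 360), s and v in [0, 1]

// Standard RGB -> HSV conversion.
Hsv rgbToHsv(Rgb p) {
    const float r = p.r / 255.0f, g = p.g / 255.0f, b = p.b / 255.0f;
    const float maxc = std::max({r, g, b});
    const float minc = std::min({r, g, b});
    const float delta = maxc - minc;
    float h = 0.0f;
    if (delta > 0.0f) {
        if (maxc == r)      h = 60.0f * std::fmod((g - b) / delta, 6.0f);
        else if (maxc == g) h = 60.0f * ((b - r) / delta + 2.0f);
        else                h = 60.0f * ((r - g) / delta + 4.0f);
        if (h < 0.0f) h += 360.0f;
    }
    const float s = (maxc > 0.0f) ? delta / maxc : 0.0f;
    return {h, s, maxc};
}

// Hypothetical thresholds; real values would need tuning on sample images.
struct RedThreshold {
    float hueTolerance = 15.0f;   // degrees on either side of 0 (red wraps around)
    float minSaturation = 0.5f;   // rejects washed-out colors
    float minValue = 0.3f;        // rejects dark tones
};

bool isRed(Rgb p, const RedThreshold& t = RedThreshold{}) {
    const Hsv c = rgbToHsv(p);
    const bool hueOk = c.h <= t.hueTolerance || c.h >= 360.0f - t.hueTolerance;
    return hueOk && c.s >= t.minSaturation && c.v >= t.minValue;
}

// Binary mask: 255 where the pixel passes the red threshold, 0 elsewhere.
std::vector<std::uint8_t> redMask(const std::vector<Rgb>& pixels) {
    std::vector<std::uint8_t> mask(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        mask[i] = isRed(pixels[i]) ? 255 : 0;
    }
    return mask;
}

// 3x3 median filter on a row-major single-channel image.
std::vector<std::uint8_t> median3x3(const std::vector<std::uint8_t>& img,
                                    int width, int height) {
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> out(img.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::array<std::uint8_t, 9> window;
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    // Clamp coordinates so border pixels replicate the edge.
                    const int yy = std::min(std::max(y + dy, 0), height - 1);
                    const int xx = std::min(std::max(x + dx, 0), width - 1);
                    window[n++] = img[yy * width + xx];
                }
            }
            // The 5th smallest of 9 values is the median.
            std::nth_element(window.begin(), window.begin() + 4, window.end());
            out[y * width + x] = window[4];
        }
    }
    return out;
}
```

Calling `median3x3(redMask(pixels), width, height)` chains the two steps: isolated white specks left by the color threshold are removed, while larger red regions survive. With OpenCV, the same steps would typically go through `cv::cvtColor`, `cv::inRange` and `cv::medianBlur`.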

Solution

An alternative approach would be to extract features (keypoints) using the scale-invariant feature transform (SIFT) or Speeded Up Robust Features (SURF).

You can find a nice OpenCV code example in Java, C++, and Python on this page: Features2D + Homography to find a known object

Both algorithms are invariant to scaling and rotation. Since they work with features, you can also handle occlusion (as long as enough keypoints are visible).
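
To illustrate why visible keypoints are what matters, here is a minimal sketch of the matching step that follows feature extraction, written in plain C++ so it does not depend on any particular OpenCV version. It assumes descriptors have already been computed (for example by SIFT or SURF) and applies the ratio test from the SIFT paper: a match is kept only if the nearest scene descriptor is clearly closer than the second nearest. The names `ratioTestMatches` and `enoughMatches`, the 0.75 ratio, and the default minimum of 4 matches are assumptions for illustration; 4 is simply the smallest number of point correspondences from which a homography can be estimated.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Descriptor = std::vector<float>;

// Squared Euclidean distance between two descriptors of equal length.
float squaredDistance(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Brute-force matching with a ratio test.
// Returns (objectIndex, sceneIndex) pairs for unambiguous matches only.
std::vector<std::pair<std::size_t, std::size_t>> ratioTestMatches(
        const std::vector<Descriptor>& object,
        const std::vector<Descriptor>& scene,
        float ratio = 0.75f) {
    std::vector<std::pair<std::size_t, std::size_t>> matches;
    if (scene.size() < 2) return matches;  // the test needs two neighbours
    for (std::size_t i = 0; i < object.size(); ++i) {
        float best = std::numeric_limits<float>::max();
        float second = std::numeric_limits<float>::max();
        std::size_t bestIdx = 0;
        for (std::size_t j = 0; j < scene.size(); ++j) {
            const float d = squaredDistance(object[i], scene[j]);
            if (d < best) {
                second = best;
                best = d;
                bestIdx = j;
            } else if (d < second) {
                second = d;
            }
        }
        // Distances are squared, so the ratio is squared as well.
        if (best < ratio * ratio * second) {
            matches.emplace_back(i, bestIdx);
        }
    }
    return matches;
}

// Report the object as found only if enough matches survived.
bool enoughMatches(std::size_t goodMatches, std::size_t minMatches = 4) {
    return goodMatches >= minMatches;
}
```

If part of the can is occluded, the keypoints in the hidden region simply produce no match, and detection can still succeed as long as `enoughMatches` holds for the remaining ones. When no can is present, few or no matches pass the ratio test, which gives a natural "nothing found" outcome.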

Image source: tutorial example

Processing takes a few hundred milliseconds for SIFT; SURF is a bit faster, but it is still not suitable for real-time applications. ORB uses FAST, which is weaker regarding rotation invariance.

The original papers:

  • SURF: Speeded Up Robust Features
  • Distinctive Image Features from Scale-Invariant Keypoints
  • ORB: an efficient alternative to SIFT or SURF

Context

Stack Overflow Q#10168686, score: 788
