
Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition

Submitted by: @import:stackoverflow-api·Mar 10, 2026·



Problem

One of the most interesting projects I've worked on in the past couple of years was an image processing project. The goal was to develop a system able to recognize Coca-Cola 'cans' (note that I'm stressing the word 'cans'; you'll see why in a minute). You can see a sample below, with the can recognized in the green rectangle with scale and rotation.

Some constraints on the project:

The background could be very noisy.

The can could have any scale or rotation or even orientation (within reasonable limits).

The image could have some degree of fuzziness (contours might not be entirely straight).

There could be Coca-Cola bottles in the image, and the algorithm should only detect the can!

The brightness of the image could vary a lot (so you can't rely "too much" on color detection).

The can could be partly hidden on the sides or the middle and possibly partly hidden behind a bottle.

There could be no can at all in the image, in which case you had to find nothing and write a message saying so.

So you could end up with tricky things like this (on which my algorithm failed completely):

I did this project a while ago, and had a lot of fun doing it, and I had a decent implementation. Here are some details about my implementation:

Language: Done in C++ using the OpenCV library.

Pre-processing: For the image pre-processing, i.e. transforming the image into a more raw form to give to the algorithm, I used 2 methods:

Changing color domain from RGB to HSV and filtering based on "red" hue, saturation above a certain threshold to avoid orange-like colors, and filtering of low value to avoid dark tones. The end result was a binary black and white image, where all white pixels would represent the pixels that match this threshold. Obviously there is still a lot of crap in the image, but this reduces the number of dimensions you have to work with.
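The thresholding step described above can be sketched in plain C++ without OpenCV, so that the logic is visible. This is only an illustration: the `Rgb` and `Hsv` types, the function names, and the threshold values (`hueTol`, `minSat`, `minVal`) are assumptions and not the values used in the original project.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper types; not taken from the original project.
struct Rgb { std::uint8_t r, g, b; };
struct Hsv { double h, s, v; };  // h in [0, 360), s and v in [0, 1]

// Standard RGB -> HSV conversion.
Hsv rgbToHsv(Rgb p) {
    const double r = p.r / 255.0, g = p.g / 255.0, b = p.b / 255.0;
    const double maxC = std::max({r, g, b});
    const double minC = std::min({r, g, b});
    const double delta = maxC - minC;

    double h = 0.0;
    if (delta > 0.0) {
        if (maxC == r)      h = 60.0 * std::fmod((g - b) / delta, 6.0);
        else if (maxC == g) h = 60.0 * ((b - r) / delta + 2.0);
        else                h = 60.0 * ((r - g) / delta + 4.0);
        if (h < 0.0) h += 360.0;
    }
    const double s = (maxC > 0.0) ? delta / maxC : 0.0;
    return {h, s, maxC};
}

// Red hue wraps around 0 degrees, so both ends of the hue circle are accepted.
// The saturation floor rejects washed-out and orange-like tones; the value
// floor rejects dark tones. All three thresholds are assumed values.
bool isCanRed(Rgb p, double hueTol = 15.0, double minSat = 0.6, double minVal = 0.3) {
    const Hsv c = rgbToHsv(p);
    const bool redHue = (c.h <= hueTol) || (c.h >= 360.0 - hueTol);
    return redHue && c.s >= minSat && c.v >= minVal;
}

// Binary mask: 255 (white) where the pixel passes the threshold, 0 otherwise.
std::vector<std::uint8_t> redMask(const std::vector<Rgb>& pixels) {
    std::vector<std::uint8_t> mask(pixels.size());
    std::transform(pixels.begin(), pixels.end(), mask.begin(),
                   [](Rgb p) -> std::uint8_t { return isCanRed(p) ? 255 : 0; });
    return mask;
}
```

Calling `redMask` on a flat pixel buffer gives the black and white image described above. With OpenCV the same idea is usually expressed with `cv::cvtColor` and two `cv::inRange` calls, one for each end of the hue range.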

Noise filtering using median filtering (taking the median pixel value of all neighbors and rep
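As an illustration of median filtering on the binary mask, here is a minimal sketch of a 3x3 median filter in plain C++. The window size, the row-major layout, the function name, and the border handling (clamping coordinates to the image edge) are assumptions made for this example.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// 3x3 median filter over a row-major, single-channel image.
// Border pixels are handled by clamping neighbor coordinates to the image
// edge, which is one common choice and an assumption of this sketch.
std::vector<std::uint8_t> medianFilter3x3(const std::vector<std::uint8_t>& src,
                                          int width, int height) {
    std::vector<std::uint8_t> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::array<std::uint8_t, 9> window{};
            std::size_t n = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int yy = std::max(0, std::min(y + dy, height - 1));
                    const int xx = std::max(0, std::min(x + dx, width - 1));
                    window[n++] = src[static_cast<std::size_t>(yy) * width + xx];
                }
            }
            // Only the middle element needs to end up in its sorted position.
            std::nth_element(window.begin(), window.begin() + 4, window.end());
            dst[static_cast<std::size_t>(y) * width + x] = window[4];
        }
    }
    return dst;
}
```

An isolated white pixel surrounded by black is removed by this filter, since the median of its neighborhood is black. OpenCV provides the same operation as `cv::medianBlur`.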

Solution

An alternative approach would be to extract features (keypoints) using the scale-invariant feature transform (SIFT) or Speeded Up Robust Features (SURF).

You can find a nice OpenCV code example in Java, C++, and Python on this page: Features2D + Homography to find a known object

Both algorithms are invariant to scaling and rotation. Since they work with features, you can also handle occlusion (as long as enough keypoints are visible).
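To make the occlusion point concrete, here is a minimal sketch of how descriptors from a template image of the can could be matched against descriptors from a scene. It is plain C++ and does not call OpenCV. The ratio test comes from Lowe's SIFT paper rather than from this answer, and the names `matchDescriptors` and `objectFound` and the values `ratio = 0.75` and `minMatches = 10` are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// A descriptor is a fixed-length float vector (SIFT uses 128 dimensions).
using Descriptor = std::vector<float>;

// Squared Euclidean distance between two descriptors of equal length.
float squaredDistance(const Descriptor& a, const Descriptor& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// For each template descriptor, find its two nearest scene descriptors and
// keep the match only if the best one is clearly closer than the second best
// (ratio test). Returns (templateIndex, sceneIndex) pairs.
std::vector<std::pair<std::size_t, std::size_t>> matchDescriptors(
        const std::vector<Descriptor>& templ,
        const std::vector<Descriptor>& scene,
        float ratio = 0.75f) {
    std::vector<std::pair<std::size_t, std::size_t>> matches;
    if (scene.size() < 2) return matches;  // ratio test needs two neighbors

    for (std::size_t i = 0; i < templ.size(); ++i) {
        float best = std::numeric_limits<float>::infinity();
        float second = best;
        std::size_t bestIdx = 0;
        for (std::size_t j = 0; j < scene.size(); ++j) {
            const float d = squaredDistance(templ[i], scene[j]);
            if (d < best) {
                second = best;
                best = d;
                bestIdx = j;
            } else if (d < second) {
                second = d;
            }
        }
        // Distances are squared, so the ratio is squared as well.
        if (best < ratio * ratio * second) matches.emplace_back(i, bestIdx);
    }
    return matches;
}

// Partial occlusion is tolerated as long as enough keypoints still match.
bool objectFound(std::size_t goodMatches, std::size_t minMatches = 10) {
    return goodMatches >= minMatches;
}
```

In a full pipeline the surviving matches would then be passed to a robust homography estimate, as in the OpenCV tutorial referenced above, to recover the position, scale, and rotation of the can.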

Image source: tutorial example

The processing takes a few hundred ms for SIFT. SURF is a bit faster, but it is still not suitable for real-time applications. ORB uses FAST, which is weaker regarding rotation invariance.

The original papers:

SURF: Speeded Up Robust Features

Distinctive Image Features from Scale-Invariant Keypoints

ORB: an efficient alternative to SIFT or SURF

Context

Stack Overflow Q#10168686, score: 788
