How to choose the number of HOG pyramid layers?

Problem

Reading "Histograms of Oriented Gradients for Human Detection", I cannot find any discussion of how many layers the HOG pyramid should have. OpenCV offers a scaling factor of 1.05, which works out to roughly 15 layers per halving of the image size, and "Object Detection with Discriminatively Trained Part Based Models" suggests 10 layers per halving, but I believe that is specific to their algorithm.
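To see where the 15-layers-per-halving figure comes from, here is a small Python sketch. It assumes each level shrinks the image by a fixed factor s, uses the 64x128 window from the Dalal and Triggs paper, and ignores the per-level rounding a real implementation applies. The function names are illustrative, not an OpenCV API.

```python
import math

# Illustrative helpers (hypothetical names), not part of OpenCV.
# Assumes level k is the original image shrunk by a fixed factor scale**k,
# and ignores the integer rounding a real implementation applies per level.


def levels_per_octave(scale: float) -> float:
    """Pyramid steps needed to halve the image size at a given scale factor."""
    return math.log(2.0) / math.log(scale)


def num_levels(img_w: int, img_h: int, win_w: int, win_h: int, scale: float) -> int:
    """Count levels until the detection window no longer fits in the shrunk image."""
    n = 0
    while img_w / scale**n >= win_w and img_h / scale**n >= win_h:
        n += 1
    return n


if __name__ == "__main__":
    # 1280x960 source image, 64x128 (width x height) window as in Dalal and Triggs.
    for s in (1.05, 1.1, 1.2):
        print(f"scale={s}: {levels_per_octave(s):.1f} steps per halving, "
              f"{num_levels(1280, 960, 64, 128, s)} levels")
```

With s = 1.05 this gives about 14.2 steps per halving (so 15 levels counting both ends of the octave) and 42 levels in total for a 1280x960 image, compared with 12 levels at s = 1.2. The 960-pixel height is the binding constraint here, because the window is 128 pixels tall.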

However, with 1280 x 960 source images these settings become prohibitively expensive. How does decreasing the number of layers affect detection rates? What exactly am I trading off?
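One rough way to put a number on "prohibitively expensive": if HOG work per level grows roughly with the level's pixel count, and each level has 1/s^2 as many pixels as the one below it, then total cost is a geometric series. The sketch below uses a hypothetical helper and ignores fixed per-level overhead. The level counts (42, 22, 12) come from counting levels until a 128-pixel-tall window no longer fits in a 960-pixel-tall image, without rounding.

```python
# Hypothetical helper: estimates total HOG work relative to one pass over
# the full-resolution image. Assumes cost per level is proportional to its
# pixel count (a simplification: per-level overhead is ignored).


def pyramid_cost(scale: float, num_levels: int) -> float:
    """Sum of pixel counts over all levels, relative to level 0.

    Level k has 1 / scale**(2k) of the original pixels, so the total is a
    geometric series with ratio r = 1 / scale**2.
    """
    r = 1.0 / scale**2
    return (1.0 - r**num_levels) / (1.0 - r)


if __name__ == "__main__":
    # Level counts for a 1280x960 image with a 64x128 window.
    for s, n in ((1.05, 42), (1.1, 22), (1.2, 12)):
        print(f"scale={s}, {n} levels: {pyramid_cost(s, n):.1f}x the base image")
```

Under these assumptions, s = 1.05 costs about 10.6 times a single pass over the full image, while s = 1.2 costs about 3.2 times. That saving is what you trade against the scale coverage discussed in the answer below.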

Solution

That's a good question!

All of this will depend on your application and camera setup. Usually it works like this: if you have, say, a 128x128 detector template, you will be able to detect objects at that size and larger. As you go up the pyramid, the image gets smaller, so a detection at a higher level corresponds to a bigger object with respect to the original image.
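The relationship between pyramid level and object size can be made concrete with a short sketch. It assumes level k is the original image shrunk by s^k, so a box found at level k maps back by multiplying every coordinate by s^k. The `Box` type and `to_original` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box: top-left corner (x, y), width w, height h, in pixels."""
    x: float
    y: float
    w: float
    h: float


def to_original(box: Box, level: int, scale: float) -> Box:
    """Map a detection found at pyramid level `level` back to original-image coordinates.

    Level k is the original image shrunk by scale**k, so positions and sizes
    grow by the same factor when mapped back.
    """
    f = scale**level
    return Box(box.x * f, box.y * f, box.w * f, box.h * f)


if __name__ == "__main__":
    # A 128x128 template matched at level 10 with a 1.05 step.
    print(to_original(Box(0, 0, 128, 128), level=10, scale=1.05))
```

With a 1.05 step, a 128x128 match at level 10 corresponds to an object roughly 208 pixels across in the original image.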

One of the main ways to reduce computation time is to reduce the number of pyramid levels. With the step kept unchanged, fewer levels means the pyramid stops earlier, so you will not detect very large objects. By the way, a common trick in object detection is to resize the image so that objects in the scene appear at the right scale with respect to the pyramid.
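Both points can be sketched numerically. Assuming a fixed step s and a window of height win_h, capping the pyramid at n levels caps the largest matchable object at about win_h * s^(n-1). Conversely, you can compute how many levels a known maximum object height needs, and a pre-resize factor that brings typical objects to the window size. All three helpers are hypothetical and only illustrate the arithmetic.

```python
import math

# Hypothetical helpers: fixed scale step, no rounding of level sizes.


def max_object_height(win_h: int, scale: float, num_levels: int) -> float:
    """Largest object height (original pixels) that the top pyramid level can match."""
    return win_h * scale ** (num_levels - 1)


def levels_needed(win_h: int, scale: float, max_obj_h: float) -> int:
    """Fewest levels whose top level still covers objects up to max_obj_h pixels tall."""
    if max_obj_h <= win_h:
        return 1  # level 0 alone already matches the window size
    return math.ceil(math.log(max_obj_h / win_h) / math.log(scale)) + 1


def prescale_factor(win_h: int, typical_obj_h: float) -> float:
    """Resize factor that makes a typical object roughly as tall as the window."""
    return win_h / typical_obj_h


if __name__ == "__main__":
    # If people are never taller than 480 px, a 1.05 step needs 29 levels, not 42.
    print(levels_needed(128, 1.05, 480))
    # If people are typically 256 px tall, halving the image first fits them to the window.
    print(prescale_factor(128, 256))
```

A pre-resize factor below 1 also shrinks every level of the pyramid, so it saves time. The trade-off is that objects that were already near the window size become too small to detect.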

If instead you increase the pyramid step (the scale factor between levels), some objects could be missed because their scale does not match any of the pyramid levels closely enough. In summary, as I said, it will depend on the setup: for example, if your camera sees people from a distance, you will usually need fewer levels of the pyramid.
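The two effects in this paragraph can be quantified with a small sketch. With step s, an object's size can fall as far as a factor of sqrt(s) from the nearest level, at the geometric midpoint between two levels. If the setup bounds the object heights you expect, you only need the levels that bracket that range. How much mismatch a given detector tolerates depends on how it was trained, so the gap below is only a proxy. The helper names are hypothetical.

```python
import math

# Hypothetical helpers: fixed scale step, window height win_h in pixels.


def worst_case_mismatch(scale: float) -> float:
    """Largest relative size gap between an object and its nearest pyramid level.

    Object sizes fall between consecutive levels win_h * s**k and win_h * s**(k+1);
    the worst case is the geometric midpoint, which is off by a factor of sqrt(s).
    """
    return math.sqrt(scale) - 1.0


def level_range(win_h: int, scale: float, min_obj_h: float, max_obj_h: float) -> range:
    """Levels whose window size brackets objects between min_obj_h and max_obj_h pixels.

    Objects smaller than the window would need upsampling (negative levels),
    so the range is clamped at level 0.
    """
    lo = max(0, math.floor(math.log(min_obj_h / win_h) / math.log(scale)))
    hi = max(lo, math.ceil(math.log(max_obj_h / win_h) / math.log(scale)))
    return range(lo, hi + 1)


if __name__ == "__main__":
    for s in (1.05, 1.2):
        print(f"scale={s}: up to {worst_case_mismatch(s):.1%} size mismatch")
    # Distant camera: people always between 200 and 300 px tall.
    levels = level_range(128, 1.05, 200, 300)
    print(list(levels), len(levels))
```

For people between 200 and 300 pixels tall, a 1.05 step needs only levels 9 through 18 (10 levels instead of 42). The worst-case size gap is about 2.5% at s = 1.05 and about 9.5% at s = 1.2.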

I recently wrote a paper on using camera calibration to estimate the best scales for people detection in a surveillance scenario. It could help you:

G. Fuhr and C. R. Jung, "Camera Self-Calibration Based on Non-Linear Optimization and Applications in Surveillance Systems," IEEE Transactions on Circuits and Systems for Video Technology, vol. PP, no. 99, pp. 1-1 (early access).

Context

StackExchange Computer Science Q#51746, answer score: 2