Understanding Objectness in Object Detection Models

Nathan Zhao
4 min read · Aug 14, 2020


In this article, we go over the basics of Object Detection Models and Objectness.

What are Object Detection Models?

Object detection models, as their name suggests, detect objects. Given an image, a model can tell you both where an object is and what that object is. For example, in an image containing a multitude of objects, an object detection model can locate each object within the image and label it with its class.

There are many applications of these kinds of models. To name a few, object detection can be useful in:

  • Self-driving cars such that they can detect passengers, other cars, traffic lights, and stop signs.
  • Security, where a model may detect the presence of a gun or bomb in a public area and warn nearby officials.

In general, these kinds of models are very useful and have attracted a large amount of research attention from the machine learning community in the past few years.

An Introduction to Object Detection Region Proposals

For starters, let us understand how object detection models function. First, we have to propose the location of an object. We call this proposed location our region of interest, often shown in a bounding box (also known as an image window). Depending on the type of Object Detection Model, we can do this in many different ways.

  • Naive Method: We split our image into multiple parts and classify each part. This is inefficient because a classification network (CNN) must be applied to each generated window, leading to high computation time.
  • Sliding-Window Method: We slide windows of predetermined proportions (or “anchors”) across our image, processing each window as we go. As with the Naive Method, this generates many windows, which leads to high processing time.
  • Selective Search: Using color similarity, texture similarity, and a few other image details, we algorithmically group the image into regions. This requires fewer applications of a classification network, though the selective search algorithm itself is time-consuming.
  • Region Proposal Network: We create a separate network for determining the regions of interest in an image. This makes our model faster, but it also ties the accuracy of the final model to multiple networks.
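To make the cost of the simpler methods concrete, here is a minimal sketch of a sliding-window proposal generator. The function name and parameters are hypothetical, not from any particular library; the point is how quickly the number of candidate windows grows.

```python
# Minimal sketch of sliding-window region proposals: slide fixed-size
# "anchor" windows across the image and collect every candidate box.

def sliding_window_proposals(img_w, img_h, window_sizes, stride):
    """Return candidate boxes as (x, y, w, h) tuples."""
    boxes = []
    for (w, h) in window_sizes:
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                boxes.append((x, y, w, h))
    return boxes

# Even a small 256x256 image with two anchor sizes and a coarse stride
# yields dozens of windows, each of which would need a full CNN
# classification pass under the naive and sliding-window methods.
proposals = sliding_window_proposals(256, 256, [(64, 64), (128, 128)], 32)
print(len(proposals))  # → 74
```

With a finer stride, more anchor sizes, or a larger image, this count quickly reaches the thousands, which is exactly the computational problem that selective search and region proposal networks try to avoid.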

Each of the options listed above trades speed against accuracy: in general, as we speed up our network’s processing time, we tend to sacrifice some of our model’s accuracy.

The primary issue with region proposal mechanisms is that if a proposed region does not contain an object, the classification network has wasted computation processing that image window, and may even assign it a spurious label.

So… What is Objectness?

Objectness is essentially a measure of the probability that an object exists in a proposed region of interest. If we have high objectness, this means that the image window likely contains an object. This allows us to quickly prune out proposed image windows that do not contain any objects.
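This pruning step can be sketched in a few lines. The scores below are hypothetical stand-ins for what an objectness model would output; the idea is simply that thresholding on objectness filters out empty windows before the expensive classifier ever runs.

```python
# Minimal sketch of objectness-based pruning: keep only the proposals
# whose objectness score clears a chosen threshold.

def prune_by_objectness(boxes, scores, threshold=0.5):
    """Keep (box, score) pairs whose objectness meets the threshold."""
    return [(b, s) for b, s in zip(boxes, scores) if s >= threshold]

boxes = [(0, 0, 64, 64), (40, 40, 64, 64), (120, 10, 32, 32)]
scores = [0.92, 0.15, 0.61]  # hypothetical objectness values
kept = prune_by_objectness(boxes, scores, threshold=0.5)
print(kept)  # only the first and third proposals survive
```

The choice of threshold is itself a speed/accuracy trade-off: a high threshold prunes aggressively but risks discarding windows that actually contain objects.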

If an image window has high objectness, we expect it to have:

  • Uniqueness in the whole image
  • Tight boundaries around the object
  • A different appearance from its surroundings

For example, in the image above, we would expect low objectness for the red boxes, medium objectness for the blue box, and high objectness for the green boxes. This is because the green boxes “tightly” fit around our objects, whereas the blue box fits loosely, and the red boxes do not contain any object at all.

How do we Measure Objectness?

Several cues affect the objectness of an image window.

  • Multi-scale Saliency: This is essentially the measure of the unique appearance of an image window. The higher the density of unique pixels in a bounding box compared to the whole image, the higher this value will be.
  • Color Contrast: The more contrast in color between the pixels within the bounding box and the region around the proposed image window, the greater this value will be.
  • Edge Density: Defining edges as an object’s boundary, this value measures the density of edges near the borders of the image window. An interesting algorithm for finding these edges can be found in reference 3 below.
  • Superpixel Straddling: We define superpixels to be blobs of pixels with nearly the same color. If this value is high, every superpixel within a box is contained entirely inside its boundaries. Picture superpixel regions shown in different colors: when the superpixels within a box mostly don’t leak outside of the image window, the “superpixel straddling” value will be high.
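As a toy illustration of the color contrast cue, the sketch below compares an intensity histogram inside a box with the histogram of the region surrounding it, using a chi-squared distance. This is a deliberate simplification (grayscale only, hand-picked pixel samples), not the exact formulation from the objectness paper; a higher distance simply means more contrast between the window and its surroundings.

```python
# Toy sketch of the "color contrast" cue, simplified to grayscale:
# compare the intensity histogram inside the box with that of the
# surrounding region via a chi-squared distance.

def histogram(pixels, bins=8):
    """Normalized histogram of 0-255 intensity values."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]

def chi_squared(h1, h2):
    """Chi-squared distance between two normalized histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

# Hypothetical pixel samples: a bright object on a dark background.
inside = [200, 210, 220, 230, 215, 205]   # pixels inside the box
surround = [20, 30, 25, 15, 35, 10]       # pixels surrounding the box

contrast = chi_squared(histogram(inside), histogram(surround))
print(round(contrast, 3))  # → 2.0, the maximum: histograms don't overlap
```

A box drawn around a region with the same colors as its surroundings would instead score near zero, matching the intuition that such a window is unlikely to contain a distinct object.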

With higher values of the above parameters, we will have a higher objectness value. Try to relate the above parameters to our previously listed expectations of images with high objectness.

Conclusion

In general, objectness measures the probability that an object exists within a proposed image window, which allows for the pruning of proposed regions of interest that do not contain any objects.

If you are interested in learning more about object detection and objectness, there are a few very useful resources listed below.

References

  1. http://calvin-vision.net/wp-content/uploads/Publications/alexe12pami.pdf
  2. https://www.analyticsvidhya.com/blog/2018/10/a-step-by-step-introduction-to-the-basic-object-detection-algorithms-part-1/
  3. https://cv-tricks.com/opencv-dnn/edge-detection-hed/
  4. https://machinelearningmastery.com/object-recognition-with-deep-learning/
