What is Object Detection?

Ashish Patel
Published in ML Research Lab
5 min read · Jun 11, 2020

Computer Vision Object Detection Series

Credit : CC0 Public Domain

Over the past decade, deep learning has drawn much greater attention and become an essential technology in artificial intelligence. Object detection is one of the most noteworthy areas of deep learning and computer vision. It has numerous applications, such as object tracking, image retrieval, video surveillance, image captioning, image segmentation, medical imaging, and many others. In this article, we will go through the fundamentals of object detection. So, let's get started.

1. What is Object Detection?

  • Object detection is a computer vision technology, commonly built on deep learning, in which things such as people, buildings, and cars can be detected as objects in images and videos.
Fig 2. Classification, Object Detection and Segmentation Representation
  • Object detection recognizes an object and marks it with a bounding box in the image, whereas image classification only categorizes (classifies) whether an object is present in the image, expressed as a likelihood (probability).
  • Note: The softmax function helps us identify the class by turning the classifier's raw scores into these probabilities.
  • In Figure 1, you can see the kitten with and without a bounding box, which shows the fundamental difference between image classification and object detection.

2. Machine Learning and Deep Learning Based Object Detection

  • Machine learning based object detection methods rely on features extracted manually with image feature extraction techniques such as Histogram of Oriented Gradients (HOG), Speeded-Up Robust Features (SURF), Local Binary Patterns (LBP), Haar wavelets, color histograms, etc.
Fig 3. Difference Between Machine Learning and Deep Learning Based Object Detection
  • Deep learning based object detection methods can extract features automatically with the help of deep learning algorithms such as Convolutional Neural Networks, Auto-Encoders, Variational Auto-Encoders, etc., which generate features such as edges and shapes from the image.
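
To make "manual feature extraction" concrete, here is a minimal sketch of the basic 8-neighbour form of Local Binary Patterns (LBP), one of the techniques listed above. The function names and the tiny 4x4 patch are made up for illustration, and real pipelines usually rely on an image library rather than plain Python lists.

```python
def lbp_code(image, row, col):
    """Return the basic 8-neighbour local binary pattern code for one pixel."""
    center = image[row][col]
    # Neighbour offsets, clockwise starting from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit to 1.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Count LBP codes over all interior pixels: a hand-crafted feature vector."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Hypothetical 4x4 grayscale patch (values 0-255), for illustration only.
patch = [
    [10, 10, 10, 10],
    [10, 50, 50, 10],
    [10, 50, 50, 10],
    [10, 10, 10, 10],
]
features = lbp_histogram(patch)
print("non-zero LBP bins:", {code: n for code, n in enumerate(features) if n})
```

The rule for building the feature is written by hand here. A deep learning model instead learns its own filters from training data.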

3. How does it work?

Before exploring object detection, we need to understand image classification. There is a lot of study material on image classification, and you have probably implemented it through a tutorial at least once. When an image is given as input to a CNN, the problem of predicting the class that corresponds to the image is known as image classification, and, as shown in the figure below, probability values for all target classes are output.

Fig 4. Image Classification credit : https://hoya012.github.io/
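
The per-class probability output described above is commonly produced by applying a softmax to the network's raw scores. The sketch below assumes three made-up class names and made-up scores. It only illustrates the last step of a classifier, not a full CNN.

```python
import math


def softmax(scores):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    # Subtract the maximum score first for numerical stability.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores from a classifier for three illustrative classes.
class_names = ["cat", "dog", "car"]
logits = [2.0, 1.0, 0.1]

probs = softmax(logits)
predicted = class_names[probs.index(max(probs))]
print(dict(zip(class_names, [round(p, 3) for p in probs])), "->", predicted)
```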

You can think of object detection as an image classification task combined with a regression task that predicts the position of an object using a bounding box.

Fig 5. Object Detection with Bounding Box credit : https://hoya012.github.io/

Object detection assumes that objects of multiple classes may exist in an image at the same time.

  • We can view this as two types of problem. One is multi-label classification (multiple classes in one image).
  • The other is bounding box prediction (a regression problem), in which we have to predict the coordinate values of the bounding box in terms of x, y, w, h.

The example in Figure 5 shows the case where one object exists in one image, but it should also be possible to detect objects when several of them exist in one image, as shown in Figure 1.
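
One way to picture the result of detection on an image with several objects is a list in which every entry pairs a class label with box values. The `Detection` type, the `keep_confident` helper, the score threshold, and all numbers below are hypothetical and only illustrate this structure.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # predicted class (the classification part)
    score: float  # confidence of that class, between 0 and 1
    x: float      # bounding box values (the regression part)
    y: float
    w: float
    h: float


def keep_confident(detections, min_score=0.5):
    """Keep only detections whose confidence reaches min_score."""
    return [d for d in detections if d.score >= min_score]


# Hypothetical raw output for one image containing several objects.
raw = [
    Detection("cat", 0.92, 40, 60, 120, 90),
    Detection("dog", 0.81, 200, 80, 150, 130),
    Detection("car", 0.12, 10, 10, 30, 20),
]
for d in keep_confident(raw):
    print(d.label, d.score, (d.x, d.y, d.w, d.h))
```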

4. Object Localization

  • The task of object localization is to predict the object in an image as well as its boundaries. The difference between object localization and object detection is subtle. Put simply, object localization aims to locate the main (or most visible) object in an image, while object detection tries to find all the objects and their boundaries.
  • An image classification or image recognition model simply predicts the probability of an object being in an image. In contrast, object localization refers to identifying the location of an object in the image. An object localization algorithm outputs the coordinates of the object's location with respect to the image. In computer vision, the most popular way to localize an object in an image is to represent its location with a bounding box.

A bounding box can be described using the following parameters:

  • bx, by : coordinates of the center of the bounding box
  • bw : width of the bounding box w.r.t the image width
  • bh : height of the bounding box w.r.t the image height
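
As a rough sketch, the parameters above can be converted to pixel corner coordinates for drawing. This assumes that bx and by are also normalized by the image size, which the list above does not state, and the helper name `center_to_corners` is made up for illustration.

```python
def center_to_corners(bx, by, bw, bh, image_width, image_height):
    """Convert a normalized centre-format box to pixel corner coordinates."""
    # Scale the normalized values back to pixels (assumes bx, by are normalized too).
    cx, cy = bx * image_width, by * image_height
    w, h = bw * image_width, bh * image_height
    # Move from the centre out to the top-left and bottom-right corners.
    x_min, y_min = cx - w / 2, cy - h / 2
    x_max, y_max = cx + w / 2, cy + h / 2
    return x_min, y_min, x_max, y_max


# A box centred in a 640x480 image, a quarter as wide and half as tall as the image.
corners = center_to_corners(0.5, 0.5, 0.25, 0.5, image_width=640, image_height=480)
print(corners)
```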

The model predicts these values to produce a bounding box that localizes the object in the image, and the prediction is evaluated by calculating the Mean IoU.

  • IoU : Intersection over Union (IoU), also known as the Jaccard index, is one of the most widely used performance metrics in object detection.
  • IoU is the area of overlap between the predicted region (a bounding box or a segmentation mask) and the ground truth, divided by the area of their union. The metric ranges from 0 to 1 (0–100%), with 0 meaning no overlap and 1 meaning a perfect overlap.
  • Mean IoU : For binary (two-class) or multi-class segmentation, the mean IoU of an image is calculated by taking the IoU of each class and averaging them.
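
The IoU definition can be sketched for two axis-aligned boxes as follows. The boxes are assumed to be in (x_min, y_min, x_max, y_max) corner format, and the function names and example values are illustrative rather than taken from any library.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(per_class_ious):
    """Average a list of per-class IoU values."""
    return sum(per_class_ious) / len(per_class_ious)


ground_truth = (0, 0, 10, 10)
predicted_box = (5, 5, 15, 15)
print("IoU:", round(iou(ground_truth, predicted_box), 3))
print("Mean IoU:", mean_iou([1.0, 0.5]))
```

In this example the boxes overlap in a 5x5 square, so the IoU is 25 / (100 + 100 - 25).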

Now you understand the overall idea of object detection. In the next article, I will explain a series of popular object detection algorithms.

Thanks for reading! If you liked this article, please clap and encourage me to write more.

References:

  1. https://d2l.ai/chapter_computer-vision/bounding-box.html#bounding-box
  2. https://lilianweng.github.io/lil-log/2017/10/29/object-recognition-for-dummies-part-1.html
