Object Detection in Deep Learning (part1)

Amin Ag
AI³ | Theory, Practice, Business
3 min readAug 25, 2019

There are many interesting problems in the computer vision domain. The one which we are going to focus on is the Localization and Detection problem; also referred to as Object detection.

“…we will be using the term object recognition broadly to encompass both image classification (a task requiring an algorithm to determine what object classes are present in the image) as well as object detection (a task requiring an algorithm to localize all objects present in the image “ INLSVRC 2015

Single and group object detection

Object Detection is used almost everywhere these days. The use cases are endless, be it Tracking objects, Video surveillance, Pedestrian detection, Anomaly detection, People Counting, Self-driving cars or Face detection, the list goes on.

Object detection is more challenging and combines these two tasks:

1-Image classification involves assigning a class label to an image : Given an image can you find out the class the image belongs to. We can solve any new image classification problem with ConvNets (or any similar model) and Transfer learning using pre-trained nets.

2-Object localization involves drawing a bounding box around one or more objects in an image.

  • Input: An image with one or more objects, such as a photograph.
  • Output: One or more bounding boxes (e.g. defined by a point, width, and height), and a class label for each bounding box.

We are just adding one more output layer to the convnets that are already in place for the sole purpose of predicting the coordinates of the bounding box and tweaking our loss function.

In such a set up the loss is a weighted sum of the Softmax Loss(from the Classification Problem) and the regression L2 loss(from the bounding box coordinates).

Since these two losses would be on a different scale, the alpha hyper-parameter needs to be tuned.

The performance of a model for image classification is evaluated using the mean classification error across the predicted class labels. The performance of a model for single-object localization is evaluated using the distance between the expected and predicted bounding box for the expected class. Whereas the performance of a model for object recognition is evaluated using the precision and recall across each of the best matching bounding boxes for the known objects in the image.

Now that we are familiar with the problem of object localization and detection, let’s take a look at some recent top-performing deep learning models. (Next posts part2: R-CNN & Yolo model families)

--

--