Introduction to Object Detection

Ola
4 min read · Oct 8, 2019

In this article, we will look at what object detection is, how it is applied to real-life problems in different industries, and how it can be implemented using Python.

Object detection is a technology that falls under computer vision. It deals with identifying the objects present in images and videos. Computer vision is the science of software systems that can recognize and understand the content of an image, and it covers areas such as image generation, image recognition, object detection, video annotation and more. Object detection is widely used for face detection, vehicle detection, pedestrian counting, self-driving cars and security systems. It has enabled advances such as self-driving cars that navigate safely through traffic, sports analysis, quality control in manufacturing, detection of violence at sporting events, Facebook face tagging and many more.

Objectives

The two major objectives of object detection are:

  • To identify all the objects present in the image
  • To pick out the object of interest

In any object detection project, the main aim is to find a certain group of objects in an image or scene. The first thing any object detection algorithm does is identify the objects present in the image; the user can then train the algorithm to keep only the objects of interest. A typical real-life application is detecting a car's number plate in a given image: first detect the various objects in the image, then filter for the one we want, which is the number plate.
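The "detect everything, then filter" idea can be sketched in a few lines of Python. This is only an illustration: the detection records, the label strings and the `min_score` threshold are all made up for the example, and a real detector would produce its own output format.

```python
def filter_detections(detections, wanted_label, min_score=0.5):
    """Keep only detections with the wanted label and a high enough score."""
    return [
        d for d in detections
        if d["label"] == wanted_label and d["score"] >= min_score
    ]


# Hypothetical output of a detector run on a street photo.
# Each box is (left, top, right, bottom) in pixels.
detections = [
    {"label": "car", "score": 0.98, "box": (40, 60, 420, 300)},
    {"label": "number plate", "score": 0.91, "box": (190, 240, 280, 270)},
    {"label": "person", "score": 0.40, "box": (500, 80, 560, 290)},
]

plates = filter_detections(detections, "number plate")
print(plates)
```

Only the number plate record survives the filter; the car and the person are detected but discarded.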

Tools for Object detection

Object detection can be implemented using various frameworks and algorithms.

The following approaches can be used to solve the object detection problem.

Naive way (Divide and Conquer)

The first step in the naive approach is to divide the image into four parts:

  • Upper left-hand side corner
  • Upper right-hand side corner
  • Lower left-hand side corner
  • Lower right-hand side corner

Each patch is then fed into a classifier, which decides whether the object we are trying to detect is present in that patch. Once we have the outputs, the patches are put back together and the patch that contains the object is marked.

This method is not encouraged because it is very slow and it doesn't locate the object precisely.
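The four-way split described above can be sketched as follows. This is a minimal illustration rather than a real detector: the image is a plain list of rows, and `contains_object` is a stand-in classifier (an assumption for this example) that simply looks for a bright pixel.

```python
def crop(image, top, bottom, left, right):
    """Return the sub-image covering rows top..bottom and columns left..right."""
    return [row[left:right] for row in image[top:bottom]]


def split_into_quadrants(image):
    """Divide a 2D image (a list of rows) into four named patches."""
    height, width = len(image), len(image[0])
    mid_y, mid_x = height // 2, width // 2
    return {
        "upper_left": crop(image, 0, mid_y, 0, mid_x),
        "upper_right": crop(image, 0, mid_y, mid_x, width),
        "lower_left": crop(image, mid_y, height, 0, mid_x),
        "lower_right": crop(image, mid_y, height, mid_x, width),
    }


def contains_object(patch):
    """Stand-in classifier (assumption): any bright pixel counts as the object."""
    return any(pixel > 0.5 for row in patch for pixel in row)


def naive_detect(image, classifier=contains_object):
    """Return the names of the quadrants the classifier flags."""
    patches = split_into_quadrants(image)
    return [name for name, patch in patches.items() if classifier(patch)]


# Toy 8x8 image with a bright 2x2 "object" in the lower-left quadrant.
image = [[0.0] * 8 for _ in range(8)]
for y in (5, 6):
    for x in (1, 2):
        image[y][x] = 1.0

flagged = naive_detect(image)
print(flagged)
```

The result only tells us which quarter of the image holds the object, which is why the localization is so coarse.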

Increase the number of divisions

This works the way the naive approach does; the only difference is that it divides the image into more patches. Its shortcomings are that it is not organized, it is hard to implement, and recognizing different objects across the many patches is error-prone.
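A rough sketch of the finer grid, under the same assumptions as before (a list-of-rows image and a stand-in classifier that looks for bright pixels), shows one way this goes wrong: with smaller patches, a single object can be split across several of them.

```python
def grid_cells(height, width, rows, cols):
    """Yield (top, bottom, left, right) pixel bounds for each grid cell."""
    for r in range(rows):
        for c in range(cols):
            yield (
                r * height // rows, (r + 1) * height // rows,
                c * width // cols, (c + 1) * width // cols,
            )


def contains_object(patch):
    """Stand-in classifier (assumption): any bright pixel counts as the object."""
    return any(pixel > 0.5 for row in patch for pixel in row)


def grid_detect(image, rows, cols, classifier=contains_object):
    """Classify every grid cell and return the bounds of the flagged cells."""
    height, width = len(image), len(image[0])
    hits = []
    for top, bottom, left, right in grid_cells(height, width, rows, cols):
        patch = [row[left:right] for row in image[top:bottom]]
        if classifier(patch):
            hits.append((top, bottom, left, right))
    return hits


# Toy 8x8 image with one bright 2x2 "object".
image = [[0.0] * 8 for _ in range(8)]
for y in (5, 6):
    for x in (1, 2):
        image[y][x] = 1.0

coarse = grid_detect(image, 2, 2)  # 4 classifier calls
fine = grid_detect(image, 4, 4)    # 16 classifier calls
print(coarse)
print(fine)
```

The 2x2 grid flags one large cell, while the 4x4 grid flags four small cells for the same object, and the number of classifier calls grows with the number of cells.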

Deep Learning

Deep learning plays an important role in building an object detection system, as it provides highly accurate object detection algorithms. These algorithms rest on a good deal of mathematics and require a deep knowledge of neural networks. The deep learning approach doesn't require manual patching of images. Deep learning algorithms and tools used in object detection include:

  • Region-based Convolutional Network (R-CNN) and its variants
  • Single Shot Detector (SSD)
  • YOLO (You Only Look Once)
  • ImageAI (a Python library that wraps such algorithms)

R-CNN

The Region-based Convolutional Network method (R-CNN) achieves excellent object detection accuracy by using a deep ConvNet to classify object proposals.

The major drawbacks of R-CNN include:

1. Training is a multi-stage pipeline. R-CNN is not trained in a single run. It first fine-tunes a convolutional network on object proposals using log loss as the loss function. It then fits Support Vector Machines (SVMs) to the ConvNet features; these SVMs act as object detectors, replacing the softmax classifier learned during fine-tuning. In the third stage, bounding-box regressors are learned.

2. Training is expensive in space and time. For SVM and bounding-box regressor training, features are extracted from each object proposal in each image and written to disk.

3. Object detection is slow. At test time, features are extracted from each object proposal in each test image. Detection with VGG16 takes 47 s per image (on a GPU).
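To put the 47 s per image figure in perspective, here is a quick back-of-the-envelope calculation. The batch size of 1,000 images is an arbitrary number chosen for illustration.

```python
SECONDS_PER_IMAGE = 47  # R-CNN with VGG16 on a GPU, as quoted above


def total_detection_hours(num_images, seconds_per_image=SECONDS_PER_IMAGE):
    """Estimate how long it takes to run detection over a batch of images."""
    return num_images * seconds_per_image / 3600


# Hypothetical batch of 1,000 images.
hours = total_detection_hours(1000)
print(f"{hours:.1f} hours")
```

At that rate, even a modest batch takes about 13 hours, which rules out anything close to real-time use.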

YOLO (You Only Look Once)

YOLO was introduced as a new approach to object detection. Earlier approaches involve several components in the training pipeline. YOLO instead looks at the entire image at once: a single neural network predicts the bounding box coordinates and the class probabilities for those boxes. Because the whole pipeline is a single network, it can be optimized end-to-end directly on detection performance. YOLO is very fast and accurate, which makes it a strong choice for an object detection project.

YOLO imposes strong spatial constraints on bounding box predictions, since each grid cell predicts only two boxes and can have only one class. As a result, YOLO struggles with small objects that appear in groups, such as flocks of birds.
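The grid constraint is easier to see with numbers. The sketch below assumes the settings reported for the original YOLO model (a 7x7 grid, two boxes per cell and 20 classes); these values are not stated in this article, and the helper names are invented for the example.

```python
def yolo_output_shape(grid_size=7, boxes_per_cell=2, num_classes=20):
    """Shape of the prediction tensor for a YOLO-style grid."""
    # Each box carries x, y, width, height and a confidence score;
    # the class probabilities are shared by all boxes in the cell.
    values_per_cell = boxes_per_cell * 5 + num_classes
    return (grid_size, grid_size, values_per_cell)


def responsible_cell(center_x, center_y, grid_size=7):
    """Grid cell (row, col) that contains a normalized object center."""
    col = min(int(center_x * grid_size), grid_size - 1)
    row = min(int(center_y * grid_size), grid_size - 1)
    return row, col


shape = yolo_output_shape()
max_boxes = shape[0] * shape[1] * 2  # two boxes in each of the 49 cells

# Two small birds flying close together (centers in 0..1 image coordinates).
bird_a = responsible_cell(0.50, 0.50)
bird_b = responsible_cell(0.55, 0.52)
print(shape, max_boxes, bird_a, bird_b)
```

Both birds land in the same cell, so they compete for that cell's two boxes and its single class prediction. This is the situation behind the flock-of-birds limitation.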

ImageAI

ImageAI is a Python library built to empower developers to build applications and systems with self-contained deep learning and computer vision capabilities using a few simple lines of code. It supports state-of-the-art deep learning algorithms such as RetinaNet, YOLOv3, and TinyYOLOv3. With ImageAI you can run detection tasks and analyze images.

ImageAI's ObjectDetection class provides functions to perform object detection on any image or set of images, using pre-trained models that were trained on the COCO dataset. This means you can detect and recognize 80 different kinds of common everyday objects. To get started, download whichever of the pre-trained models you want to use.
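As a rough sketch of what this looks like in practice, the snippet below wraps ImageAI's ObjectDetection class in a small helper. It assumes ImageAI is installed and that a RetinaNet model file has already been downloaded; the file names in the usage comment are placeholders, and `detect_objects` and `describe` are helper names made up for this example.

```python
def detect_objects(image_path, model_path, output_path):
    """Run RetinaNet detection on one image and return ImageAI's detections."""
    # Imported inside the function so this file loads even without ImageAI.
    from imageai.Detection import ObjectDetection

    detector = ObjectDetection()
    detector.setModelTypeAsRetinaNet()
    detector.setModelPath(model_path)
    detector.loadModel()
    # Also writes a copy of the image with the detected boxes drawn on it.
    return detector.detectObjectsFromImage(
        input_image=image_path,
        output_image_path=output_path,
    )


def describe(detections):
    """Format each detection as 'name : probability'."""
    return [
        f"{d['name']} : {d['percentage_probability']:.1f}"
        for d in detections
    ]


# Usage (placeholder file names, not real downloads):
#   results = detect_objects("image.jpg", "retinanet_model_file", "image_new.jpg")
#   for line in describe(results):
#       print(line)
```

Each detection returned by ImageAI is a dictionary that includes the object's name and its percentage probability, which is what `describe` prints.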

Conclusion

We have now covered the fundamentals of object detection and the various approaches that can be used to solve object detection problems. In the next part of the series, I'll cover its implementation using Python.

Thanks for reading.

Cheers!
