Faster R-CNN vs YOLO vs SSD — Object Detection Algorithms

Overview and comparative study of object detection algorithms

Abonia Sojasingarayar
IBM Data Science in Practice
5 min read · Aug 29, 2022

Object Detection (Source)

Object detection has been evolving rapidly in the field of computer vision. Because it combines object classification with object localisation, it is one of the most challenging topics in the domain. Put simply, the goal is to determine where objects are located in a given image (object localisation) and which category each object belongs to (object classification).

In this article we will compare three well-known object detection algorithms:

  1. Faster R-CNN
  2. YOLO (You Only Look Once)
  3. SSD (Single Shot Detector)

We will also look at how these commonly used object detection algorithms currently compare in performance.

1. Faster R-CNN

The Faster R-CNN model was developed by a group of researchers at Microsoft. Faster R-CNN is a deep convolutional network for object detection that appears to the user as a single, end-to-end, unified network. The network can predict the locations of different objects quickly and accurately. To truly understand Faster R-CNN, it helps to be familiar with the networks it evolved from, namely R-CNN and Fast R-CNN. Faster R-CNN is an extension of Fast R-CNN and, as its name suggests, it is faster than Fast R-CNN thanks to the region proposal network (RPN).

Faster R-CNN is a single unified model whose architecture comprises two modules:

  • RPN (Region Proposal Network): a convolutional neural network that proposes regions likely to contain an object, each with a score for whether it holds an object or background.
  • Fast R-CNN: a convolutional neural network that extracts features from the proposed regions and outputs the bounding boxes and class labels.

Both modules operate on the same output of a deep CNN. The region proposal network acts as an attention mechanism for the Fast R-CNN network, informing the second network of where to look or pay attention.
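The two-stage flow described above can be sketched in a few lines of plain Python. This is only a toy illustration, not the real network: the "feature map" is a hypothetical dictionary that maps candidate boxes to a made-up label, and the function names `rpn`, `fast_rcnn_head` and `detect` are assumptions chosen for this example. What it shows is the division of labour: both stages read the same shared features, the first stage decides where to look, and the second stage decides what is there.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def rpn(features: Dict[Box, str], top_k: int) -> List[Box]:
    """Stage 1 stand-in: score every candidate region and keep the top_k."""
    def objectness(box: Box) -> float:
        # Toy score: 1.0 for anything that is not background.
        return 0.0 if features[box] == "background" else 1.0

    return sorted(features, key=objectness, reverse=True)[:top_k]


def fast_rcnn_head(features: Dict[Box, str],
                   proposals: List[Box]) -> List[Tuple[Box, str]]:
    """Stage 2 stand-in: label only the regions that stage 1 proposed."""
    return [(box, features[box]) for box in proposals]


def detect(features: Dict[Box, str], top_k: int = 2) -> List[Tuple[Box, str]]:
    proposals = rpn(features, top_k)            # where to look
    return fast_rcnn_head(features, proposals)  # what is there


# Hypothetical shared features, computed once and used by both stages.
toy_features = {
    (0, 0, 50, 50): "background",
    (10, 20, 110, 220): "person",
    (200, 40, 320, 160): "dog",
}
detections = detect(toy_features)
```

In the real model both stand-ins are learned networks, and the proposals are scored from convolutional features rather than looked up.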

Faster R-CNN Model Architecture. Taken from: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2016.

Faster R-CNN is one of the models that showed that complex computer vision problems can be solved with deep learning.

New models based on this original model are currently being built, not only for object detection but also for semantic segmentation, 3D object detection, and more. Some borrow the RPN, some borrow the R-CNN, and others build on top of both.

2. YOLO (You Only Look Once)

YOLO looks at the image only once to detect multiple objects, which is why it is called YOLO, You Only Look Once. Because it looks at the image just once, detection runs in real time (45 fps). Fast YOLOv1 achieves 155 fps. It is another state-of-the-art deep learning object detection approach, published at CVPR 2016, with more than 2000 citations.

YOLO divides the image into a grid. For each grid cell, values such as the class probabilities and the bounding box parameters are calculated. Each cell is responsible for predicting a bounding box if the center of that box falls within the cell. The prediction consists of the x and y coordinates, the width and height, and a confidence score. Class probabilities are also predicted per cell.
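To make the grid assignment concrete, here is a minimal sketch of how a ground-truth box could be mapped to the cell responsible for it. The helper names `responsible_cell` and `output_values` are hypothetical, and the defaults (a 7×7 grid, 2 boxes per cell, 20 classes) are the settings reported in the YOLOv1 paper for Pascal VOC rather than values stated in this article.

```python
def responsible_cell(box, image_w, image_h, grid_size=7):
    """Find the grid cell that contains the centre of a box.

    box is (x_min, y_min, x_max, y_max) in pixels. Returns the (row, col)
    of the responsible cell and the target (x, y, w, h), where x and y are
    offsets inside the cell and w and h are relative to the image size.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2 / image_w   # centre, normalised to [0, 1]
    cy = (y_min + y_max) / 2 / image_h
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    x_offset = cx * grid_size - col      # position inside the cell
    y_offset = cy * grid_size - row
    w = (x_max - x_min) / image_w
    h = (y_max - y_min) / image_h
    return (row, col), (x_offset, y_offset, w, h)


def output_values(grid_size=7, boxes_per_cell=2, num_classes=20):
    """Each cell predicts 5 values per box (x, y, w, h, confidence)
    plus one probability per class."""
    return grid_size * grid_size * (boxes_per_cell * 5 + num_classes)


cell, target = responsible_cell((100, 200, 300, 400), image_w=448, image_h=448)
print(cell)             # (4, 3)
print(output_values())  # 1470
```

The box centre at pixel (200, 300) falls in row 4, column 3 of the grid, so only that cell is asked to predict this object.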

Predictions made by YOLO Model. Taken from: You Only Look Once: Unified, Real-Time Object Detection, 2015

The YOLO algorithm is one of the best object detection algorithms for the following reasons:

Speed: YOLO is fast enough to predict objects in real time.

High accuracy: YOLO provides accurate results with minimal background errors.

It has been used in various applications to detect traffic signals, people, parking meters, and animals.

3. SSD (Single Shot Detector)

SSD is a single-shot detector. It has no delegated region proposal network and predicts the boundary boxes and the classes directly from feature maps in one single pass.

SSD object detection is composed of two parts:

  • Extract feature maps, and
  • Apply convolution filters to detect objects.

SSD Architecture. Taken from: SSD: Single Shot MultiBox Detector.

The features of SSD are as follows:

  • Small convolutional filters to predict object classes and offsets to default boundary boxes.
  • Separate filters for default boxes to handle the difference in aspect ratios.
  • Multi-scale feature maps for object detection.
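The idea of default boxes with different aspect ratios on feature maps of different sizes can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `default_boxes` is hypothetical, the scales 0.6 and 0.2 are made-up example values, and the width and height formulas (scale times and divided by the square root of the aspect ratio) follow the general scheme of the SSD paper without its extra box for aspect ratio 1.

```python
import math


def default_boxes(feature_map_size, scale, aspect_ratios):
    """Generate default boxes (cx, cy, w, h), all relative to the image size.

    One box per aspect ratio is centred on every feature map location.
    """
    boxes = []
    for row in range(feature_map_size):
        for col in range(feature_map_size):
            cx = (col + 0.5) / feature_map_size
            cy = (row + 0.5) / feature_map_size
            for ratio in aspect_ratios:
                w = scale * math.sqrt(ratio)   # wider when ratio > 1
                h = scale / math.sqrt(ratio)   # taller when ratio < 1
                boxes.append((cx, cy, w, h))
    return boxes


ratios = (1.0, 2.0, 0.5)
# A coarse feature map with a large scale covers big objects...
coarse = default_boxes(feature_map_size=3, scale=0.6, aspect_ratios=ratios)
# ...while a finer feature map with a small scale covers small objects.
fine = default_boxes(feature_map_size=10, scale=0.2, aspect_ratios=ratios)
print(len(coarse), len(fine))  # 27 300
```

The network then predicts, for every default box, class scores and offsets that adjust the box to the object.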

SSD can be trained end-to-end for better accuracy. It makes more predictions and has better coverage of location, scale, and aspect ratios. By removing the delegated region proposal and using lower-resolution images, the model can run at real-time speed and still beat the accuracy of the state-of-the-art Faster R-CNN.
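A quick calculation shows what "more predictions" means in practice. The layer configuration below is an assumption taken from the SSD paper for the SSD300 model (it is not listed in this article), and the YOLO figures are the YOLOv1 paper's 7×7 grid with 2 boxes per cell.

```python
# (feature map size, default boxes per location) for SSD300,
# as reported in the SSD paper.
ssd300_layers = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def count_default_boxes(layers):
    """Total number of box predictions across all feature maps."""
    return sum(size * size * boxes for size, boxes in layers)


ssd300_total = count_default_boxes(ssd300_layers)
yolo_v1_total = 7 * 7 * 2

print(ssd300_total)   # 8732
print(yolo_v1_total)  # 98
```

Most of the boxes come from the largest feature map, which is the one responsible for small objects.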

Performance comparison

The following is a scatter plot of the speed and accuracy of the major object detection methods (R-CNN, Fast R-CNN, Faster R-CNN, YOLO and SSD300). For a fair comparison, the same model setting is used throughout: VGG16 as the base network, a batch size of 1, and the Pascal VOC2007 test set. Note that YOLO and SSD300 are the only single-shot detectors, while the others are two-stage detectors based on the region proposal approach.

SSD vs Faster R-CNN vs YOLO performance comparison (source)

In this comparison, SSD is the only object detector that achieves an mAP above 70% while running in real time at 46 fps.
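The mAP figure rests on deciding whether each predicted box matches a ground-truth box, and that decision is usually made with intersection over union (IoU). Below is a minimal sketch of that building block, not a full mAP implementation. The function name `iou` and the example boxes are illustrative, and the 0.5 threshold is the conventional Pascal VOC setting rather than something stated in this article.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as
    (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so boxes that do not overlap give no intersection.
    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


ground_truth = (0, 0, 10, 10)
prediction = (5, 0, 15, 10)
overlap = iou(ground_truth, prediction)  # 50 / 150, about 0.33
is_match = overlap > 0.5                 # False: too little overlap
```

A prediction that shifts half a box width away from the object, as in this example, would not count as a correct detection at the 0.5 threshold.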

Conclusion

This article has provided an overview of three object detection algorithms and how they work.

To summarize:

  • We have gained an overview of object detection and the object detection algorithms.
  • We have gone through the main reasons why each algorithm is important.
  • We have learned how each object detection algorithm works.
  • We have learned the real-life applications of each algorithm.

I haven’t covered some other object detection algorithms, such as Fast R-CNN, Histogram of Oriented Gradients (HOG), Region-based Convolutional Neural Networks (R-CNN), Region-based Fully Convolutional Network (R-FCN), and Spatial Pyramid Pooling (SPP-net).
