Object Detection | Sliding Window | R-CNN | Fast R-CNN | Faster R-CNN

Object Detection Introduction

Rayan Ali
Aug 12, 2023

Summary

Object detection is a fundamental computer vision task: identifying and locating objects within images or videos. Sliding Window, R-CNN, Fast R-CNN, and Faster R-CNN are early approaches that paved the way for more advanced detectors. This summary covers how each one works, where it falls short, and what the next method set out to improve.

Sliding Window

Process

The Sliding Window approach moves a fixed-size window across the image at a regular stride and runs a classification network on the contents of each window to decide whether it contains an object.
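
The scan described above can be sketched in a few lines. This is a minimal illustration, not a production detector: the `classify` function is a hypothetical stand-in for a real classification network (it simply scores a crop by mean brightness), and the window size and stride are arbitrary values chosen for the example.

```python
import numpy as np

def sliding_windows(image, window=(64, 64), stride=32):
    """Yield (x, y, crop) for each window position, left to right, top to bottom."""
    win_h, win_w = window
    img_h, img_w = image.shape[:2]
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield x, y, image[y:y + win_h, x:x + win_w]

def classify(crop):
    # Hypothetical stand-in for a classification network:
    # scores a crop by its mean brightness.
    return float(crop.mean())

# Toy image: a bright 64x64 square "object" on a dark background.
image = np.zeros((128, 128), dtype=np.float32)
image[32:96, 32:96] = 1.0

# Score every window and keep the best one.
best_x, best_y, best_score = max(
    ((x, y, classify(crop)) for x, y, crop in sliding_windows(image)),
    key=lambda t: t[2],
)
print(best_x, best_y, best_score)
```

Here the object happens to match the window size exactly, so one window covers it completely. The Drawbacks section describes what goes wrong when it does not.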

Drawbacks

The main limitation is the fixed window size: an object much larger or smaller than the window is either only partly covered or fills a small fraction of it, which leads to inaccurate detection or classification.

Solution

Windows of several sizes can be generated to improve accuracy, but every extra window is another pass through the classifier, so the approach demands considerable computational power. Objects that fit none of the chosen window sizes can still be missed.
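
A quick count shows how the cost adds up. The image size, window sizes, and the stride of half a window below are assumptions chosen only for illustration.

```python
def count_windows(img_h, img_w, win, stride):
    """Number of positions a square window of side `win` can take in the image."""
    if win > img_h or win > img_w:
        return 0
    rows = (img_h - win) // stride + 1
    cols = (img_w - win) // stride + 1
    return rows * cols

# Assumed setup: a 480x640 image, four window sizes, stride = half the window.
sizes = [32, 64, 128, 256]
counts = {s: count_windows(480, 640, s, stride=s // 2) for s in sizes}
total = sum(counts.values())

print(counts)  # {32: 1131, 64: 266, 128: 54, 256: 8}
print(total)   # 1459 classifier passes for a single image
```

Adding aspect ratios or a finer stride multiplies this number again, which is why region proposals were introduced next.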

R-CNN (Region-based CNN)

Process

R-CNN uses the Selective Search algorithm to propose regions of interest instead of scanning every possible window. Each of these regions, known as proposals, is resized to a fixed input size and processed by a Convolutional Neural Network (CNN) for feature extraction; the features are then classified using a Support Vector Machine (SVM). A regressor also generates bounding box offset values that refine the position and size of each proposal.
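
To make the bounding box offsets concrete, the sketch below applies the offset parameterization commonly used in the R-CNN family: the center shifts are expressed relative to the proposal size, and the width and height changes are in log space. The function name and the example numbers are made up for illustration.

```python
import math

def apply_offsets(proposal, offsets):
    """Refine a proposal with predicted offsets.

    proposal: (cx, cy, w, h) center, width, and height in pixels
    offsets:  (dx, dy, dw, dh) as predicted by the regressor
    """
    cx, cy, w, h = proposal
    dx, dy, dw, dh = offsets
    return (
        cx + w * dx,       # shift center x by a fraction of the width
        cy + h * dy,       # shift center y by a fraction of the height
        w * math.exp(dw),  # scale width (log-space offset)
        h * math.exp(dh),  # scale height (log-space offset)
    )

# Illustrative values: move right by 10% of the width, up by 25% of the
# height, keep the width, and make the box 1.5x taller.
refined = apply_offsets((100.0, 80.0, 50.0, 40.0), (0.1, -0.25, 0.0, math.log(1.5)))
print(refined)
```

Because the offsets are relative to the proposal, the same regressor output means the same proportional correction for small and large boxes.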

Drawbacks

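Execution time is high because roughly 2000 proposals per image are each passed through the CNN separately, making R-CNN unsuitable for real-time inference. Training time is also substantial, since the CNN, the SVMs, and the bounding box regressor are trained in separate stages.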

Solution

The next step is to reduce training time and make feature extraction more efficient by not recomputing CNN features for every proposal.

Fast R-CNN

Process

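Fast R-CNN streamlines the process by running the CNN once over the entire image to produce a shared feature map. Region proposals, still generated by an external method such as Selective Search, are projected onto the feature map and sent to an ROI Pooling Layer, which converts each region into a fixed-size feature; fully connected layers follow. A softmax layer predicts the class, and a parallel regression layer predicts the bounding box offsets.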
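
The key property of ROI pooling is that regions of different sizes all come out with the same fixed shape. Below is a simplified single-channel NumPy sketch of that idea; the function name, the way bins are split, and the 2x2 output size are assumptions for illustration, and real implementations work on multi-channel feature maps and handle coordinate scaling.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of a 2-D feature map into a fixed-size grid.

    roi = (x0, y0, x1, y1) in feature-map coordinates, x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    region = feature_map[y0:y1, x0:x1]
    out_h, out_w = output_size
    # Bin edges that split the region as evenly as possible.
    ys = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    xs = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.empty(output_size, dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Guard against empty bins when the region is smaller than the grid.
            y_end = max(ys[i + 1], ys[i] + 1)
            x_end = max(xs[j + 1], xs[j] + 1)
            pooled[i, j] = region[ys[i]:y_end, xs[j]:x_end].max()
    return pooled

feature_map = np.arange(36).reshape(6, 6)
small = roi_max_pool(feature_map, (1, 1, 5, 4))  # 3x4 region
large = roi_max_pool(feature_map, (0, 0, 6, 6))  # 6x6 region
print(small)  # [[ 8 10] [20 22]]
print(large)  # [[14 17] [32 35]]
```

Both regions yield a 2x2 output, so the fully connected layers that follow can accept any proposal without the per-proposal image resizing that R-CNN needed.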

Drawbacks

Proposals still come from an external method (e.g., Selective Search) that runs as a separate, sequential step before the network, and this step becomes the bottleneck.

Solution

The next step is an optimized proposal generator that is built into the network and shares its computation, rather than running as a separate stage.

Faster R-CNN

Process

Faster R-CNN builds on Fast R-CNN by replacing the external method with a Region Proposal Network (RPN), which generates proposals from the same convolutional feature map the detector uses. The RPN-generated proposals are processed through an ROI Pooling Layer and fully connected layers for classification, yielding class predictions and bounding box offsets.
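
One detail not covered above is how the RPN decides which boxes to score. In the Faster R-CNN design, it places a small set of reference boxes, called anchors, at every feature-map location and predicts an objectness score and offsets for each. The sketch below only generates the anchors; the stride of 16 and the three scales and three aspect ratios follow commonly cited defaults, but treat the function and its parameters as assumptions for illustration.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return an array of anchors as (x0, y0, x1, y1) in image coordinates.

    One anchor per (location, scale, ratio); ratio = height / width.
    """
    anchors = []
    for iy in range(feat_h):
        for ix in range(feat_w):
            # Center of this feature-map cell in image pixels.
            cx, cy = (ix + 0.5) * stride, (iy + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Keep the area at s * s while changing the aspect ratio.
                    w = s / np.sqrt(r)
                    h = s * np.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(anchors)

anchors = make_anchors(feat_h=3, feat_w=4)
print(anchors.shape)  # (108, 4): 3 * 4 locations x 9 anchors each
```

Since the anchors are fixed and the RPN is a small convolutional head on the shared feature map, proposals for all locations are produced in a single forward pass instead of by a separate search algorithm.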

Drawbacks

Although faster, it is still a two-stage detector: proposals must be generated before they can be classified, so some overhead remains. The architecture may also require more memory due to multi-stage processing.

Solution

Later algorithms such as SSD, Mask R-CNN, and YOLO have further improved the accuracy and speed of object detection. SSD and YOLO predict boxes and classes in a single pass without a separate proposal stage, while Mask R-CNN extends Faster R-CNN with a branch that predicts a segmentation mask for each detected object.
