Object Detection Algorithms: R-CNN vs Fast R-CNN vs Faster R-CNN

Published in

Analytics Vidhya

4 min read · Jul 1, 2020

This article compares the R-CNN family of algorithms used for object detection. It assumes the reader is already familiar with the basic CNN model generally used for image classification.

Difference between Image Classification and Object Detection

Let me give you a simple example. We have 5000 labelled images of burgers and 5000 labelled images of pizzas. We split the dataset into training (80%) and testing (20%) sets. Our task is to classify each image in the test set as either Burger or Pizza.

A basic CNN is sufficient here: it classifies a single burger image as Burger and a single pizza image as Pizza.

But what if some images contain both burgers and pizzas? Or maybe other food items as well, such as pancakes or noodles?

An image comprising both burger and pizza together

To handle such cases, we opt for Object Detection, where the model draws a bounding box around each object in every image.

Let us now go through some popular object detection algorithms.

R-CNN

R-CNN uses the Selective Search algorithm to extract the top 2000 region proposals from the millions of candidate regions of interest (ROIs) in an image, and feeds each of them to a CNN model.
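
To make the per-proposal cost concrete, here is a minimal sketch of how each proposed region could be cropped and warped to a fixed size before it reaches the CNN. The helper `warp_region`, the example boxes, the nearest-neighbour resize, and the 227 x 227 target size are assumptions chosen for illustration; in practice Selective Search would supply roughly 2000 boxes per image.

```python
import numpy as np

def warp_region(image, box, out_size=(227, 227)):
    """Crop box = (x1, y1, x2, y2) from an (H, W, C) image and resize it
    to out_size with nearest-neighbour sampling (illustrative helper)."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    out_h, out_w = out_size
    # Map every output row/column back to a source row/column in the crop.
    rows = np.arange(out_h) * crop.shape[0] // out_h
    cols = np.arange(out_w) * crop.shape[1] // out_w
    return crop[rows][:, cols]

# Dummy image and two hypothetical proposals standing in for ~2000 real ones.
image = np.zeros((480, 640, 3), dtype=np.uint8)
proposals = [(10, 20, 110, 220), (300, 50, 620, 400)]

# Every proposal becomes its own fixed-size CNN input.
warped = np.stack([warp_region(image, box) for box in proposals])
print(warped.shape)  # (2, 227, 227, 3)
```

Each row of `warped` would then go through the CNN separately, which is why the cost grows with the number of proposals.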

How does the Selective Search algorithm work?

Let’s come back to our R-CNN Procedure.

Drawbacks of using R-CNN

It uses the Selective Search algorithm to find the regions of interest, which is a slow and time-consuming process.
The process shown above is for only one image. If the dataset has 3000 training images, the entire process runs 3000 times, so you can imagine how much longer it takes to train the model. The CNN is run once per ROI proposal, so the total number of region inputs passed through the CNN for 3000 images becomes 3000 x 2000 = 6,000,000.

For this reason, Fast R-CNN was developed to overcome this bottleneck of R-CNN.

Fast R-CNN

In Fast R-CNN, the original image is passed directly to a CNN once, which generates a feature map.
The ROI proposals are then located on that feature map, and warping or ROI pooling is applied to each extracted region of interest so that all regions end up the same size.
ROI pooling works by dividing each region's window on the feature map into a fixed grid and max-pooling every cell, giving a fixed-size set of features that is used to predict the final class label and bounding box.

ROI Pooling (image source: Review: Fast R-CNN (Object Detection), mc.ai)
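
The ROI pooling step described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the reference implementation: it handles a single-channel feature map, the function name `roi_max_pool` and the end-exclusive `(row_start, col_start, row_end, col_end)` box convention are assumptions, and it assumes each ROI is at least as large as the output grid.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one ROI of a 2-D feature map down to a fixed output size.

    roi is (row_start, col_start, row_end, col_end), end-exclusive, in
    feature-map coordinates (a convention chosen for this sketch).
    """
    r0, c0, r1, c1 = roi
    window = feature_map[r0:r1, c0:c1]
    out_h, out_w = output_size
    # Split the window into an out_h x out_w grid of roughly equal cells.
    row_edges = np.linspace(0, window.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, window.shape[1], out_w + 1).astype(int)
    pooled = np.empty(output_size, dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            cell = window[row_edges[i]:row_edges[i + 1],
                          col_edges[j]:col_edges[j + 1]]
            pooled[i, j] = cell.max()  # keep the strongest activation per cell
    return pooled

feature_map = np.arange(36).reshape(6, 6)
wide_roi = roi_max_pool(feature_map, (0, 0, 4, 6))  # 4 x 6 window
tall_roi = roi_max_pool(feature_map, (2, 1, 6, 4))  # 4 x 3 window
print(wide_roi.shape, tall_roi.shape)  # (2, 2) (2, 2)
```

Two ROIs of different sizes both come out as 2 x 2 grids, which is what lets the fully connected layers that follow accept every region.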

Finally, these regions are passed on to a fully connected network (containing one or more fully connected layers), which classifies them with a softmax layer and, in parallel, predicts their bounding boxes with a linear regression layer.
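
As a rough illustration of these two parallel outputs, the sketch below applies a softmax classifier and a linear box regressor to the same per-region features. The sizes, the random weights, and the names `W_cls` and `W_box` are placeholders rather than trained values, and predicting a single box per region is a simplification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4 regions, 8 features per region, 3 classes.
num_rois, feat_dim, num_classes = 4, 8, 3
roi_features = rng.normal(size=(num_rois, feat_dim))  # stand-in for FC output

# Two sibling linear layers read the same features (random placeholder weights).
W_cls = rng.normal(size=(feat_dim, num_classes))
W_box = rng.normal(size=(feat_dim, 4))

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

class_probs = softmax(roi_features @ W_cls)   # shape (num_rois, num_classes)
box_outputs = roi_features @ W_box            # shape (num_rois, 4)
predicted_class = class_probs.argmax(axis=1)  # most likely class per region
```

Both outputs are computed from one pass over the shared features, so classification and box prediction happen together rather than in separate stages.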

Therefore, the Fast R-CNN has shown the following advantages:

Reduced the total number of inputs passed through the CNN from 6,000,000 region crops to only 3000 whole images.
Instead of classifying the 2000 region proposals with SVMs, we classify them with a single softmax layer that has one output per class (3 in this case). The Fast R-CNN authors reported that softmax performs slightly better than SVMs.
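
The counts quoted above follow from simple arithmetic, assuming one CNN pass per region proposal in R-CNN and one CNN pass per image in Fast R-CNN:

```python
num_images = 3000            # training images, as in the example above
proposals_per_image = 2000   # region proposals kept per image

rcnn_cnn_passes = num_images * proposals_per_image  # one pass per proposal
fast_rcnn_cnn_passes = num_images                   # one pass per image

print(rcnn_cnn_passes, fast_rcnn_cnn_passes,
      rcnn_cnn_passes // fast_rcnn_cnn_passes)
# 6000000 3000 2000
```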

Drawbacks of using Fast R-CNN

It still uses the Selective Search algorithm, which is slow and time-consuming.
It takes around 2 seconds per image to detect objects, which can be too slow for large real-life datasets.

Faster R-CNN

Instead of the Selective Search algorithm, Faster R-CNN uses a Region Proposal Network (RPN), which learns to select the best ROIs automatically; these are then passed on to ROI pooling.
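
A rough sketch of the idea: an RPN scores a fixed set of reference boxes ("anchors") placed at every feature-map location and keeps the highest-scoring ones as proposals. The code below is a simplified illustration under stated assumptions. The stride, scales, and aspect ratios are typical values rather than anything given in this article, `make_anchors` is a hypothetical helper, and random numbers stand in for the objectness scores a trained RPN would predict. Box refinement and non-maximum suppression are left out.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) rows, one set per feature-map cell.

    Here ratio means width / height and each anchor has area scale**2
    (conventions chosen for this sketch).
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Centre of this feature-map cell in image coordinates.
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)

anchors = make_anchors(feat_h=2, feat_w=3)  # 2 * 3 cells * 9 anchors = 54 boxes

rng = np.random.default_rng(0)
objectness = rng.random(len(anchors))       # placeholder for learned scores

top_k = 5
best = np.argsort(objectness)[::-1][:top_k]  # indices of highest scores first
proposals = anchors[best]                    # ROIs handed on to ROI pooling
print(anchors.shape, proposals.shape)  # (54, 4) (5, 4)
```

Because the scores come from a network that shares the feature map with the detector, proposing regions this way adds little extra computation compared with running Selective Search on every image.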

That’s it for this article. In my next article, I will demonstrate how to implement the above algorithms in Python and also discuss a more advanced technique known as YOLO object detection.