Review: R-CNN (Object Detection)

Sik-Ho Tsang
Aug 31, 2018 · 4 min read

Region-CNN (R-CNN) [1] is one of the state-of-the-art CNN-based deep learning object detection approaches. Building on it, Fast R-CNN and Faster R-CNN speed up object detection, and Mask R-CNN extends it to object instance segmentation. There are also other object detection approaches, such as YOLO and SSD.

To understand deep learning object detection well, R-CNN is a must-read. It is a 2014 CVPR paper with about 6000 citations at the time of writing. (Sik-Ho Tsang @ Medium)

For object detection, we need to know the class of each object as well as the size and location of its bounding box.

Conventionally, a sliding window searches every position within each image, as shown below. It is a simple solution. However, different objects, or even objects of the same kind, can have different aspect ratios and sizes depending on the object itself and its distance from the camera. Different image sizes also affect the effective window size. This process becomes extremely slow if we run a deep learning CNN for image classification at each location.
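To get a feel for why this is slow, here is a rough count of how many windows a sliding-window search would have to classify. The image size, window sizes, aspect ratios, and stride below are my own hypothetical choices, not values from [1].

```python
def count_windows(img_w, img_h, win_w, win_h, stride):
    """Number of positions a win_w x win_h window can take inside the image."""
    if win_w > img_w or win_h > img_h:
        return 0
    nx = (img_w - win_w) // stride + 1
    ny = (img_h - win_h) // stride + 1
    return nx * ny


def count_all_windows(img_w, img_h, base_sizes, aspect_ratios, stride):
    """Total windows over every combination of size and aspect ratio (w / h)."""
    total = 0
    for size in base_sizes:
        for ratio in aspect_ratios:
            # Keep the window area near size * size while changing its shape.
            win_w = round(size * ratio ** 0.5)
            win_h = round(size / ratio ** 0.5)
            total += count_windows(img_w, img_h, win_w, win_h, stride)
    return total


if __name__ == "__main__":
    # Hypothetical settings: 500x375 image, 3 sizes, 3 aspect ratios, stride 8.
    n = count_all_windows(500, 375, [64, 128, 256], [0.5, 1.0, 2.0], stride=8)
    print(n, "windows to classify with a sliding window")
```

Each of these windows would need its own CNN forward pass, which is what makes the approach so expensive.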

Illustration of Sliding Window (Left) with Different Aspect Ratios and Sizes (Right)

R-CNN avoids this exhaustive search in three steps:

  1. First, R-CNN uses selective search [2] to generate about 2K region proposals, i.e. candidate bounding boxes for image classification.
  2. Then, each bounding box is classified by a CNN.
  3. Finally, each bounding box can be refined using regression.
R-CNN Flowchart
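
The three steps can be tied together in a small skeleton. This is only a sketch of the control flow: `propose`, `classify`, and `refine` are hypothetical callables supplied by the caller, and the `score_threshold` is my own simplification rather than something taken from [1].

```python
def detect(image, propose, classify, refine, score_threshold=0.5):
    """Run the three stages with caller-supplied (hypothetical) components.

    propose(image)        -> list of boxes (x1, y1, x2, y2)
    classify(image, box)  -> (label, score)
    refine(image, box)    -> adjusted box
    """
    detections = []
    for box in propose(image):                 # step 1: region proposals
        label, score = classify(image, box)    # step 2: CNN-based classification
        if score < score_threshold:
            continue                           # drop low-scoring proposals
        detections.append((label, score, refine(image, box)))  # step 3: refinement
    return detections
```

The point to notice is that the classifier runs once per proposal, so about 2K times per image instead of once per sliding-window position.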

What will be covered:

  1. Selective Search
  2. CNN-based Classification and Scoring
  3. Results

1. Selective Search

Selective Search

Selective search was proposed in [2].

  1. First, the image is over-segmented into many small areas without using any object-level knowledge, as shown at the bottom left of the image above.
  2. Then, a bottom-up approach is used: the most similar neighboring areas, judged by color similarity, texture similarity, region size, and region fill, are merged step by step into larger segmented areas.
  3. Thus, about 2K region proposals (bounding box candidates) are generated, as shown in the image.
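
The bottom-up merging can be sketched with a toy example. This is a heavy simplification and not the implementation from [2]: it keeps only size and fill terms (as I understand them from [2]) and leaves out color and texture, it compares every pair instead of only neighboring regions, and each region is reduced to a bounding box plus a pixel count. All function names are my own.

```python
from itertools import combinations


def union_box(a, b):
    """Smallest box (x1, y1, x2, y2) that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def box_area(b):
    return (b[2] - b[0]) * (b[3] - b[1])


def similarity(r1, r2, image_area):
    """Size + fill similarity; color and texture terms are omitted here."""
    (box1, size1), (box2, size2) = r1, r2
    s_size = 1.0 - (size1 + size2) / image_area          # favors small regions
    gap = box_area(union_box(box1, box2)) - size1 - size2
    s_fill = 1.0 - gap / image_area                      # favors regions that fit together
    return s_size + s_fill


def hierarchical_grouping(regions, image_area):
    """Greedily merge the most similar pair until one region is left.

    regions: list of (box, pixel_count). Every region seen along the way,
    initial or merged, contributes its box as a proposal.
    """
    active = list(regions)
    proposals = [box for box, _ in active]
    while len(active) > 1:
        i, j = max(combinations(range(len(active)), 2),
                   key=lambda p: similarity(active[p[0]], active[p[1]], image_area))
        merged = (union_box(active[i][0], active[j][0]), active[i][1] + active[j][1])
        active = [r for k, r in enumerate(active) if k not in (i, j)] + [merged]
        proposals.append(merged[0])
    return proposals
```

Starting from N small regions, this produces 2N − 1 boxes at many different scales, which is why the proposals can cover both small and large objects.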

2. CNN-based Classification and Scoring

R-CNN Flowchart with More Details
Original AlexNet

AlexNet [3] is used to extract the CNN features.

For each proposal, a 4096-dimensional feature vector is computed by forward propagating a mean-subtracted 227×227 RGB image through five convolutional layers and two fully connected layers.

The CNN input has a fixed size of 227×227, while bounding boxes have various shapes and sizes. So, all pixels in a tight bounding box are warped to 227×227, regardless of the box's original size or aspect ratio.
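
A minimal sketch of this warping with NumPy is shown below. It uses nearest-neighbor sampling and a per-channel mean, both of which are my own simplifying assumptions; the function names are hypothetical too.

```python
import numpy as np


def warp_region(image, box, out_size=227):
    """Crop box = (x1, y1, x2, y2) and stretch it to out_size x out_size.

    Nearest-neighbor sampling; the aspect ratio is deliberately not preserved.
    """
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    h, w = crop.shape[:2]
    # For every output row/column, pick the source row/column it maps to.
    rows = np.arange(out_size) * h // out_size
    cols = np.arange(out_size) * w // out_size
    return crop[rows[:, None], cols]


def preprocess(image, box, mean_rgb):
    """Warp the region, then subtract a per-channel mean (an assumed form)."""
    warped = warp_region(image, box).astype(np.float32)
    return warped - np.asarray(mean_rgb, dtype=np.float32)
```

A tall, thin box and a short, wide box both end up as the same 227×227×3 array, so the same network can process every proposal.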

The feature vector is then scored by a linear SVM trained for each class.
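
Scoring all proposals against all classes is then just one matrix product. Here is a small sketch, assuming the per-class SVM weights and biases have already been trained and stacked into arrays (the names and shapes are my own):

```python
import numpy as np


def svm_scores(features, weights, biases):
    """Score every proposal against every class with linear SVMs.

    features: (num_proposals, feature_dim), e.g. feature_dim = 4096
    weights:  (num_classes, feature_dim), one row per class-specific SVM
    biases:   (num_classes,)
    returns:  (num_proposals, num_classes) matrix of decision values
    """
    return features @ weights.T + biases
```

With about 2K proposals and 4096-dimensional features, `features` would have shape (2000, 4096), and each column of the result holds the scores for one class.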

For each class, greedy non-maximum suppression is applied: a bounding box is rejected if it has a high IoU (Intersection over Union) overlap with a higher-scoring selected box, since the two are likely bounding the same object.
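
Here is a minimal sketch of IoU and this kind of greedy rejection (usually called non-maximum suppression, NMS) for one class. The threshold of 0.3 is an assumed value for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS for one class; returns indices of the boxes that are kept."""
    # Visit boxes from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

For example, two boxes shifted by one pixel on the same object have a high IoU, so only the higher-scoring one survives, while a box far away is untouched.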

The predicted bounding box can be further refined by a bounding box regressor.
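
To show what such a refinement looks like, here is a sketch that applies four regression outputs to a proposal. It assumes a common center/size parameterization (shift the center in units of the box size, scale width and height through an exponential), which is how I understand the regressor in [1]. The function name and box format are my own.

```python
import math


def apply_box_deltas(box, deltas):
    """Apply (dx, dy, dw, dh) to a proposal given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    pw, ph = x2 - x1, y2 - y1                 # proposal width and height
    px, py = x1 + 0.5 * pw, y1 + 0.5 * ph     # proposal center
    dx, dy, dw, dh = deltas
    gx, gy = pw * dx + px, ph * dy + py       # shifted center
    gw, gh = pw * math.exp(dw), ph * math.exp(dh)  # rescaled size
    return (gx - 0.5 * gw, gy - 0.5 * gh, gx + 0.5 * gw, gy + 0.5 * gh)
```

With all four deltas equal to zero the box is unchanged, so the regressor only has to learn small corrections to the proposal.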

3. Results

3.1 VOC 2010

VOC 2010

R-CNN and R-CNN BB (R-CNN with bounding box regression) obtain the highest mAP (mean average precision).
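
For readers new to the metric: mAP is the mean over classes of the per-class average precision (AP), the area under the precision-recall curve. The sketch below is a simplified version and not the official VOC evaluation code. It assumes each detection has already been marked as a true or false positive and that `num_ground_truth` is greater than zero.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Replace each precision by the best precision at any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum precision times the recall gained at each detection.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP: the plain mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```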

3.2 ILSVRC 2013

Some Amazing ILSVRC 2013 Results
Some ILSVRC 2013 Results with Some Missing Detections
ILSVRC 2013

R-CNN BB even outperforms OverFeat [4], the winner of the ILSVRC 2013 localization task!

3.3 VOC 2007

Some examples with high activations in VOC 2007
VOC 2007

As you may already know, the CNN used in R-CNN can be replaced by any CNN used for image classification.

When R-CNN BB uses VGG-16 [5], a 16-layer VGGNet, mAP increases further to 66.0%.

If you are interested, please also read my reviews of AlexNet, VGGNet, and OverFeat. (Links at the bottom)

I will write more reviews of other state-of-the-art deep learning approaches.

Coinmonks

Coinmonks is a non-profit Crypto educational publication.


Sik-Ho Tsang

Written by

PhD, Researcher. I share what I've learnt and done. :) My LinkedIn: https://www.linkedin.com/in/sh-tsang/, My Paper Reading List: https://bit.ly/33TDhxG
