Learning Day 62: Object detection — R-CNN

De Jun Huang
Jun 16, 2021

Object detection

  • Detect an object and identify what it is
  • Detect an object: predict its bounding box
  • Identify what the object is: classification
  • Models: R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN

Object detection with R-CNN

  • Region-based Convolutional Neural Networks (R-CNN)
  • Main idea: take an existing CNN and use its feature maps or fc-layer features for object classification and bounding box regression

Region proposals: traditional ways to find bounding boxes

  • Selective Search (SS): start from an initial set of bounding boxes, R, drawn by fixed rules, then repeatedly combine the most similar ones (similarity in terms of colour, texture and the size after combining)
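
The merging idea above can be sketched in a few lines of Python. This is only a toy illustration, not the real Selective Search algorithm: the region dictionaries (`box`, `size`, `hist`), the two-term similarity and the function names are all assumptions made for this example, texture is left out, and any two regions may merge regardless of whether they touch.

```python
from itertools import combinations


def union_box(a, b):
    """Smallest box (x1, y1, x2, y2) that encloses boxes a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def similarity(r1, r2, image_area):
    """Toy similarity: colour-histogram intersection plus a size term.

    The size term is larger when the combined region is small, so small
    regions tend to merge first.
    """
    colour = sum(min(p, q) for p, q in zip(r1["hist"], r2["hist"]))
    size = 1.0 - (r1["size"] + r2["size"]) / image_area
    return colour + size


def merge(r1, r2):
    """Merge two regions; the new histogram is a size-weighted average."""
    total = r1["size"] + r2["size"]
    hist = [(p * r1["size"] + q * r2["size"]) / total
            for p, q in zip(r1["hist"], r2["hist"])]
    return {"box": union_box(r1["box"], r2["box"]), "size": total, "hist": hist}


def propose_regions(initial_regions, image_area):
    """Greedily merge the most similar pair until one region is left.

    Every region seen along the way contributes one proposal box.
    """
    regions = list(initial_regions)
    proposals = [r["box"] for r in regions]
    while len(regions) > 1:
        i, j = max(
            combinations(range(len(regions)), 2),
            key=lambda ij: similarity(regions[ij[0]], regions[ij[1]], image_area),
        )
        merged = merge(regions[i], regions[j])
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged["box"])
    return proposals


# Hypothetical 20 x 20 image split into three initial regions.
initial = [
    {"box": (0, 0, 10, 10), "size": 100, "hist": [1.0, 0.0]},
    {"box": (10, 0, 20, 10), "size": 100, "hist": [0.9, 0.1]},
    {"box": (0, 10, 20, 20), "size": 200, "hist": [0.0, 1.0]},
]
proposals = propose_regions(initial, image_area=400)
```

In this example the two similarly coloured top regions merge first, and the final proposal covers the whole image.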

R-CNN steps

  1. Extract region proposals (~2,000) from the input image
  2. Warp each region to the CNN's input size, either by stretching it or by rescaling it and padding with black borders
  3. Using AlexNet as an example, extract the fc7 layer features for SVM classification
  4. Use the features from the 5th conv layer for bounding box regression
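
As a rough illustration of step 2, the sketch below warps a region by plain stretching, using nearest-neighbour sampling on an image stored as a list of rows. The function name `warp_region`, the exclusive `(x1, y1, x2, y2)` box convention and the tiny image are assumptions for this example; the alternative of rescaling with black borders is not shown, and a real pipeline would use an image library instead.

```python
def warp_region(image, box, out_h, out_w):
    """Crop box = (x1, y1, x2, y2) from image and stretch it to out_h x out_w.

    image is a list of rows, each row a list of pixel values.
    x2 and y2 are exclusive. Nearest-neighbour sampling is used and the
    aspect ratio of the region is not preserved.
    """
    x1, y1, x2, y2 = box
    crop_h, crop_w = y2 - y1, x2 - x1
    warped = []
    for r in range(out_h):
        src_y = y1 + (r * crop_h) // out_h  # source row for output row r
        row = [image[src_y][x1 + (c * crop_w) // out_w] for c in range(out_w)]
        warped.append(row)
    return warped


# Hypothetical 4 x 6 image whose pixel value encodes its position (10 * y + x).
image = [[10 * y + x for x in range(6)] for y in range(4)]
patch = warp_region(image, box=(1, 1, 5, 3), out_h=4, out_w=4)
```

Here a 4-wide, 2-tall region is stretched to 4 x 4, so each source row appears twice in `patch`.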

R-CNN fine-tuning details

  • With the R-CNN steps above as the basis
  • Load a pre-trained model and fine-tune it on all region proposals
  • Use log loss
  • Change the softmax layer to N+1 outputs (N object classes plus background) instead of the 1,000 in e.g. AlexNet
  • Positive label: IoU with ground truth ≥ 0.5
  • Negative label: IoU < 0.5
  • IoU is the Intersection over Union: the area where two boxes overlap divided by the area of their union

Figure: IoU formula in graphical form (left) and examples of IoU values with visualization (right) (ref)
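
Since the figure is easier to follow with numbers, here is a minimal sketch of IoU and of the fine-tuning label rule described above. The `(x1, y1, x2, y2)` box format, the use of class index 0 for background and the helper name `finetune_label` are assumptions for this example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


BACKGROUND = 0  # assumed index of the extra "+1" output


def finetune_label(proposal, gt_boxes, gt_classes, threshold=0.5):
    """Class (1..N) of the best-overlapping ground truth, or background.

    A proposal is positive only if its best IoU reaches the threshold.
    """
    best_iou, best_class = 0.0, BACKGROUND
    for box, cls in zip(gt_boxes, gt_classes):
        overlap = iou(proposal, box)
        if overlap > best_iou:
            best_iou, best_class = overlap, cls
    return best_class if best_iou >= threshold else BACKGROUND


# Two 10 x 10 boxes shifted by half their width: intersection 50, union 150.
half_shift = iou((0, 0, 10, 10), (5, 0, 15, 10))
```

With these boxes `half_shift` is 1/3, so the proposal would be labelled background even though half of it lies on the object.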

Details for classifier:

  • Taking AlexNet as the example, train the SVM classifiers on fc7 layer features
  • One SVM classifier per class (N classes, so N SVMs)
  • Positive: ground-truth regions
  • Negative: IoU < 0.3, or wrong object
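
The labelling rule for one class's SVM might look like the sketch below. Returning `None` for regions that are neither ground truth nor below the 0.3 threshold (so they are left out of training) follows the R-CNN paper rather than these notes, and the function name and box format are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def svm_label(region, class_gt_boxes, neg_threshold=0.3):
    """Label a region for the SVM of a single class.

    class_gt_boxes holds only the ground-truth boxes of that class.
    Returns +1 (positive), -1 (negative) or None (ignored).
    """
    if region in class_gt_boxes:
        return 1  # only the ground-truth boxes themselves are positives
    max_overlap = max((iou(region, gt) for gt in class_gt_boxes), default=0.0)
    if max_overlap < neg_threshold:
        return -1  # background, or a region on an object of another class
    return None  # partial overlap: not used for training


car_boxes = [(0, 0, 10, 10)]  # hypothetical ground truth for one class
labels = [
    svm_label((0, 0, 10, 10), car_boxes),  # the ground-truth box itself
    svm_label((5, 0, 15, 10), car_boxes),  # IoU = 1/3
    svm_label((8, 0, 18, 10), car_boxes),  # IoU = 20/180
]
```

A region on an object of another class has low IoU with every box in `class_gt_boxes`, which is how the "wrong object" case ends up negative.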

Details for regressor:

  • Taking AlexNet as the example, use the pre-trained conv5 features for bounding box regression
  • One regressor per class
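
These notes do not spell out what the regressor predicts. In the R-CNN paper the targets are offsets from the proposal to the ground-truth box, normalised by the proposal size, with width and height in log space. The sketch below assumes boxes in `(cx, cy, w, h)` format; the function names are made up for illustration.

```python
import math


def encode_targets(proposal, gt):
    """Regression targets (tx, ty, tw, th) that map proposal onto gt.

    Both boxes are (cx, cy, w, h): centre coordinates, width, height.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def decode_box(proposal, targets):
    """Apply predicted targets to a proposal to get the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = targets
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))


proposal = (50.0, 50.0, 20.0, 40.0)
ground_truth = (55.0, 46.0, 30.0, 40.0)
targets = encode_targets(proposal, ground_truth)
refined = decode_box(proposal, targets)  # recovers ground_truth up to rounding
```

Normalising by the proposal size keeps the targets on a similar scale for small and large boxes.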

Test metrics:

  • True positive: detection with IoU ≥ 0.5 against a ground-truth box
  • False positive: detection with IoU < 0.5
  • False negative: a ground-truth object that is missed
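
One way to turn these definitions into counts is sketched below. It assumes detections come as `(score, box)` pairs and that each ground-truth box can be matched at most once, so a duplicate detection of the same object counts as a false positive. That matching rule is a common convention, not something stated in these notes.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_detections(detections, gt_boxes, threshold=0.5):
    """Return (true positives, false positives, false negatives).

    Detections are processed from highest to lowest score; each one is
    matched to the still-unmatched ground-truth box it overlaps most.
    """
    matched = set()
    tp = fp = 0
    for _score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(gt_boxes):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= threshold:
            matched.add(best_idx)
            tp += 1
        else:
            fp += 1
    fn = len(gt_boxes) - len(matched)  # ground-truth boxes nobody found
    return tp, fp, fn


gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [
    (0.9, (0, 0, 10, 9)),      # good hit on the first object
    (0.8, (0, 0, 10, 10)),     # duplicate of the same object
    (0.3, (50, 50, 60, 60)),   # nothing there
]
tp, fp, fn = count_detections(detections, gt_boxes)
```

From these counts, precision is tp / (tp + fp) and recall is tp / (tp + fn).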

Drawbacks of R-CNN

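  • Long training time (the CNN, the SVMs and the regressors are trained in separate stages)
  • Long inference time (each of the ~2,000 region proposals goes through the CNN separately)
  • Takes up a lot of storage (the extracted features of every region proposal are saved to disk)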
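
To get a feel for the storage drawback, here is a back-of-the-envelope estimate. It assumes ~2,000 proposals per image, one 4096-dimensional fc7 vector per proposal (the fc7 size in AlexNet) and 4-byte floats; the dataset size of 5,000 images is purely hypothetical.

```python
def feature_storage_bytes(num_images, proposals_per_image=2000,
                          feature_dim=4096, bytes_per_value=4):
    """Bytes needed to cache one feature vector per region proposal."""
    return num_images * proposals_per_image * feature_dim * bytes_per_value


per_image = feature_storage_bytes(1)
dataset = feature_storage_bytes(5000)  # hypothetical dataset size

print(f"{per_image / 1e6:.1f} MB per image")
print(f"{dataset / 1e9:.1f} GB for 5,000 images")
```

Under these assumptions a single image already needs about 33 MB of cached features, which is far more than the image itself.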

Reference

link1
