Learning Day 62: Object detection — R-CNN

Published in dejunhuang, Jun 16, 2021

Object detection

Detect an object and identify what it is
Detect an object: bounding box
Identify what the object is: classification
Models: R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN

Object detection with R-CNN

Region-based Convolutional Neural Networks (R-CNN)
Main idea: take an existing CNN pre-trained for classification and use its feature maps or fc-layer features for object classification and bounding box regression on each proposed region

Region proposals: the traditional way to find candidate bounding boxes

Selective Search (SS): generate an initial set of regions, R, based on certain rules, then repeatedly combine the most similar pair (in terms of colour, texture and size after combining); the bounding boxes of all regions seen along the way are the proposals
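To make the "combine by similarity" idea concrete, here is a toy sketch of the greedy merging loop. It is not the real Selective Search: it works on boxes instead of segmented regions, it ignores colour and texture, and it considers every pair instead of only neighbouring regions. The function names and the size/fill similarity used here are assumptions for illustration only.

```python
from itertools import combinations


def area(box):
    """Area of a box given as (x1, y1, x2, y2)."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])


def union_box(a, b):
    """Smallest box that encloses both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def similarity(a, b, image_area):
    """Toy similarity: prefers merging small boxes that fit tightly together."""
    size_sim = 1.0 - (area(a) + area(b)) / image_area
    fill_sim = 1.0 - (area(union_box(a, b)) - area(a) - area(b)) / image_area
    return size_sim + fill_sim


def greedy_merge(initial_boxes, image_area):
    """Repeatedly merge the most similar pair; every box ever seen is a proposal."""
    active = list(initial_boxes)
    proposals = list(initial_boxes)
    while len(active) > 1:
        # Pick the pair of active boxes with the highest similarity.
        i, j = max(
            combinations(range(len(active)), 2),
            key=lambda p: similarity(active[p[0]], active[p[1]], image_area),
        )
        merged = union_box(active[i], active[j])
        active = [b for k, b in enumerate(active) if k not in (i, j)]
        active.append(merged)
        proposals.append(merged)
    return proposals


boxes = [(0, 0, 10, 10), (10, 0, 20, 10), (60, 60, 100, 100)]
proposals = greedy_merge(boxes, image_area=100 * 100)
```

With the three boxes above, the two small adjacent boxes are merged first, and the last proposal covers everything.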

R-CNN steps

Extract region proposals (~2,000) from the input image
Warp each region to the CNN input size, either by stretching it or by rescaling it and padding with black borders
Using AlexNet as an example, extract the fc7 layer features for SVM classification
Use the 5th conv layer features for bounding box regression
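The warping step above can be sketched with NumPy alone. This is a minimal illustration using nearest-neighbour resizing; the function names are hypothetical, and the 227 output size is an assumption matching the usual AlexNet input. A real pipeline would more likely use an image library with proper interpolation.

```python
import numpy as np


def warp_stretch(image, box, out_size=227):
    """Crop box (x1, y1, x2, y2) and stretch it to out_size x out_size."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    h, w = crop.shape[:2]
    # Nearest-neighbour sampling: map each output pixel back to a source pixel.
    rows = np.arange(out_size) * h // out_size
    cols = np.arange(out_size) * w // out_size
    return crop[rows][:, cols]


def warp_pad(image, box, out_size=227):
    """Crop box, rescale keeping the aspect ratio, pad the rest with black."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    h, w = crop.shape[:2]
    scale = out_size / max(h, w)
    new_h = min(out_size, max(1, int(round(h * scale))))
    new_w = min(out_size, max(1, int(round(w * scale))))
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    resized = crop[rows][:, cols]
    # Black canvas with the resized crop placed in the centre.
    canvas = np.zeros((out_size, out_size) + crop.shape[2:], dtype=image.dtype)
    top = (out_size - new_h) // 2
    left = (out_size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas


image = np.full((100, 200, 3), 255, dtype=np.uint8)  # dummy white image
stretched = warp_stretch(image, (20, 10, 120, 60))
padded = warp_pad(image, (20, 10, 120, 60))
```

Stretching distorts the aspect ratio but fills the whole input; padding keeps the shape of the object but spends part of the input on black borders.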

R-CNN fine-tuning details

With the R-CNN steps above as the basis
Load a pre-trained model and fine-tune it on all region proposals
Use log loss (softmax cross-entropy)
Change the softmax layer to N+1 outputs (N object classes + background) instead of the 1,000 in e.g. AlexNet
Positive label: IoU with ground truth ≥ 0.5
Negative label: IoU < 0.5
IoU is the Intersection over Union: the area of overlap of two boxes divided by the area of their union
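The IoU rule above is easy to write down in code. The sketch below assumes boxes in (x1, y1, x2, y2) format and uses class index 0 for background; both choices, and the function names, are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def finetune_label(proposal, gt_boxes, gt_classes, threshold=0.5):
    """Class (1..N) of the best-overlapping ground truth box, or 0 for background."""
    best_iou, best_class = 0.0, 0
    for box, cls in zip(gt_boxes, gt_classes):
        overlap = iou(proposal, box)
        if overlap > best_iou:
            best_iou, best_class = overlap, cls
    return best_class if best_iou >= threshold else 0


gt_boxes, gt_classes = [(0, 0, 10, 8)], [3]
label_pos = finetune_label((0, 0, 10, 10), gt_boxes, gt_classes)    # IoU = 0.8
label_neg = finetune_label((20, 20, 30, 30), gt_boxes, gt_classes)  # IoU = 0.0
```

Here the first proposal gets the class of the ground truth box it overlaps, and the second one becomes background.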

Figure: IoU formula in graphical form (left) and examples of IoU values with visualization (right) (ref)

Details for classifier:

Taking AlexNet as the example, train SVM classifiers on the fc7 layer features
Each class (N classes in total) has 1 binary SVM classifier (N SVMs)
Positive: ground truth regions of that class
Negative: regions with IoU < 0.3 or containing the wrong object
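The positive/negative rule for the SVMs differs from the fine-tuning rule, so a small sketch may help. It assumes the IoU of each region with the ground truth boxes of one class has already been computed; regions that are neither ground truth nor below the 0.3 threshold get label 0 here, meaning "not used for this class". The function name and the label encoding are assumptions.

```python
import numpy as np


def svm_labels(max_iou_with_class, is_ground_truth, neg_threshold=0.3):
    """Per-class SVM labels: +1 positive, -1 negative, 0 left out of training."""
    max_iou_with_class = np.asarray(max_iou_with_class, dtype=float)
    is_ground_truth = np.asarray(is_ground_truth, dtype=bool)
    labels = np.zeros(len(max_iou_with_class), dtype=int)
    labels[max_iou_with_class < neg_threshold] = -1  # clearly not this object
    labels[is_ground_truth] = 1                      # only ground truth boxes are positive
    return labels


# Four regions for one class: a ground truth box and three proposals.
labels = svm_labels([1.0, 0.7, 0.25, 0.0], [True, False, False, False])
```

Running this once per class gives the N training sets for the N SVMs.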

Details for regressor:

Taking AlexNet as the example, use the pre-trained conv5 features for bounding box regression
One regressor per class
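The regressor does not predict box corners directly. A common parameterization, and the one I understand R-CNN to use, expresses the target as a shift of the proposal centre (relative to its size) and a log-scale change of its width and height. The sketch below only shows how targets are encoded and decoded; the function names are hypothetical and the regressor itself is left out.

```python
import math


def to_center(box):
    """Convert (x1, y1, x2, y2) to (centre x, centre y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + w / 2.0, y1 + h / 2.0, w, h


def encode(proposal, gt):
    """Regression targets (tx, ty, tw, th) that map the proposal onto gt."""
    px, py, pw, ph = to_center(proposal)
    gx, gy, gw, gh = to_center(gt)
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def decode(proposal, t):
    """Apply predicted targets t to a proposal to get the refined box."""
    px, py, pw, ph = to_center(proposal)
    cx, cy = px + t[0] * pw, py + t[1] * ph
    w, h = pw * math.exp(t[2]), ph * math.exp(t[3])
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


proposal, gt = (10, 10, 50, 30), (12, 8, 60, 36)
targets = encode(proposal, gt)
refined = decode(proposal, targets)  # recovers gt up to floating point error
```

Normalizing by the proposal size keeps the targets in a similar range for small and large boxes.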

Test metrics:

True positive: detection with IoU ≥ 0.5 with a ground truth box
False positive: detection with IoU < 0.5
False negative: ground truth box that was missed (no detection matches it)
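Counting these three numbers for one image and one class could look like the sketch below. It assumes a precomputed IoU matrix of shape (detections, ground truth boxes) and follows the common convention that each ground truth box can be matched only once, so a duplicate detection of the same object counts as a false positive. That convention and the function name are assumptions, not something stated in the notes above.

```python
import numpy as np


def count_tp_fp_fn(iou_matrix, scores, threshold=0.5):
    """iou_matrix[d, g] is the IoU of detection d with ground truth box g."""
    iou_matrix = np.asarray(iou_matrix, dtype=float)
    num_gt = iou_matrix.shape[1] if iou_matrix.ndim == 2 else 0
    matched = set()
    tp = fp = 0
    # Go through detections from the most to the least confident.
    for d in np.argsort(-np.asarray(scores, dtype=float)):
        if num_gt > 0:
            g = int(np.argmax(iou_matrix[d]))
            if iou_matrix[d, g] >= threshold and g not in matched:
                matched.add(g)
                tp += 1
                continue
        fp += 1
    fn = num_gt - len(matched)
    return tp, fp, fn


ious = [[0.8, 0.0], [0.6, 0.1], [0.2, 0.1]]  # 3 detections, 2 ground truth boxes
tp, fp, fn = count_tp_fp_fn(ious, scores=[0.9, 0.8, 0.7])
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

In this example the second detection overlaps an object that is already matched, so it is counted as a false positive even though its IoU is above 0.5.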

Drawbacks of R-CNN

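Long training time: the CNN, the SVMs and the regressors are trained in separate stages
Long inference time: one CNN forward pass per region proposal (~2,000 per image)
Takes up a lot of storage: features of every region proposal are saved for training the SVMs and regressors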
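A quick back-of-the-envelope calculation shows where the storage and inference cost come from. The numbers are assumptions: 2,000 proposals per image, 4096-dimensional fc7 features (AlexNet), stored as 4-byte floats. The dataset size is a hypothetical example, and conv5 features would add to the total.

```python
def fc7_cache_bytes(num_images, proposals_per_image=2000,
                    feature_dim=4096, bytes_per_value=4):
    """Bytes needed to store fc7 features for every proposal of every image."""
    return num_images * proposals_per_image * feature_dim * bytes_per_value


def forward_passes(num_images, proposals_per_image=2000):
    """Number of CNN forward passes when each proposal is processed separately."""
    return num_images * proposals_per_image


per_image_mib = fc7_cache_bytes(1) / 1024 ** 2      # 31.25 MiB for one image
dataset_gib = fc7_cache_bytes(10_000) / 1024 ** 3   # hypothetical 10,000-image dataset
passes = forward_passes(10_000)
```

Under these assumptions one image already needs about 31 MiB of cached features, which is why later models share the convolutional computation across proposals.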

Reference

Written by De Jun Huang
