Learning Day 62: Object detection — R-CNN
Jun 16, 2021
Object detection
- Detect an object + identify what it is
- Detect an object: bounding box
- Identify what the object is: classification
- Models: R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN
Object detection with R-CNN
- Region-based Convolutional Neural Networks (R-CNN)
- Main idea: modify an existing CNN and extract feature maps or fc-layer features for object classification and bounding-box regression
Region proposals: traditional ways to find bounding boxes
- Selective Search (SS): draw an initial set of bounding boxes, R, based on certain rules, then repeatedly merge the most similar ones (similarity is measured by colour, texture and size after merging)
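To make the merging idea concrete, here is a toy sketch in plain Python. It is not the real Selective Search: the region format (`box`, `size`, `colour`), the similarity formula and the function names are all assumptions made for illustration, and it ignores texture and region adjacency.

```python
def union_box(a, b):
    """Smallest box (x1, y1, x2, y2) that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def similarity(r1, r2, image_area):
    """Toy similarity: close mean colours and small combined size score higher."""
    colour_sim = 1.0 - abs(r1["colour"] - r2["colour"])  # colour assumed in [0, 1]
    size_sim = 1.0 - (r1["size"] + r2["size"]) / image_area
    return colour_sim + size_sim


def toy_selective_search(regions, image_area):
    """Greedily merge the most similar pair until one region is left.

    Every region that ever exists (initial or merged) contributes its box
    as a proposal.
    """
    regions = [dict(r) for r in regions]
    proposals = [r["box"] for r in regions]
    while len(regions) > 1:
        pairs = [(i, j) for i in range(len(regions)) for j in range(i + 1, len(regions))]
        i, j = max(pairs, key=lambda p: similarity(regions[p[0]], regions[p[1]], image_area))
        a, b = regions[i], regions[j]
        size = a["size"] + b["size"]
        merged = {
            "box": union_box(a["box"], b["box"]),
            "size": size,
            # Size-weighted mean colour of the merged region.
            "colour": (a["colour"] * a["size"] + b["colour"] * b["size"]) / size,
        }
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged["box"])
    return proposals


# Hypothetical initial regions for a 20x20 image.
initial = [
    {"box": (0, 0, 10, 10), "size": 100, "colour": 0.10},
    {"box": (10, 0, 20, 10), "size": 100, "colour": 0.15},
    {"box": (0, 10, 20, 20), "size": 200, "colour": 0.90},
]
proposals = toy_selective_search(initial, image_area=400)
```

With n initial regions this always yields 2n - 1 proposals, because every merge adds exactly one new box.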
R-CNN steps
- Extract region proposals (~2,000) from the input image
- Warp each region to the CNN input size, either by stretching it or by rescaling it and padding with black borders
- Using AlexNet as an example, extract the fc7 layer features for SVM classification
- Use the 5th conv layer features for bounding-box regression
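The two warping options from the steps above can be sketched in plain Python. This is a minimal illustration with nearest-neighbour resizing on a list-of-lists image; the function names are made up for this sketch, and a real pipeline would use an image library and the network's actual input size.

```python
def crop(image, box):
    """Cut the region (x1, y1, x2, y2) out of a 2-D list-of-lists image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def warp_stretch(patch, out_h, out_w):
    """Stretch the patch to out_h x out_w, ignoring its aspect ratio."""
    in_h, in_w = len(patch), len(patch[0])
    return [
        [patch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def warp_pad(patch, out_h, out_w, fill=0):
    """Rescale the patch keeping its aspect ratio, then pad with `fill` (black)."""
    in_h, in_w = len(patch), len(patch[0])
    scale = min(out_h / in_h, out_w / in_w)
    new_h = min(out_h, max(1, round(in_h * scale)))
    new_w = min(out_w, max(1, round(in_w * scale)))
    resized = warp_stretch(patch, new_h, new_w)
    top, left = (out_h - new_h) // 2, (out_w - new_w) // 2
    canvas = [[fill] * out_w for _ in range(out_h)]
    for r in range(new_h):
        canvas[top + r][left:left + new_w] = resized[r]
    return canvas


# Tiny 4x6 "image" with a 2x4 region proposal, warped to a 4x4 input.
image = [[r * 6 + c + 1 for c in range(6)] for r in range(4)]
patch = crop(image, (0, 0, 4, 2))
stretched = warp_stretch(patch, 4, 4)
padded = warp_pad(patch, 4, 4)
```

Stretching distorts the object but uses every input pixel, while padding keeps the shape and fills the rest with black.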
R-CNN fine-tuning details
- With the R-CNN steps above as the basis
- Load a pre-trained model and train on all region proposals
- Use log loss
- Change the softmax layer to N+1 outputs (N object classes plus background) instead of the 1,000 in e.g. AlexNet
- Positive label: IoU with a ground-truth box ≥ 0.5
- Negative label: IoU < 0.5
- IoU is the Intersection over Union: the overlap area of two boxes divided by the area of their union
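A minimal sketch of IoU and the fine-tuning label rule, assuming boxes are `(x1, y1, x2, y2)` tuples and class indices run from 1 to N with 0 reserved for background. The function names are made up for this example.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fine_tune_label(proposal, gt_boxes, gt_classes, threshold=0.5):
    """Return the class (1..N) of the best-overlapping ground truth, or 0 for background."""
    best_iou, best_class = 0.0, 0
    for box, cls in zip(gt_boxes, gt_classes):
        overlap = iou(proposal, box)
        if overlap > best_iou:
            best_iou, best_class = overlap, cls
    return best_class if best_iou >= threshold else 0


# A proposal overlapping a class-3 object with IoU 0.8 is labelled 3 (positive);
# one overlapping it with IoU 1/3 is labelled 0 (background).
label_pos = fine_tune_label((0, 0, 10, 10), [(0, 0, 10, 8)], [3])
label_neg = fine_tune_label((0, 0, 10, 10), [(5, 0, 15, 10)], [3])
```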
Details for classifier:
- Taking AlexNet as the example, train SVM classifiers on the fc7 layer features
- Each class has its own SVM classifier (N classes in total, so N SVMs)
- Positive: ground-truth regions
- Negative: IoU < 0.3, or a different object
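The per-class labelling for the N SVMs could look like the sketch below. It assumes the IoU values are already computed, and it assumes that regions which are neither ground truth nor below the 0.3 threshold are simply left out of training for that class (returned as `None`). The function name and argument layout are hypothetical.

```python
def svm_labels_for_region(region_class, max_iou_per_class, neg_threshold=0.3):
    """Label one region for each of the N one-vs-rest SVMs.

    region_class: class index if the region is itself a ground-truth box, else None.
    max_iou_per_class: for each class, the highest IoU between this region and
        any ground-truth box of that class.
    Returns a list with +1 (positive), -1 (negative) or None (not used) per class.
    """
    labels = []
    for cls, overlap in enumerate(max_iou_per_class):
        if region_class == cls:
            labels.append(+1)       # ground-truth region of this class
        elif overlap < neg_threshold:
            labels.append(-1)       # background or a different object
        else:
            labels.append(None)     # assumption: ambiguous overlap is skipped
    return labels


# Two classes: 0 = cat, 1 = dog.
gt_cat = svm_labels_for_region(0, [1.0, 0.0])        # a ground-truth cat box
loose = svm_labels_for_region(None, [0.4, 0.1])      # proposal partly covering a cat
background = svm_labels_for_region(None, [0.0, 0.2])
```

A ground-truth cat box is a positive for the cat SVM and a negative for the dog SVM, which is the "wrong object" case.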
Details for regressor:
- Taking AlexNet as the example, use the pre-trained conv5 features for bounding-box regression
- One regressor per class
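The notes do not spell out what the regressor predicts. A common parameterisation, and as far as I know the one used by R-CNN, is a centre offset scaled by the proposal size plus a log-scale change in width and height. The sketch below shows only the targets and how a prediction would be applied; the regressor itself and the function names are left out or hypothetical.

```python
import math


def to_center_form(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + w / 2.0, y1 + h / 2.0, w, h


def regression_targets(proposal, ground_truth):
    """Targets (tx, ty, tw, th) that map the proposal onto the ground truth."""
    px, py, pw, ph = to_center_form(proposal)
    gx, gy, gw, gh = to_center_form(ground_truth)
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def apply_deltas(proposal, deltas):
    """Apply predicted (tx, ty, tw, th) to a proposal and return the refined box."""
    px, py, pw, ph = to_center_form(proposal)
    tx, ty, tw, th = deltas
    cx, cy = px + tx * pw, py + ty * ph
    w, h = pw * math.exp(tw), ph * math.exp(th)
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


# Round trip: perfect predictions recover the ground-truth box.
proposal, ground_truth = (0, 0, 10, 10), (2, 2, 14, 10)
targets = regression_targets(proposal, ground_truth)
refined = apply_deltas(proposal, targets)
```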
Test metrics:
- True positive: IoU ≥ 0.5
- False positive: IoU < 0.5
- False negative: a ground-truth box that was missed
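Counting these for one image and one class can be sketched as below. It assumes detections are `(score, box)` pairs processed from highest to lowest score, and that a second detection of an already matched ground-truth box counts as a false positive. That duplicate rule is an assumption added here, not something stated in the list above.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_tp_fp_fn(detections, gt_boxes, threshold=0.5):
    """Return (true positives, false positives, false negatives)."""
    matched = set()
    tp = fp = 0
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Find the ground-truth box that overlaps this detection the most.
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(gt_boxes):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= threshold and best_idx not in matched:
            tp += 1
            matched.add(best_idx)
        else:
            fp += 1  # low overlap, or a duplicate of an already matched box
    fn = len(gt_boxes) - len(matched)  # ground-truth boxes nobody found
    return tp, fp, fn


gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [
    (0.9, (0, 0, 10, 9)),      # matches the first object
    (0.8, (0, 0, 10, 10)),     # duplicate of the first object
    (0.7, (50, 50, 60, 60)),   # overlaps nothing
]
tp, fp, fn = count_tp_fp_fn(detections, gt_boxes)
precision, recall = tp / (tp + fp), tp / (tp + fn)
```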
Drawbacks of R-CNN
- Long training time
- Long inference time
- Takes up a lot of storage
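A rough back-of-the-envelope estimate shows where the storage goes. It assumes only the 4,096-dimensional fc7 feature of each of the ~2,000 proposals is cached, as 32-bit floats, for a hypothetical dataset of 5,000 images. The function name and dataset size are made up for illustration.

```python
def feature_cache_bytes(num_images, proposals_per_image=2000,
                        feature_dim=4096, bytes_per_value=4):
    """Bytes needed to store one feature vector per region proposal."""
    return num_images * proposals_per_image * feature_dim * bytes_per_value


per_image_mib = feature_cache_bytes(1) / 2**20       # 31.25 MiB per image
dataset_gib = feature_cache_bytes(5000) / 2**30      # about 152.6 GiB in total
```

The same factor of ~2,000 also drives the training and inference time, since each proposal needs its own forward pass through the CNN.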