Learning Day 64: Object detection 3 — Fast R-CNN and Faster R-CNN
Jun 18, 2021
Fast R-CNN
Advantages
- Better performance
- Faster than R-CNN and SPP-Net (training: 8.8x faster than R-CNN; single-image test: 146x faster, at 0.32s per image)
- End to end object detection
- All layers can be fine tuned
New techniques
1. RoI pooling on Selective Search proposals
- It is a special case of SPP pooling
- SPP pooling applies several grid sizes to the same region
- RoI pooling uses only a single grid size, the finest one (e.g. 7x7 for VGG)
- Each grid cell is max pooled
- Bounding-box regression: learn a function f that maps the proposed box as close to the ground-truth box as possible. It applies translation first, then scaling
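To make the RoI pooling step concrete, here is a minimal sketch in plain Python. It assumes a single-channel feature map stored as nested lists and an RoI already expressed in feature-map coordinates; the function name `roi_max_pool` and the floor/ceil bin boundaries are illustrative choices, not taken from a particular implementation.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h=7, out_w=7):
    """Max-pool one RoI of a single-channel feature map into an out_h x out_w grid.

    feature_map: list of rows (H x W).
    roi: (x1, y1, x2, y2) in feature-map cells, both corners inclusive.
    """
    x1, y1, x2, y2 = roi
    roi_h = y2 - y1 + 1
    roi_w = x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        # Floor for the bin start and ceil for the bin end, so no bin is empty
        r0 = y1 + (i * roi_h) // out_h
        r1 = y1 + ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            c0 = x1 + (j * roi_w) // out_w
            c1 = x1 + ceil_div((j + 1) * roi_w, out_w)
            # Max pooling inside this grid cell
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# A 4x4 feature map with values 0..15, pooled to a 2x2 grid
fmap = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
whole = roi_max_pool(fmap, (0, 0, 3, 3), out_h=2, out_w=2)
corner = roi_max_pool(fmap, (1, 1, 3, 3), out_h=2, out_w=2)
```

Whatever the size of the RoI, the output grid has the same fixed shape, which is what lets the FC layers that follow accept proposals of any size.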
2. Multi-task loss
- Combine the classification and regression losses into a single loss
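A rough sketch of what the combined loss could look like for a single RoI is given here. It assumes the usual form of a log loss for classification plus a smooth L1 loss on the four box offsets, with the regression term skipped for background (class 0). The weighting parameter `lam` and the function names are illustrative.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multi_task_loss(class_probs, true_class, pred_deltas, target_deltas, lam=1.0):
    """Classification loss plus (for non-background RoIs) box regression loss."""
    cls_loss = -math.log(class_probs[true_class])
    if true_class == 0:
        # Background RoIs have no ground-truth box, so no regression term
        return cls_loss
    loc_loss = sum(smooth_l1(p - t) for p, t in zip(pred_deltas, target_deltas))
    return cls_loss + lam * loc_loss


bg_loss = multi_task_loss([0.5, 0.5], 0, [0.5, 0.0, 0.0, 2.0], [0.0] * 4)
fg_loss = multi_task_loss([0.5, 0.5], 1, [0.5, 0.0, 0.0, 2.0], [0.0] * 4)
```

Because both terms come from the same forward pass, the classifier and the box regressor can be trained together in one stage.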
Training/Fine tuning procedure
Mini batch sampling
- Batch size (128) = images per batch (2) x RoIs per image (64)
- RoIs are grouped by their overlap (IoU) with the ground truth, using the following rules
- (1) 25% objects, with IoU ≥ 0.5
- (2) 75% background, with IoU in [0.1, 0.5)
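The sampling rules above can be sketched as follows. This is a simplified illustration: boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the names `iou` and `sample_rois` are hypothetical helpers rather than functions from a library.

```python
import random


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sample_rois(rois, gt_boxes, rois_per_image=64, fg_fraction=0.25, rng=None):
    """Split RoIs into object/background by IoU, then sample roughly 25% / 75%."""
    rng = rng or random.Random()
    fg, bg = [], []
    for roi in rois:
        best = max((iou(roi, gt) for gt in gt_boxes), default=0.0)
        if best >= 0.5:
            fg.append(roi)      # object: IoU >= 0.5
        elif best >= 0.1:
            bg.append(roi)      # background: IoU in [0.1, 0.5)
        # RoIs with IoU < 0.1 are not sampled at all
    n_fg = min(int(rois_per_image * fg_fraction), len(fg))
    n_bg = min(rois_per_image - n_fg, len(bg))
    return rng.sample(fg, n_fg), rng.sample(bg, n_bg)


gt = [(0, 0, 10, 10)]
rois = [(0, 0, 10, 10), (0, 0, 10, 5), (0, 0, 10, 2), (20, 20, 30, 30)]
fg_sample, bg_sample = sample_rois(rois, gt, rois_per_image=4, rng=random.Random(0))
```

With 2 images per batch and `rois_per_image=64`, this gives the 128-RoI mini batch described above.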
Other details
- Because of the large number of RoIs, almost half of the time is spent on FC layer calculations. This can be accelerated with truncated SVD
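To see why SVD helps, a back-of-the-envelope sketch: an FC weight matrix of shape out x in is approximated by keeping only its top t singular values, which turns one large layer into two thin ones. The layer size (4096 x 4096) and t = 256 below are illustrative assumptions chosen only for the arithmetic.

```python
def fc_params(in_dim, out_dim):
    """Weights in a plain fully connected layer (bias ignored)."""
    return in_dim * out_dim


def truncated_svd_params(in_dim, out_dim, t):
    """Weights after factoring W ~= U_t * (S_t * V_t^T).

    The first thin layer (S_t * V_t^T) is t x in_dim,
    the second thin layer (U_t) is out_dim x t.
    """
    return t * in_dim + out_dim * t


# Illustrative sizes only (assumed, not from the notes above)
full = fc_params(4096, 4096)
compressed = truncated_svd_params(4096, 4096, t=256)
speedup = full / compressed
```

The multiply-add count of an FC layer scales with its weight count, so the same ratio is a rough estimate of the per-RoI saving in that layer.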
Faster R-CNN
- Faster R-CNN = Fast R-CNN + RPN (Region Proposal Network)
- Even faster (single image test: 0.198s)
- Replaces the last non-NN component, Selective Search, with a NN structure, the RPN
Region Proposal Network (RPN)
Advantages
- Enables weight sharing for conv layers
- No more offline Selective Search
- Fewer region proposals, but of higher quality
How it works
- Take the conv5 feature map produced by the earlier conv layers
- At each sliding-window position, take k anchor boxes of various sizes
- Use a 3x3 conv layer to get a 256-d layer
- Use a 1x1 conv layer to get a 4k-d layer for regression
- Use another 1x1 conv layer to get a 2k-d layer for classification
- For the anchor boxes, e.g. k=9 → 3 scales (128, 256, 512) x 3 ratios (1:1, 1:2, 2:1)
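As a small illustration of the k=9 anchors, the sketch below generates them around one sliding-window centre. It assumes each scale s means an anchor area of roughly s x s pixels and each ratio is height / width; the function name `make_anchors` is hypothetical.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) anchors as (x1, y1, x2, y2)."""
    anchors = []
    for s in scales:
        for r in ratios:            # r = height / width
            w = s / math.sqrt(r)    # keeps the area close to s * s
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(0, 0)
k = len(anchors)
reg_outputs = 4 * k   # 4 box offsets per anchor
cls_outputs = 2 * k   # object / non-object score per anchor
```

With k=9 this gives 36 regression outputs and 18 classification outputs per sliding-window position, matching the 4k-d and 2k-d layers above.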
RPN loss
- Lcls: classifies object vs. non-object
- Lreg: uses smooth L1
- mini-batch sampling:
- — single image
- — 128 positive samples: anchor boxes with IoU > 0.7, or the anchor box with the largest IoU
- — 128 negative samples: anchor boxes with IoU < 0.3
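The anchor labelling and sampling above could look roughly like this. The sketch assumes a precomputed IoU table with at least one ground-truth box, and it pads with extra negatives when fewer than 128 positives exist, which is an assumption of this sketch. The names `label_anchors` and `sample_anchors` are illustrative.

```python
import random


def label_anchors(ious, pos_thresh=0.7, neg_thresh=0.3):
    """ious[i][j] = IoU of anchor i with ground-truth box j.

    Returns one label per anchor: 1 = positive, 0 = negative, -1 = ignored.
    """
    labels = []
    for row in ious:
        best = max(row)
        if best > pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    # The anchor with the largest IoU for each ground-truth box is also positive
    for j in range(len(ious[0])):
        best_i = max(range(len(ious)), key=lambda i: ious[i][j])
        if ious[best_i][j] > 0:
            labels[best_i] = 1
    return labels


def sample_anchors(labels, batch_size=256, rng=None):
    """Sample up to batch_size / 2 positives, fill the rest with negatives."""
    rng = rng or random.Random()
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    n_pos = min(batch_size // 2, len(pos))
    n_neg = min(batch_size - n_pos, len(neg))
    return rng.sample(pos, n_pos), rng.sample(neg, n_neg)


ious = [[0.8, 0.1], [0.5, 0.6], [0.2, 0.1], [0.4, 0.2]]
labels = label_anchors(ious)
pos_idx, neg_idx = sample_anchors(labels, rng=random.Random(0))
```

Anchors labelled -1 fall between the two thresholds and contribute nothing to the loss.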
Training procedure for Faster R-CNN
Step 1. Train RPN
- Initialise conv layers with pretrained weights from ImageNet
- Generate region proposals and pass them to Fast R-CNN
Step 2. Train Fast R-CNN
- Initialise conv layers with pretrained weights from ImageNet
(Notice that the above two blocks of conv layers are different and not shared)
Step 3. Fine tune RPN
- Initialise conv layers with Fast R-CNN weights
- Fix the conv layers, fine tune the rest
- Generate better region proposals and pass them to Fast R-CNN
Step 4. Fine tune Fast R-CNN
- Fix conv layers, fine tune the rest
(Notice that the conv layers in steps 3 and 4 are shared; they are the layers that resulted from training in step 2)
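As a compact summary of the four steps, the schedule can be written down as plain data. This is only a bookkeeping sketch: the field names (`conv_init`, `conv_frozen`) and the helper are made up for illustration, and no actual training happens here.

```python
# Each entry records which network is trained, where its conv layers come from,
# and whether those conv layers are kept fixed during the step.
SCHEDULE = [
    {"step": 1, "train": "RPN",        "conv_init": "ImageNet",            "conv_frozen": False},
    {"step": 2, "train": "Fast R-CNN", "conv_init": "ImageNet",            "conv_frozen": False},
    {"step": 3, "train": "RPN",        "conv_init": "Fast R-CNN (step 2)", "conv_frozen": True},
    {"step": 4, "train": "Fast R-CNN", "conv_init": "Fast R-CNN (step 2)", "conv_frozen": True},
]


def steps_sharing_conv(schedule):
    """Steps whose conv layers are fixed, and therefore shared between networks."""
    return [s["step"] for s in schedule if s["conv_frozen"]]


shared = steps_sharing_conv(SCHEDULE)
```

Steps 1 and 2 each train their own copy of the conv layers, while steps 3 and 4 both reuse the fixed conv layers from step 2, which is what makes the final RPN and Fast R-CNN share computation.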