Learning Day 63: Object detection 2— SPP-Net

Published in

dejunhuang

2 min read · Jun 17, 2021

SPP-Net

Improves on the main drawback of R-CNN: it is slow, because every region proposal (~2000 per image) goes through the CNN separately, i.e. Image → crop/warp → conv layers → fc layers → output
Feeds the entire image through the CNN only once and extracts the region features from the Conv5 feature maps
Uses Spatial Pyramid Pooling (SPP) to turn regions of different sizes into feature vectors of the same length
SPP-Net flow: Image → conv layers → spatial pyramid pooling → fc layers → output
So it takes advantage of parameter sharing in the conv layers and can adapt to different input sizes. The fixed input size is required by the fc layers, not the conv layers: in R-CNN each crop has to be warped to a fixed size, while the SPP layer in SPP-Net removes this constraint by always handing the fc layers a fixed-length vector
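
To make the fixed-length property concrete, here is a minimal pure-Python sketch of max pooling over a pyramid of grids. The function name `spatial_pyramid_pool`, the helper `make_feature_map`, the pyramid levels (1, 2, 4) and the way the bin edges are rounded are all my own assumptions for illustration, not the exact configuration used by SPP-Net.

```python
import math
import random


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a feature map into a fixed-length list.

    feature_map is a list of channels, each channel a list of rows.
    For every level n the channel is split into an n x n grid of bins and
    the maximum of each bin is kept, so the output always has
    channels * sum(n * n for n in levels) values.
    """
    pooled = []
    for n in levels:
        for channel in feature_map:
            height, width = len(channel), len(channel[0])
            for i in range(n):
                # Round outward so bins cover the map even if height % n != 0.
                r0 = math.floor(i * height / n)
                r1 = math.ceil((i + 1) * height / n)
                for j in range(n):
                    c0 = math.floor(j * width / n)
                    c1 = math.ceil((j + 1) * width / n)
                    pooled.append(max(channel[r][c]
                                      for r in range(r0, r1)
                                      for c in range(c0, c1)))
    return pooled


def make_feature_map(channels, height, width, seed=0):
    """Hypothetical stand-in for a Conv5 output: random activations."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(width)] for _ in range(height)]
            for _ in range(channels)]


# Two "regions" of different sizes give vectors of the same length.
square = make_feature_map(3, 13, 13)
wide = make_feature_map(3, 10, 17)
print(len(spatial_pyramid_pool(square)), len(spatial_pyramid_pool(wide)))
```

Both calls return 3 × (1 + 4 + 16) = 63 values, which is why the fc layers after the SPP layer no longer care about the size of the input region.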

SPP illustration (ref)

How to select the Region of Interest (ROI) here: the region proposals are still generated on the original image (as in R-CNN), and each proposal's bounding box is then projected onto the feature maps at Conv5. That works because objects have the same relative position in the feature maps as in the original image, so the strongest Conv5 activations line up with the objects that caused them

How to choose ROI (ref)
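
The correspondence between image coordinates and Conv5 coordinates can be sketched as a division by the total stride of the conv layers. The function name `project_roi`, the stride of 16 and the outward rounding are assumptions for illustration; the real stride depends on the network, and the original implementation may round differently.

```python
import math


def project_roi(box, stride, map_width, map_height):
    """Project an image-space box (x1, y1, x2, y2) onto feature-map cells.

    Returns (fx1, fy1, fx2, fy2) with exclusive right/bottom edges, rounded
    outward so the projected window covers the whole region, then clipped
    to the feature map.
    """
    x1, y1, x2, y2 = box
    fx1 = max(math.floor(x1 / stride), 0)
    fy1 = max(math.floor(y1 / stride), 0)
    fx2 = min(math.ceil(x2 / stride), map_width)
    fy2 = min(math.ceil(y2 / stride), map_height)
    return fx1, fy1, fx2, fy2


# Assumed setup: a 640 x 480 image and stride 16 give a 40 x 30 feature map.
print(project_roi((64, 32, 320, 240), stride=16, map_width=40, map_height=30))
```

The box starts 10% of the way across the image (64 / 640) and also 10% of the way across the feature map (4 / 40), which is the "same relative position" property described above. Going from the feature map back to the image is the same mapping in reverse, a multiplication by the stride.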

SPP-Net fine-tuning procedure

With the above SPP-Net flow as the basis:
Load a pre-trained model and compute the SPP features of all ROIs; call them F
Use F to fine-tune only the fc layers, fc6 → fc7 → fc8 (unlike R-CNN, which fine-tunes the conv layers as well)
Compute the new fc7 features and use them to train the SVM classifiers
Use F for bounding box regression
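
The idea of "compute F once, then train only the layers on top of it" can be shown with a toy sketch. Everything here is an assumption made for illustration: the cached features are four made-up 2-D vectors, and a single fully connected layer with a sigmoid stands in for fc6 → fc7 → fc8. The point is only that F is treated as a constant, so no gradient reaches the conv layers that produced it.

```python
import math


def train_fc_head(features, labels, epochs=200, lr=0.5):
    """Train one fully connected layer + sigmoid on cached features.

    The features are plain numbers loaded from a cache, so the update only
    touches the weights and bias of this layer (the "fc" part).
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # derivative of the log loss with respect to z
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (object) or 0 (background) for one feature vector."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


# Hypothetical cached SPP features F for four ROIs and their labels.
F = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
labels = [1, 1, 0, 0]

weights, bias = train_fc_head(F, labels)
print([predict(weights, bias, x) for x in F])
```

Because F never changes during training, it can be computed once per image and reused in every epoch. That is where the speed-up over R-CNN comes from, and it is also the reason the conv layers stay frozen.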

Remaining drawbacks inherited from R-CNN and a new drawback

Needs to store a large amount of features
Multi-phase training
Faster than R-CNN but still quite slow
New drawback: cannot fine-tune the conv layers before the SPP layer

Reference

Written by De Jun Huang
