MobileNet Based Single Shot MultiBox Detector in PyTorch, ONNX and Caffe2
This is a brief note on how to convert a VGG-based SSD into a MobileNet-based SSD. For the implementation, please check this repo. For an explanation and implementation of SSD itself, please see my previous post, Understand Single Shot MultiBox Detector (SSD) and Implement It in Pytorch.
The basic architecture is the same: you only need to replace VGG with MobileNet and choose which layers to branch out from to generate the feature maps for the prediction heads.
I tried to translate the TensorFlow version of MobileNet to PyTorch, but found that ONNX did not support ReLU6 when I tried to convert the PyTorch model to ONNX. Therefore, I used the pre-trained MobileNet from the project pytorch-mobilenet, which uses ReLU rather than ReLU6. As the experiments show, this change did not hurt the accuracy of the trained SSD.
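If you already have a network that uses ReLU6, one way to work around the export issue is to swap the activations in place before exporting. This is a minimal sketch (not code from the repo); the tiny `net` is a stand-in for the real MobileNet:

```python
import torch.nn as nn

def replace_relu6(module):
    # Recursively swap nn.ReLU6 for nn.ReLU so the network only uses
    # activations that the ONNX exporter of the time could handle.
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU6):
            setattr(module, name, nn.ReLU(inplace=True))
        else:
            replace_relu6(child)
    return module

# Stand-in for a MobileNet block that uses ReLU6.
net = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU6(inplace=True))
replace_relu6(net)
```

Note that on ImageNet-scale inputs the activations rarely exceed 6, which is consistent with the observation that using ReLU instead of ReLU6 did not change the detector's accuracy.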
We branch out two paths from layers 12 and 14 to generate feature maps of sizes 19x19 and 10x10 respectively. The remaining feature maps are generated in a way similar to the VGG-based SSD; you need to change the input and output channel numbers, though.
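You can verify where these sizes come from with simple arithmetic. A stride-2, 3x3, padding-1 convolution halves the spatial size (rounding up), and a 300x300 SSD input passes through a chain of such downsampling layers; the sketch below only tracks spatial sizes, ignoring the depthwise-separable structure of the actual blocks:

```python
def conv_out(size, kernel=3, stride=2, pad=1):
    # Output spatial size of a convolution (floor division, as in PyTorch).
    return (size + 2 * pad - kernel) // stride + 1

sizes = [300]
for _ in range(9):
    sizes.append(conv_out(sizes[-1]))

# 19 and 10 are the layer-12 and layer-14 outputs; the tail
# 5, 3, 2, 1 comes from the extra SSD layers appended after MobileNet.
print(sizes)  # [300, 150, 75, 38, 19, 10, 5, 3, 2, 1]
```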
Choosing the best priors is an open question. Following other projects, this implementation uses feature maps of sizes 19x19, 10x10, 5x5, 3x3, 2x2 and 1x1, which differs from the VGG-based SSD. The 38x38 feature map is not used because it gave poor results in experiments; the reason may be that the features at that layer are not high-level enough to distinguish objects.
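With these feature maps, the total number of priors is easy to compute. Assuming six priors per cell (a common SSD choice for aspect ratios 1, 2, 3, 1/2, 1/3 plus one extra scale-1 box; the exact count per layer may differ in the repo):

```python
feature_map_sizes = [19, 10, 5, 3, 2, 1]
priors_per_cell = 6  # assumed; depends on the aspect ratios chosen per layer

# Each cell of each feature map contributes priors_per_cell default boxes.
total_priors = sum(s * s * priors_per_cell for s in feature_map_sizes)
print(total_priors)  # 3000
```

Dropping the 38x38 map removes 38 * 38 * priors-per-cell boxes, by far the largest share, so it also makes training and inference noticeably cheaper.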
Cosine annealing works well for training the MobileNet-based SSD. I did not use the warm-restart part in my experiments.
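Without restarts, cosine annealing is just a smooth decay of the learning rate from its base value to a minimum over a fixed number of steps (PyTorch provides this as `torch.optim.lr_scheduler.CosineAnnealingLR`). The schedule can be written directly:

```python
import math

def cosine_annealing_lr(t, t_max, base_lr, eta_min=0.0):
    # Cosine annealing without restarts (Loshchilov & Hutter, SGDR):
    # decays from base_lr at t=0 to eta_min at t=t_max.
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t / t_max))
```

The slow decay at the start and end, with a fast drop in the middle, tends to be gentler than step schedules for small networks like MobileNet.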
Converting to ONNX and Caffe2 Models
The process is straightforward. You can check the code here: https://github.com/qfgaohao/pytorch-ssd/blob/master/convert_to_caffe2_models.py
For more information, please check the project page. It now supports out-of-the-box re-training on the Google Open Images Dataset.