Review: ParseNet — Looking Wider to See Better (Semantic Segmentation)
In this story, ParseNet is briefly reviewed. With ParseNet, global context is added to the network and segmentation accuracy is improved. This is a 2016 ICLR paper with more than 200 citations at the time of writing. (Sik-Ho Tsang @ Medium)
With ParseNet, the cat in the above image is no longer wrongly classified as a bird, dog, or sheep. The authors note that global context helps to classify local patches. Let's see how it works.
What Are Covered
- ParseNet Module
1. ParseNet Module
Actually, ParseNet is simple, as shown in the figure above.
At the lower path, at a certain conv layer, L2 normalization is performed for each channel.
At the upper path, global average pooling is applied to the feature maps of that same conv layer, followed again by L2 normalization. Unpooling simply replicates the values of the globally average-pooled vector to match the spatial size of the lower path, so that the two paths can be concatenated.
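The two paths described above can be sketched as follows. This is a minimal NumPy sketch based on the description, not the authors' code; the exact normalization axes and tensor layout (channels first, no batch dimension) are assumptions, and the learnable scale is omitted here:

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # L2-normalize each channel map over its spatial positions.
    # x has shape (C, H, W); the norm is taken over H and W.
    norms = np.sqrt((x ** 2).sum(axis=(1, 2), keepdims=True)) + eps
    return x / norms

def parsenet_module(features):
    # Lower path: L2-normalize the local feature maps per channel.
    local = l2_normalize(features)
    # Upper path: global average pooling gives a (C,) context vector,
    # which is L2-normalized and then "unpooled" by replicating it
    # to the same (C, H, W) spatial size as the lower path.
    c, h, w = features.shape
    context = features.mean(axis=(1, 2))                      # (C,)
    context = context / (np.linalg.norm(context) + 1e-12)
    context = np.broadcast_to(context[:, None, None], (c, h, w))
    # Concatenate along the channel axis: output has 2*C channels.
    return np.concatenate([local, context], axis=0)
```

Note that every spatial position in the upper-path channels carries the same global value, which is how the global context reaches each local patch.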
The reason for the L2 norm is that the earlier layers usually have larger feature values than the later layers.
The above example shows that features at different layers have different scales of values. After normalization, all features lie in the same value range, and they are all concatenated together.
And a learnable scaling factor γ is also introduced for each channel after normalization.
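As a sketch, the normalize-then-scale step with one learnable γ per channel might look like the following (assumed (C, H, W) layout; in training, `gamma` would be a learned parameter rather than a fixed array):

```python
import numpy as np

def l2norm_scale(x, gamma, eps=1e-12):
    # L2-normalize each channel over its spatial positions, then
    # rescale with a learnable per-channel factor gamma (shape (C,)).
    norms = np.sqrt((x ** 2).sum(axis=(1, 2), keepdims=True)) + eps
    return gamma[:, None, None] * (x / norms)
```

After this step, the L2 norm of channel c is exactly γ_c, so the network can learn how strongly each (normalized) channel should contribute.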
3.2. PASCAL Context
We can also see that the ParseNet module does not work well without normalization.
3.3. PASCAL VOC 2012
- ParseNet Baseline: DeepLabv1 without CRF, 67.3%
- ParseNet: ParseNet Baseline with ParseNet module, 69.8%
- DeepLab-CRF-LargeFOV: DeepLabv1, 70.3%
Though ParseNet performs slightly below DeepLab-CRF-LargeFOV, it is still competitive, and it is an end-to-end learning framework, whereas the CRF in DeepLabv1 is a post-processing step that prevents end-to-end learning.
We can see that the failure cases above tend to occur when an object consists of more than one color, or is occluded.
Though it has lower performance than DeepLabv1 (with CRF), the ParseNet idea is later used in DeepLabv3 and DeepLabv3+.