Learning Day 68: Semantic segmentation 2 — DeepLab, atrous/dilated convolution
Jun 23, 2021
Background
- FCN from Day 67 still suffers from the large-step upsampling from small feature maps to the final output, which blurs object boundaries
- DeepLab aims to solve this problem and make object boundaries more accurate
DeepLab v1
- CNN + CRF
- Use atrous/dilated convolution in the deeper layers of the CNN
Atrous/Dilated convolution
- Compared to a standard 3x3 filter, holes (zeros) are inserted between the filter weights so that it covers a 5x5 area
- With the same number of weights, the field of view is larger
- Used with an appropriate stride, it can replace an upsampling deconvolution layer, and the resulting feature map retains more detail
- This is controlled by the dilation rate: the number of holes inserted between weights = rate - 1, e.g. rate = 2 inserts 1 hole. The rate cannot be too big, though: if rate ≥ the input size, the convolution degenerates into a 1x1 convolution, since only the center weight lands on valid input (see the sketch after this list)
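As a concrete illustration, here is a minimal sketch (assuming PyTorch, which the post does not name) showing that a 3x3 filter with rate 2 covers a 5x5 area while keeping the same 9 weights:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)  # dummy single-channel feature map

# Standard 3x3 convolution: field of view 3x3
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)

# Atrous 3x3 convolution with rate=2: one hole between taps, so the
# same 9 weights cover a 5x5 area; padding=dilation keeps the size
atrous = nn.Conv2d(1, 1, kernel_size=3, padding=2, dilation=2)

print(conv(x).shape, atrous(x).shape)  # both torch.Size([1, 1, 32, 32])

# Identical parameter counts: 9 weights + 1 bias each
print(sum(p.numel() for p in conv.parameters()),
      sum(p.numel() for p in atrous.parameters()))  # 10 10
```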
CRF (Conditional Random Field)
- Take the rough segmentation result from the CNN and refine the object boundaries using a fully connected CRF
- I don’t quite understand the theory of CRF yet; a usage sketch follows below
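For completeness, here is a hedged sketch of the refinement step using the third-party pydensecrf package (an implementation of the fully connected CRF of Krähenbühl and Koltun); the hyper-parameters are illustrative defaults, not values taken from DeepLab:

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image, softmax_probs, n_iters=5):
    """image: (H, W, 3) uint8; softmax_probs: (n_labels, H, W) from the CNN."""
    n_labels, h, w = softmax_probs.shape
    d = dcrf.DenseCRF2D(w, h, n_labels)
    d.setUnaryEnergy(unary_from_softmax(softmax_probs))  # -log(prob) unaries
    # Smoothness kernel: nearby pixels tend to share a label
    d.addPairwiseGaussian(sxy=3, compat=3)
    # Appearance kernel: nearby pixels with similar color share a label
    d.addPairwiseBilateral(sxy=80, srgb=13,
                           rgbim=np.ascontiguousarray(image), compat=10)
    q = d.inference(n_iters)                   # mean-field iterations
    return np.argmax(q, axis=0).reshape(h, w)  # refined label map
```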
DeepLab v2
- The base model can be VGG16 or ResNet101
- Introduce Atrous Spatial Pyramid Pooling (ASPP)
Atrous Spatial Pyramid Pooling (ASPP)
- Use different dilation rates in parallel to capture features at different scales (sketched below)
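A minimal ASPP sketch, again assuming PyTorch, with the rates {6, 12, 18, 24} from the DeepLab v2 paper; v2 fuses the parallel branches by summation:

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18, 24)):
        super().__init__()
        # One 3x3 atrous branch per rate; padding=rate preserves size
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r)
             for r in rates])

    def forward(self, x):
        # v2 sums the parallel branch outputs (v3 concatenates instead)
        return sum(b(x) for b in self.branches)

feats = torch.randn(1, 512, 32, 32)
print(ASPP(512, 256)(feats).shape)  # torch.Size([1, 256, 32, 32])
```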
DeepLab v3
- More universal as it can use any CNN structure as a backbone
- Use batch-norm in ASPP
- No CRF
- Use “series” (cascaded) and “parallel” (ASPP-style) connections of atrous convolution layers, as sketched below
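Both arrangements can be sketched in a few lines (assuming PyTorch; the channel sizes are illustrative):

```python
import torch
import torch.nn as nn

# "Series": cascaded atrous layers with growing rates deepen the
# network without shrinking the feature map
cascade = nn.Sequential(
    nn.Conv2d(256, 256, 3, padding=2, dilation=2),
    nn.Conv2d(256, 256, 3, padding=4, dilation=4),
    nn.Conv2d(256, 256, 3, padding=8, dilation=8),
)

# "Parallel": one branch of v3's ASPP, now with batch-norm; the full
# module also has a 1x1 conv branch and image-level pooling
branch = nn.Sequential(
    nn.Conv2d(256, 256, 3, padding=6, dilation=6, bias=False),
    nn.BatchNorm2d(256),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 256, 32, 32)
print(cascade(x).shape, branch(x).shape)  # both keep 32x32
```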
DeepLab v3+
- Expanded on v3
- Add an encoder-decoder structure to preserve boundary information
- The original DeepLab v3 is used as the encoder, applying atrous convolution at multiple scales
- The decoder takes low-level features from the backbone, reduces them with a conv layer, and concatenates them with the upsampled encoder output before a few final conv layers (sketched below)
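A hedged sketch of the decoder wiring, assuming PyTorch; the channel counts, class count, and feature strides are illustrative stand-ins, roughly following the v3+ paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    def __init__(self, low_ch=256, enc_ch=256, n_classes=21):
        super().__init__()
        # 1x1 conv shrinks the low-level backbone features first
        self.reduce = nn.Conv2d(low_ch, 48, 1)
        # a few 3x3 convs after the concatenation, then class scores
        self.fuse = nn.Sequential(
            nn.Conv2d(48 + enc_ch, 256, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, n_classes, 1),
        )

    def forward(self, low_feats, enc_out):
        # Upsample the encoder (DeepLab v3) output to the low-level
        # feature resolution, then concat and refine
        enc_up = F.interpolate(enc_out, size=low_feats.shape[-2:],
                               mode='bilinear', align_corners=False)
        return self.fuse(torch.cat([self.reduce(low_feats), enc_up], dim=1))

low = torch.randn(1, 256, 64, 64)   # low-level features (stride 4)
enc = torch.randn(1, 256, 16, 16)   # encoder output (stride 16)
print(Decoder()(low, enc).shape)    # torch.Size([1, 21, 64, 64])
```

The output would still be upsampled (e.g. bilinearly, by 4x here) to the full input resolution for the final per-pixel prediction.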