Semantic segmentation — SegNet
SegNet
SegNet is a semantic segmentation architecture with an encoder-decoder design. It is a “Fully Convolutional Network”: it contains no fully connected layers.
Key Insights from the paper
Architecture
This implementation uses a pre-trained VGG16 model for its encoder. All 13 convolutional layers of VGG16 are used. For every encoder layer there is a corresponding decoder layer that upsamples the feature maps back toward the original input resolution. The final decoder output feeds a pixelwise classification layer, i.e. a softmax applied independently at every pixel.
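A minimal sketch of this layout, assuming the standard VGG16 configuration (the block grouping and the `pixelwise_softmax` helper below are illustrations, not the paper's code):

```python
import numpy as np

# VGG16's 13 conv layers form 5 encoder blocks; the decoder mirrors
# them in reverse order, then a per-pixel softmax scores the classes.
encoder_blocks = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]  # (channels, n_convs)
decoder_blocks = list(reversed(encoder_blocks))

def pixelwise_softmax(logits):
    # logits: (C, H, W) class scores; softmax over the channel axis
    # yields an independent class distribution at every pixel
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)
```

Summing the conv counts per block (2 + 2 + 3 + 3 + 3) recovers the 13 encoder layers mentioned above.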
What makes SegNet special?
During decoding, SegNet upsamples each feature map by recalling the “max pooling indices” stored at the corresponding encoder layer. This makes training easier, since the network does not need to learn upsampling weights.
Fewer parameters
Since all the fully connected layers are removed, the number of trainable parameters drops from 134M to 14.7M, making the network computationally feasible.
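A back-of-the-envelope count shows where the savings come from, assuming the standard VGG16 configuration (3×3 kernels throughout; the fully connected sizes 4096/4096/1000 are VGG16's, not SegNet's):

```python
# (in_channels, out_channels) for VGG16's 13 conv layers
cfg = [(3, 64), (64, 64),
       (64, 128), (128, 128),
       (128, 256), (256, 256), (256, 256),
       (256, 512), (512, 512), (512, 512),
       (512, 512), (512, 512), (512, 512)]

# weights (3*3*in*out) plus one bias per output channel
conv_params = sum(3*3*i*o + o for i, o in cfg)

# VGG16's fully connected layers: fc6 (7*7*512 -> 4096), fc7, fc8
fc_params = (7*7*512*4096 + 4096) + (4096*4096 + 4096) + (4096*1000 + 1000)

print(conv_params)  # ~14.7M -- the encoder SegNet keeps
print(fc_params)    # ~123.6M -- the part SegNet throws away
```

The 13 conv layers account for roughly 14.7M parameters, while the discarded fully connected layers alone hold over 120M, which is why dropping them shrinks the model by an order of magnitude.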
Differences between SegNet and UNet
In SegNet, only the pooling indices are transferred from the compression (encoder) path to the expansion (decoder) path, which uses little memory. In UNet, entire feature maps are transferred from the compression path to the expansion path, which uses far more memory.
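A rough comparison for a single pooling stage makes the gap concrete (the shape and the one-byte-per-index storage are my assumptions; for 2×2 windows each index needs only 2 bits, so a packed encoding would save even more):

```python
# One feature map of shape (C, H, W) at a 2x2 pooling stage
C, H, W = 64, 224, 224

# UNet-style skip connection: store the full float32 feature map
feature_map_bytes = C * H * W * 4

# SegNet: store one index per 2x2 pooling window (uint8 here)
indices_bytes = C * (H // 2) * (W // 2) * 1

print(feature_map_bytes / indices_bytes)  # 16x less memory
```

Even with this generous one-byte index encoding, SegNet stores 16× less per stage than forwarding the full feature map.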
Results
References
V. Badrinarayanan, A. Kendall, R. Cipolla, “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”, arXiv:1511.00561v3.