Semantic segmentation — SegNet

Abhishek Kumar
2 min read · Jun 8, 2020


SegNet

SegNet is an image segmentation architecture built as an encoder-decoder network. It is a “Fully Convolutional Network”: it contains no fully connected layers.

Key Insights from the paper

Architecture

This implementation uses a pre-trained VGG16 model for its encoder. All 13 convolution layers of VGG16 are used. For every encoder layer there is a corresponding decoder layer, and together the decoder layers upsample the feature maps back to the original image size. The decoder network is followed by a pixelwise classification layer, i.e. a softmax unit.
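The layer layout described above can be sketched in plain Python. This is only an illustration under assumptions: the block sizes and channel counts are the standard VGG16 ones, the helper names (`encoder_layers`, `decoder_layers`, `spatial_sizes`) are hypothetical, and the mirrored decoder is a simplification rather than the exact configuration from the paper.

```python
# Standard VGG16 convolutional layout: (number of 3x3 conv layers, output channels)
# per block. Each block ends with a 2x2 max pooling step with stride 2.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def encoder_layers(in_channels=3):
    """List (in_channels, out_channels) for every encoder conv layer."""
    layers = []
    for num_convs, out_channels in VGG16_BLOCKS:
        for _ in range(num_convs):
            layers.append((in_channels, out_channels))
            in_channels = out_channels
    return layers


def decoder_layers(num_classes, in_channels=3):
    """Mirror the encoder: reverse the layer order and swap in/out channels.

    The last decoder layer outputs one channel per class, which feeds the
    pixelwise softmax. (Hypothetical simplification for illustration.)
    """
    mirrored = [(out_c, in_c) for in_c, out_c in reversed(encoder_layers(in_channels))]
    last_in, _ = mirrored[-1]
    mirrored[-1] = (last_in, num_classes)
    return mirrored


def spatial_sizes(height, width):
    """Feature-map size after each of the 5 pooling steps."""
    sizes = []
    for _ in VGG16_BLOCKS:
        height, width = height // 2, width // 2
        sizes.append((height, width))
    return sizes


print(len(encoder_layers()))      # 13 conv layers in the encoder
print(decoder_layers(11)[-1])     # (64, 11): last decoder layer, 11 classes as an example
print(spatial_sizes(224, 224))    # sizes after each pooling step
```

Each of the five pooling steps halves the height and width, so the decoder needs five matching upsampling steps to get back to the input resolution.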

What makes SegNet special?

To upsample in the decoder, SegNet reuses the “max pooling indices” recorded at the corresponding encoder layer. This makes training easier, since the network does not need to learn the upsampling itself.

The upsampled feature maps are sparse: each value is placed back at the position its pooling index points to, and every other position is zero. The decoder's convolution layers then turn these sparse maps into dense ones.
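To make the idea concrete, here is a minimal pure-Python sketch of 2x2 max pooling that remembers where each maximum came from, followed by the matching unpooling step. It works on a single 2D feature map with even height and width; the function names are hypothetical and this is not the paper's implementation, only an illustration of the mechanism.

```python
def max_pool_2x2_with_indices(fmap):
    """2x2 max pooling with stride 2 on a 2D list (even height and width).

    Returns the pooled map and, for each window, the (row, col) of its maximum.
    """
    pooled, indices = [], []
    for r in range(0, len(fmap), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(fmap[0]), 2):
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, height, width):
    """Place every pooled value back at its recorded position; the rest stay 0."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [6, 2, 3, 1]]

pooled, indices = max_pool_2x2_with_indices(fmap)
print(pooled)    # [[4, 5], [6, 7]]
print(max_unpool_2x2(pooled, indices, 4, 4))
# [[0, 0, 0, 0], [4, 0, 0, 5], [0, 0, 7, 0], [6, 0, 0, 0]]
```

The unpooled map has the original size but is mostly zeros, and nothing in this step has trainable weights.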

Fewer parameters

Since all the fully connected layers are removed, the number of trainable parameters in the encoder drops from 134M to 14.7M (the figure reported in the paper). This makes the network smaller and easier to train.
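As a rough sanity check, the parameter counts can be recomputed from the standard VGG16 layer shapes. The sketch below assumes 3x3 convolutions with biases, a 7x7x512 input to the first fully connected layer, and two 4096-unit fully connected layers. Treating the 134M figure as "convolution layers plus those two fully connected layers" is my assumption, not something stated in the article.

```python
# Standard VGG16 layout: (number of 3x3 conv layers, output channels) per block.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one convolution layer."""
    return in_channels * out_channels * kernel * kernel + out_channels


def dense_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


def vgg16_conv_params(in_channels=3):
    """Total parameters of the 13 VGG16 convolution layers."""
    total = 0
    for num_convs, out_channels in VGG16_BLOCKS:
        for _ in range(num_convs):
            total += conv_params(in_channels, out_channels)
            in_channels = out_channels
    return total


conv_total = vgg16_conv_params()
fc6 = dense_params(7 * 7 * 512, 4096)   # first fully connected layer
fc7 = dense_params(4096, 4096)          # second fully connected layer

print(conv_total)               # 14714688, about 14.7M
print(conv_total + fc6 + fc7)   # 134260544, about 134M
```

Almost all of the removed parameters sit in the first fully connected layer, which is why dropping those layers shrinks the encoder so much.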

Differences between SegNet and UNet

In SegNet, only the pooling indices are transferred from the compression path to the expansion path, which uses less memory. In UNet, by contrast, entire feature maps are transferred from the compression path to the expansion path, which uses much more memory.
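A back-of-the-envelope comparison shows the size of the gap. The sketch assumes feature maps stored as 32-bit floats and 2 bits per 2x2 pooling window for the indices, which is the in-principle figure the SegNet paper mentions. The 360x480x64 layer size is just an example, and real frameworks may store indices less compactly.

```python
def feature_map_bits(height, width, channels, bits_per_value=32):
    """Bits needed to keep a full feature map (what a UNet-style skip carries)."""
    return height * width * channels * bits_per_value


def pooling_index_bits(height, width, channels, bits_per_window=2):
    """Bits needed to keep only the argmax of each 2x2 pooling window."""
    return (height // 2) * (width // 2) * channels * bits_per_window


# Example layer: a 360x480 feature map with 64 channels (illustrative values).
full = feature_map_bits(360, 480, 64)
idx = pooling_index_bits(360, 480, 64)

print(full // 8, "bytes for the full feature map")
print(idx // 8, "bytes for the pooling indices")
print(full // idx, "times smaller")   # 64
```

Under these assumptions the indices take 64 times less storage than the feature map they come from. The trade-off is that the decoder only learns where the strongest activations were, not the full feature values.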

Results

Qualitative results on CamVid dataset
On the CamVid dataset, this architecture obtained the best results at the time of its release.

References

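V. Badrinarayanan, A. Kendall, R. Cipolla. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. arXiv:1511.00561v3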
