Semantic Segmentation using Fully Convolutional Neural Networks

Wilbur de Souza
Nov 1, 2017 · 2 min read

This post describes the use of a fully convolutional neural network (FCN) to classify every pixel in an image. The deep learning model uses a pre-trained VGG-16 model as its foundation (see the original FCN paper by Jonathan Long et al.). In this implementation, we reuse the pre-trained layers 3, 4 and 7 of the VGG model and apply 1x1 convolutions to each of them. This phase acts as an encoder that extracts features. The encoding is then followed by a decoding process using a series of upsampling layers. Upsampling is performed with transposed convolutions (also called fractionally strided convolutions), an operation that goes in the opposite direction of a regular convolution and scales the activations back up to the original image size so that each pixel can be assigned a class. The encoding and decoding process is illustrated below.


Image Credit: http://cvlab.postech.ac.kr/research/deconvnet/
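
For illustration, here is a minimal sketch of one encode/decode step in this style, assuming the TensorFlow 1.x tf.layers API; the function and argument names are illustrative rather than the exact ones used in the project:

import tensorflow as tf

def encode_decode_step(vgg_layer7_out, num_classes, std_init=0.01):
    # 1x1 convolution collapses the feature depth of the pre-trained
    # VGG layer down to num_classes channels (the "encoder" head).
    init = tf.truncated_normal_initializer(stddev=std_init)
    conv_1x1 = tf.layers.conv2d(
        vgg_layer7_out, num_classes, kernel_size=1, padding='same',
        kernel_initializer=init)

    # Transposed (fractionally strided) convolution upsamples the
    # activations by a factor of 2, beginning the decoding phase.
    upsampled = tf.layers.conv2d_transpose(
        conv_1x1, num_classes, kernel_size=4, strides=2, padding='same',
        kernel_initializer=init)
    return upsampled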

In the process, we lose some spatial resolution because the activations were downscaled, so we add detail back by summing in activations from earlier layers through what are called skip connections.
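
As a rough sketch (again assuming the TensorFlow 1.x tf.layers API, with illustrative names), a skip connection can be wired up by matching the depth of the earlier layer with a 1x1 convolution and summing it into the upsampled activations:

import tensorflow as tf

def add_skip_connection(upsampled, vgg_layer4_out, num_classes, std_init=0.01):
    # Reduce the earlier (higher-resolution) VGG layer to num_classes
    # channels so it can be summed with the upsampled activations.
    init = tf.truncated_normal_initializer(stddev=std_init)
    pool4_1x1 = tf.layers.conv2d(
        vgg_layer4_out, num_classes, kernel_size=1, padding='same',
        kernel_initializer=init)

    # The skip connection is an element-wise sum: coarse upsampled
    # semantics plus finer spatial detail from the earlier layer.
    return tf.add(upsampled, pool4_1x1)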


Training and optimisation

The network was trained on a g3x2XLarge AWS instance on the KITTI Road data set with the following hyperparameters: a learning rate of 0.0001, a dropout keep probability of 0.25, 25 epochs, a batch size of 16, a weight initialisation stddev of 0.01, 2 classes, and an image shape of (160, 576).
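
For reference, the same settings expressed as a Python dictionary (only the values come from the run above; the dictionary structure itself is just illustrative):

hyperparams = {
    'lr': 0.0001,               # learning rate for the Adam optimiser
    'keep_prob': 0.25,          # dropout keep probability
    'epochs': 25,
    'batch_size': 16,
    'std_init': 0.01,           # stddev for weight initialisation
    'num_classes': 2,           # road vs. not-road
    'image_shape': (160, 576),  # input images resized to this shape
}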

Optimisation was done by minimising the cross-entropy loss with the Adam optimiser. The output of the final layer is flattened so that each pixel becomes a single classification, and the cross-entropy loss is computed against the correct ground-truth image (also flattened).
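
A minimal sketch of that optimisation step, assuming TensorFlow 1.x (names and signature are illustrative, not the project's exact code):

import tensorflow as tf

def optimize(nn_last_layer, correct_label, learning_rate, num_classes):
    # Flatten the 4-D logits and labels to shape (num_pixels, num_classes)
    # so that every pixel becomes one classification problem.
    logits = tf.reshape(nn_last_layer, (-1, num_classes))
    labels = tf.reshape(correct_label, (-1, num_classes))

    # Cross-entropy loss over all pixels, minimised with the Adam optimiser.
    cross_entropy_loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
    train_op = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy_loss)
    return logits, train_op, cross_entropy_loss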

Retrospective

The results were good for the most part, as can be seen in the following output samples.

[Figure: sample segmentation outputs]

It even seems to distinguish the road from crossing rail tracks (see the third row, third column). It does have some failings, particularly small misclassified patches around cars. The model could benefit from data augmentation, but that has been deferred to a future run, along with trying the model on the Cityscapes data set. The code for this project can be found at https://github.com/asterixds/SemanticSegmentation

Update: Results on the Cityscapes dataset using Mask R-CNN

Notes to follow…
