[WEEK 4 - Facade Parsing using Deep Learning]

Published in bbm406f18, Dec 24, 2018

Theme: Segmenting an image of a facade into predefined semantic categories

Team Members: Onur Cankur, Furkan Karababa, Javid Rajabov

This is the fourth blog post about our project, and we are working hard to make progress on it. You can see what we have done so far in this blog post.

An example of facade parsing (from the DeepFacade paper)

This week, we started writing our progress report, which we have to submit tomorrow (24.12.2018). We worked hard on it, and it is almost done! First, we tried to implement the source code of our baseline, which is described in the paper “Fully Convolutional Networks for Semantic Segmentation” by Jonathan Long, Evan Shelhamer and Trevor Darrell. Unfortunately, we have not obtained any results yet because the algorithm is difficult to implement, but we are very close. We have learned a lot about how it works, and we are confident that we will run the source code on our data as soon as possible and then improve it. In this blog post, we briefly explain our baseline model and list the resources we used to study it as references.

Fully Convolutional Networks for Semantic Segmentation

We will not give a detailed explanation of the paper in this blog post, because the post should focus on what we did this week. You can find a detailed explanation in our progress report.

Fully Convolutional Networks were a major breakthrough in applying convolutional neural networks to semantic segmentation, and the approach has been improved continuously over the years. In 2015, it achieved much better results than the previous state of the art.

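It is called “fully convolutional” because it has no fully connected layers: the fully connected layers of a classification network are replaced by convolutions, so the network can take an input of any size and produce a spatial map of predictions instead of a single label. As you can see from the above picture, the network first downsamples its input through convolution and pooling layers, like well-known architectures such as AlexNet, VGG and GoogLeNet. The most important and interesting part of the implementation is the upsampling, which recovers the spatial resolution of the coarse predictions; in the paper it is done with deconvolution (transposed convolution) layers initialized to bilinear interpolation. In addition, skip connections are used to recover the fine-grained spatial information lost in the pooling (downsampling) layers, by combining coarse predictions from deep layers with finer predictions from shallower layers.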
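To make the “fully convolutional” idea concrete, here is a minimal single-channel sketch in plain Python. It is our own illustration under simplifying assumptions, not code from the paper or its repository: a hypothetical fully connected unit trained on 3×3 inputs has its weights reshaped into a 3×3 kernel. On a larger 5×5 input, the resulting convolution produces a 3×3 map in which every entry equals the fully connected unit applied to one window. In VGG-16 the same trick turns the first fully connected layer, which sees a 7×7×512 feature map, into a 7×7 convolution.

```python
import random


def fully_connected(weights, bias, flat_input):
    """One output unit of a fully connected layer: dot product plus bias."""
    return sum(w * x for w, x in zip(weights, flat_input)) + bias


def fc_as_kernel(weights, k):
    """Reshape the k*k weight vector of that unit into a k x k convolution kernel."""
    return [weights[r * k:(r + 1) * k] for r in range(k)]


def conv2d_valid(image, kernel, bias):
    """Single-channel, stride-1 'valid' convolution (cross-correlation, as in CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw)) + bias
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


random.seed(0)
k = 3  # hypothetical: the fully connected unit was trained on 3x3 inputs
fc_w = [random.uniform(-1, 1) for _ in range(k * k)]
fc_b = 0.1

# A 5x5 input does not fit the fully connected unit, but its convolutional form accepts it.
big = [[random.uniform(0, 1) for _ in range(5)] for _ in range(5)]
score_map = conv2d_valid(big, fc_as_kernel(fc_w, k), fc_b)

# Every output location equals the fully connected unit applied to the matching 3x3 window.
for i in range(len(score_map)):
    for j in range(len(score_map[0])):
        window = [big[i + a][j + b] for a in range(k) for b in range(k)]
        assert abs(score_map[i][j] - fully_connected(fc_w, fc_b, window)) < 1e-12

print(len(score_map), len(score_map[0]))  # 3 3
```

The output is a small grid of scores rather than one number; coarse maps like this are what the upsampling layers and skip connections then bring back to the input resolution.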

There are three main variants of FCN: FCN-32, FCN-16 and FCN-8, shown in the figure below.

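As shown above, the three architectures differ in the stride of the last (transposed) convolution and in the skip connections used to produce the output segmentation maps. FCN-32 upsamples the final prediction by a factor of 32 in a single step; FCN-16 first fuses it with a prediction made from the pool4 layer and then upsamples by 16; FCN-8 additionally fuses a prediction from pool3 and upsamples by 8, which gives the finest output of the three. We think that improvements are possible in the upsampling and pooling methods and in these skip connections; again, our ideas for improving FCN are explained in our progress report. Many researchers have improved on FCN since 2015, but we will not cover that work here, to keep this post within its scope.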
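The difference between the three variants can be sketched in plain Python for a single class. This is a simplified illustration under our own assumptions, not the reference implementation: the score maps at strides 32, 16 and 8 (which in the paper come from 1×1 scoring layers on the last convolutional layer, pool4 and pool3) are replaced by hypothetical random values for a 64×64 input, and every upsampling step is a fixed bilinear transposed convolution with kernel size 2·factor, stride factor and padding factor/2, so nothing is learned here.

```python
import random


def bilinear_kernel_1d(size):
    """1-D bilinear interpolation weights; the 2-D kernel is their outer product."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    return [1 - abs(i - center) / factor for i in range(size)]


def upsample(score, factor):
    """Fixed bilinear transposed convolution for an even integer factor.

    Kernel 2*factor, stride factor, padding factor // 2, so an h x w map
    becomes (h*factor) x (w*factor).
    """
    size = 2 * factor
    w = bilinear_kernel_1d(size)
    pad = factor // 2
    h, wd = len(score), len(score[0])
    out = [[0.0] * (wd * factor) for _ in range(h * factor)]
    for i in range(h):
        for j in range(wd):
            # Scatter each input value over a size x size window of the output.
            for a in range(size):
                y = i * factor + a - pad
                if not 0 <= y < h * factor:
                    continue
                for b in range(size):
                    x = j * factor + b - pad
                    if 0 <= x < wd * factor:
                        out[y][x] += score[i][j] * w[a] * w[b]
    return out


def add(m1, m2):
    """Element-wise sum of two equally sized maps (the skip-connection fusion)."""
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def fcn_outputs(score32, score16, score8):
    """Return FCN-32, FCN-16 and FCN-8 style outputs for one class."""
    fcn32 = upsample(score32, 32)                # one 32x step, no skips
    fuse16 = add(upsample(score32, 2), score16)  # skip from the stride-16 (pool4) map
    fcn16 = upsample(fuse16, 16)
    fuse8 = add(upsample(fuse16, 2), score8)     # second skip, stride-8 (pool3) map
    fcn8 = upsample(fuse8, 8)
    return fcn32, fcn16, fcn8


def random_map(n):
    """Hypothetical n x n class-score map standing in for a network's output."""
    return [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]


random.seed(0)
# A 64x64 input gives score maps of 2x2 (stride 32), 4x4 (stride 16) and 8x8 (stride 8).
fcn32, fcn16, fcn8 = fcn_outputs(random_map(2), random_map(4), random_map(8))
print([(len(m), len(m[0])) for m in (fcn32, fcn16, fcn8)])  # [(64, 64), (64, 64), (64, 64)]
```

All three outputs have the input resolution, but FCN-32 has to stretch a 2×2 map to 64×64 in one step, while FCN-8 fuses two finer maps before its last 8× step, which is why it can follow object boundaries more closely.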

Below are the resources and references we used to write this blog post.

References

FCN Paper

Fully Convolutional Networks (FCN) for 2D segmentation - DeepLearning 0.1 documentation (deeplearning.net)

Literature Review: Fully Convolutional Networks (medium.com)

shelhamer/fcn.berkeleyvision.org (github.com)