Uncover The Beauty Of Image Classification Model With Various Input Resolution

Implementing A Fully Convolutional Network (FCN) for Image Classification

Tan Li Tung
ViTrox-Publication
7 min read · Mar 5, 2021


Fully Convolutional Network (FCN) [Source]

Prerequisite

Before you go through the following content, basic knowledge of Convolutional Neural Networks (CNNs) would be helpful. The fundamentals of CNNs will not be covered in this tutorial.

Note: This tutorial aims to guide beginners through building a basic working FCN in TensorFlow.

Table of Contents

  1. Introduction
  2. Fully Convolutional Networks (FCNs)
  3. IDE and Environment Setup
  4. Modelling — (a) Data Preparation, (b) Data Generation, (c) Model Architecture, (d) Model Training, (e) Model Prediction
  5. Conclusion
  6. Next Move
  7. References

Introduction

Tutorials related to CNNs are all over the Internet, and most of them use the MNIST or CIFAR-10 dataset to train a model. In real-world situations, however, you are very likely to get a dataset with images of different sizes. The most common solution you will find online is to resize every image to the input size of the CNN model. In many cases, though, resizing distorts important features and leads to poor performance during model inference.

Fully Convolutional Network (FCN)

When I was dealing with modelling on inputs of different image sizes, the approach that solved most of my problems was the FCN. FCNs are commonly applied to image segmentation, but here we will apply one to image classification.

An FCN is similar to a CNN, except that the fully connected (dense) layers are replaced with 1×1 convolutional layers. Dense layers require the input size to be fixed, which restricts the model to fixed-size input images. A 1×1 convolutional layer, on the other hand, only requires the number of filters and the kernel size to be specified, independently of the spatial input size. Thus, without fully connected layers, an FCN can accept input images of different sizes.
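To make this concrete, here is a minimal sketch (not the repository's code) contrasting a dense classification head, which requires a fixed input size, with a 1×1 convolutional head followed by global pooling, which works for any input size:

```python
import tensorflow as tf

# A dense classification head needs a fixed input size,
# because Flatten + Dense fix the number of weights.
fixed_in = tf.keras.Input(shape=(28, 28, 1))
x = tf.keras.layers.Conv2D(32, 3, activation="relu")(fixed_in)
x = tf.keras.layers.Flatten()(x)
dense_out = tf.keras.layers.Dense(10, activation="softmax")(x)
dense_model = tf.keras.Model(fixed_in, dense_out)

# A 1x1 convolutional head works for any input size,
# so the height and width can be left as None.
any_in = tf.keras.Input(shape=(None, None, 1))
y = tf.keras.layers.Conv2D(32, 3, activation="relu")(any_in)
y = tf.keras.layers.Conv2D(10, 1)(y)               # 1x1 conv: one filter per class
y = tf.keras.layers.GlobalMaxPooling2D()(y)         # collapse the spatial dimensions
fcn_out = tf.keras.layers.Softmax()(y)
fcn_model = tf.keras.Model(any_in, fcn_out)
```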

IDE and Environment Setup

This tutorial runs on Python. If you have already installed your code editor and environment, you may skip this part. To download Python, head to this link. (Note that we use Python 3.7.0 instead of Python 3.9). At the bottom of the page, select the installer based on your operating system.

Python 3.7.0 Download Page

Once downloaded, run the installer. On the first page, check “Add Python 3.7 to PATH” and select “Install Now”. That completes the Python installation.

Python 3.7 Installation

We will run the code in PyCharm Community Edition. You may download the PyCharm editor here. Select your operating system and choose the Community version. Once downloaded, run the installer as usual. When you arrive at the options page, check all the checkboxes, click “Next”, and then click “Install”.

PyCharm Community Edition Installation

Now we are all set! Let's get to the code!

Modelling

Source Code

All the code used in this tutorial can be found in this GitHub repository. To download it, head to the repository, click “Code” and select “Download ZIP”. Once downloaded, unzip the files. You may also use git clone if you are familiar with Git.

Download GitHub repository

Open your PyCharm editor, head to “File → Open project”, and select the “Python-Fully-Convolutional-Network-Classification” project.

Part 1: Data Preparation

For this tutorial, we will be using an edited MNIST digit dataset, which can be obtained here or in this GitHub repository under the file name dataset.zip. Why do we use this dataset?

  1. The dataset has only 10 classes, which keeps things simple.
  2. The dataset is modified so that the images come in two sizes: 28×28 pixels (the default) and 29×29 pixels.
  3. The dataset is small, so you can follow the tutorial even if you don’t have a GPU.

Of course, you may also use another dataset, or your own dataset with more classes. You can find some other datasets here.

In the main.py file, the dataset will be extracted the first time you run the code.
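Roughly, that first-run extraction can be sketched as follows (the file and folder names here are assumptions for illustration; see main.py for the exact code):

```python
import os
import zipfile

# Extract dataset.zip into a local "dataset" folder on the first run.
# (Illustrative sketch; paths are assumptions, not the repository's exact code.)
if not os.path.isdir("dataset"):
    with zipfile.ZipFile("dataset.zip", "r") as archive:
        archive.extractall("dataset")
```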

The data will be extracted and split into training and test sets for you in a folder named dataset. The dataset information is listed below:

Part 2: Data Generation

In the generator.py file, there is a class called Generator. The data generator splits the dataset into batches and feeds them to the model one by one during training. But wait… why bother using a generator?

When training a CNN, we usually resize the images to the same size and store them in arrays or tensors. However, when the images have different sizes, this is no longer possible (it will throw an error). We could feed the images to the model one by one, but training would be painfully slow!

So what we are going to do here is split the dataset into batches and bring the images within a batch to the same size (i.e. each image is padded with a black background to match the largest image in the batch). In other words, the image size within a batch is the same, but the image size may differ between batches! The concept is illustrated below:

Left: Images padded to the same size within a batch | Right: Image size might be different from batch to batch

In this tutorial, since our images are small, we will use a batch size of 128. We expect the images in each batch to be either 28×28 or 29×29 pixels.
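To make the idea concrete, here is a minimal sketch of such a batch generator built on tf.keras.utils.Sequence (the class and variable names are illustrative assumptions, not the exact contents of generator.py):

```python
import numpy as np
import tensorflow as tf

class PaddedBatchGenerator(tf.keras.utils.Sequence):
    """Yields batches in which every image is zero-padded (black background)
    to the largest image in that batch, so batch shapes may differ between batches."""

    def __init__(self, images, labels, batch_size=128):
        self.images = images          # list of (H, W, C) arrays of varying size
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.images) / self.batch_size))

    def __getitem__(self, idx):
        imgs = self.images[idx * self.batch_size:(idx + 1) * self.batch_size]
        labs = self.labels[idx * self.batch_size:(idx + 1) * self.batch_size]
        max_h = max(img.shape[0] for img in imgs)
        max_w = max(img.shape[1] for img in imgs)
        batch = np.zeros((len(imgs), max_h, max_w, imgs[0].shape[2]), dtype="float32")
        for i, img in enumerate(imgs):
            batch[i, :img.shape[0], :img.shape[1], :] = img  # pad with zeros (black)
        return batch, np.array(labs)
```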

Part 3: Model Architecture

Now that the data is prepared, let's build a model! The model architecture is shown below:

This is a very simple model with only 4 convolutional layers. Note that there is no dense layer in the model; it is replaced by a 1×1 convolutional layer whose number of filters is set to the number of classes (in our case, 10).
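For reference, a model matching this description could be sketched as below (the filter counts are assumptions for illustration; the final 1×1 convolution has one filter per class, and leaving the input height and width as None lets the model accept any image size):

```python
import tensorflow as tf

def build_fcn(num_classes=10):
    # Input height/width are left as None so the model accepts any image size.
    inputs = tf.keras.Input(shape=(None, None, 1))
    x = tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
    x = tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = tf.keras.layers.Conv2D(128, 3, activation="relu", padding="same")(x)
    # The 1x1 convolution replaces the dense layer: one filter per class.
    x = tf.keras.layers.Conv2D(num_classes, 1)(x)
    x = tf.keras.layers.GlobalMaxPooling2D()(x)   # reduce to (batch, num_classes)
    outputs = tf.keras.layers.Softmax()(x)
    return tf.keras.Model(inputs, outputs)
```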

Part 4: Model Training

With the data generator and the model in place, let's start training. The model I trained uses Stochastic Gradient Descent (SGD) as the optimizer and was trained for 5 epochs.

With only 5 epochs, the model reaches a validation accuracy of up to 98.34%! This indicates that the model can be trained on images of different sizes and can also make predictions on images of different sizes.
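A minimal training sketch with these settings, reusing the build_fcn model and the padded-batch generators from the earlier sketches (the learning rate and loss function are assumptions, not values taken from the repository), might look like this:

```python
# Compile with SGD and train for 5 epochs.
# train_generator / val_generator are assumed to be PaddedBatchGenerator instances.
model = build_fcn(num_classes=10)
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
    loss="sparse_categorical_crossentropy",   # assumes integer class labels
    metrics=["accuracy"],
)
model.fit(train_generator, validation_data=val_generator, epochs=5)
```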

Part 5: Model Prediction

After training, let's generate some test cases to verify that the model works as expected. 25 test images are shown in the figure below:

Testing The FCN Model

The predictions look good! For the first 25 images, the model predicts the correct digits!
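If you want to script this check yourself, a rough sketch (reusing the trained model from the training sketch, and assuming test_images is a list of arrays of possibly different sizes) could be:

```python
import numpy as np

# Predict the first 25 test images one at a time, since their sizes may differ.
for img in test_images[:25]:
    probs = model.predict(img[np.newaxis, ...], verbose=0)  # add a batch dimension
    print("Predicted digit:", int(np.argmax(probs)))
```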

Conclusion

In conclusion, the goal of building an FCN model that can take input images of varied sizes is achieved. This makes the model more practical for real-world scenarios. Nevertheless, when the dataset is more complicated (e.g. contains more colour channels or larger images), more layers might need to be added to the model to achieve better prediction accuracy.

Next Move

You have now mastered the basics of FCNs. As a next step, try to train the model on a different dataset, which you can obtain here, and have fun! If you enjoyed this tutorial, please feel free to give it a clap!

References

[1] Rawlani, H. (2020) Understanding and implementing a fully convolutional network (FCN). Towards Data Science.

[2] LeCun, Y. (n.d.) The MNIST Database of Handwritten Digits. MNIST.

[3] 智能算法. (2020) 全卷积神经网络(FCN)详解 [Fully Convolutional Networks (FCN) Explained]. 知乎 (Zhihu).
