Step-by-Step Shapes Image Classification using Convolutional Neural Network (CNN) and PyTorch

Muhammad Asif
Jul 29, 2019 · 6 min read

In this tutorial, I will explain the step-by-step process of classifying shape images using one of the most promising deep learning techniques, the Convolutional Neural Network (CNN). The tutorial comprises the following major steps:

  1. Shapes image dataset
  2. Preprocess dataset
  3. A brief introduction to CNNs
  4. Implementation of CNN in PyTorch

Shapes image dataset

I chose the Four Shapes dataset from Kaggle. This dataset has 16,000 images of four types of shapes, i.e., circle, square, triangle and star. Each image has a resolution of 200x200 pixels. First of all, download the dataset; you will probably need to log in to Kaggle. Once downloaded, extract the zip file. The extracted directory will have four subdirectories, each containing the images of the respective shape.

Shapes’ images in this dataset have been rotated at different angles so that any machine learning technique can learn the maximum possible variations of a shape.
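
To get a quick feel for the data, you can count the images per class after extraction. The snippet below assumes the archive was extracted to a folder named shapes/ with one subdirectory per class; adjust the path to your own location.

import os

# Path to the extracted Four Shapes dataset (an assumed location; adjust as needed)
path_source = 'shapes/'

for shape in ['circle', 'square', 'star', 'triangle']:
    count = len(os.listdir(os.path.join(path_source, shape)))
    print(shape, ':', count, 'images')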

Preprocess dataset

Before applying any machine learning technique to a dataset, preprocessing the data is essential to get optimal results. Examples of preprocessing steps are image enhancement, restoration, resizing, etc. Luckily, this four shapes dataset is already preprocessed, as all the images have the same size. However, one more step is needed here: training the CNN requires the dataset images to be split into two categories, i.e., training and validation. To meet this requirement, the dataset image directories should be arranged in the following pattern.
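
Assuming the split is created under a directory named images/ (the same path used later when loading the data), the target layout looks like this:

images/
├── train/
│   ├── circle/
│   ├── square/
│   ├── star/
│   └── triangle/
└── valid/
    ├── circle/
    ├── square/
    ├── star/
    └── triangle/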

The Python code below creates the required directory structure.

import os

path_target = 'images/'  # destination root for the split; must already exist

os.mkdir(os.path.join(path_target, 'train'))
os.mkdir(os.path.join(path_target, 'valid'))
for t in ['train', 'valid']:
    for folder in ['circle/', 'square/', 'star/', 'triangle/']:
        os.mkdir(os.path.join(path_target, t, folder))

As per standard practice, I chose to split the images in a ratio of 70:30. This means 70% of the total images will be used for training the CNN model and 30% for validation. I wrote a small routine in Python to do this task.

import os
from math import floor
from random import shuffle

def preprocessData(dirName, ext):
    allFiles = list()
    # Collect every file under dirName that has the given extension
    for root, dirs, files in os.walk(dirName):
        for file in files:
            if file.endswith(ext):
                allFiles.append(os.path.join(root, file))
    # Shuffle, then split 70% for training and 30% for validation
    shuffle(allFiles)
    split = 0.7
    split_index = floor(len(allFiles) * split)
    training = allFiles[:split_index]
    testing = allFiles[split_index:]
    return training, testing

The Python routine above walks the directory dirName, puts all the files with the given extension in a list, shuffles them and splits them in a ratio of 70:30.
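
The routine only returns two lists of file paths; the files still have to be copied into the train/ and valid/ folders created earlier. A minimal sketch of that step, assuming the images are PNG files extracted to shapes/ and that each path contains the shape name so the class can be recovered from it (copyFiles is a hypothetical helper, not part of the original code), could look like this:

import os
import shutil

def copyFiles(file_list, subset, path_target):
    # subset is either 'train' or 'valid'
    for src in file_list:
        # Recover the class name from the parent folder, e.g. '.../circle/0.png' -> 'circle'
        shape = os.path.basename(os.path.dirname(src))
        dst = os.path.join(path_target, subset, shape, os.path.basename(src))
        shutil.copy(src, dst)

training, testing = preprocessData('shapes/', '.png')
copyFiles(training, 'train', 'images/')
copyFiles(testing, 'valid', 'images/')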

A brief introduction to CNNs

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery.

Among artificial neural networks (ANNs), CNNs are widely used for image classification, object detection, face recognition, etc., and they have shown promising results on these tasks. In simple words, for image classification a CNN takes an image as input, processes it and classifies it under a certain category such as person, animal, car, etc. CNNs mainly have three types of layers, i.e., convolutional layers, pooling layers and fully connected layers.

Take our four shapes dataset as an example. First of all, we define a CNN model that consists of several layers of neurons, depending upon the complexity of the images. As our dataset has only four categories of shapes and the images are small, we need a simpler form of CNN model. Next, for a CNN model to successfully classify images into their respective categories, it requires training. In the training phase, we feed the model batches of images; the CNN model extracts unique features from the images and learns them. Once the model achieves sufficient accuracy, training is stopped and the model is saved for later use on test images.
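
To make these three layer types concrete, here is a tiny, purely illustrative snippet (not the model used later in this post) that pushes a dummy 64x64 RGB image through one convolutional layer, one pooling step and one fully connected layer, with the tensor shape noted at each stage:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 64, 64)            # a dummy batch with one 64x64 RGB image
conv = nn.Conv2d(3, 10, kernel_size=5)   # convolutional layer: 3 -> 10 channels
fc = nn.Linear(10 * 30 * 30, 4)          # fully connected layer: 4 shape classes

x = conv(x)                 # -> (1, 10, 60, 60)
x = F.max_pool2d(x, 2)      # pooling layer -> (1, 10, 30, 30)
x = x.view(x.size(0), -1)   # flatten -> (1, 9000)
x = fc(x)                   # -> (1, 4) class scores
print(x.shape)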

For a detailed understanding of CNNs, it is recommended to read the following article.

Implementation of CNN in PyTorch

There are many frameworks available for implementing CNNs. TensorFlow and PyTorch are widely used and considered the most popular. TensorFlow is backed by Google, whereas PyTorch is backed by Facebook. In this tutorial, I chose to implement my CNN model for classifying the four shapes images in PyTorch.

You need to set up a Python environment on your machine. It is recommended to follow this article to install and configure Python and PyTorch.

Alternatively, you can search online for how to prepare your machine for CNN implementation in PyTorch.

Import the necessary libraries in Python:

import numpy as np
import matplotlib.pyplot as plt
from torchvision import transforms
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from torchvision.datasets import ImageFolder
from torch.utils.data import Dataset,DataLoader

It is recommended to have a GPU in your machine, as it will drastically shorten the CNN training time. GPU and CUDA support can be checked as follows:

is_cuda = False
if torch.cuda.is_available():
    is_cuda = True
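
When is_cuda is True, tensors and model parameters can be moved onto the GPU with .cuda(). A tiny self-contained illustration (using a toy layer, not this tutorial's model):

import torch
import torch.nn as nn

is_cuda = torch.cuda.is_available()

layer = nn.Linear(8, 2)       # a toy layer, just for illustration
x = torch.randn(1, 8)
if is_cuda:
    layer = layer.cuda()      # move the layer's parameters to GPU memory
    x = x.cuda()              # move the input tensor to GPU memory
print(layer(x).device)        # cuda:0 when a GPU is available, otherwise cpu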

Next, resize and normalise the images. I resized the images to 64x64 to speed up the training process, as my machine lacks a GPU:

simple_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

The images, already split into training and validation sets, are loaded with ImageFolder and batched using PyTorch’s DataLoader:

train = ImageFolder('images/train/', simple_transform)
valid = ImageFolder('images/valid/', simple_transform)

train_data_loader = torch.utils.data.DataLoader(train, batch_size=32, num_workers=3, shuffle=True)
valid_data_loader = torch.utils.data.DataLoader(valid, batch_size=32, num_workers=3, shuffle=True)
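
Before training, it is worth sanity-checking the loaders by pulling one batch and displaying an image. The small helper below is not part of the original post; it simply undoes the normalisation before plotting, and uses train.classes, which ImageFolder populates from the folder names:

import numpy as np
import matplotlib.pyplot as plt

def imshow(img_tensor):
    # Undo the normalisation applied by simple_transform, then plot
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    img = img_tensor.numpy().transpose((1, 2, 0))  # CHW -> HWC
    img = np.clip(std * img + mean, 0, 1)
    plt.imshow(img)
    plt.show()

images, labels = next(iter(train_data_loader))
print(train.classes[labels[0].item()])  # class name of the first image in the batch
imshow(images[0])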

Next, I define my CNN model:

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Two convolutional layers
        self.conv1 = nn.Conv2d(3, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        # Two fully connected layers; 3380 = 20 channels x 13 x 13 feature map
        self.fc1 = nn.Linear(3380, 50)
        self.fc2 = nn.Linear(50, 4)

    def forward(self, x):
        # Convolution -> max pooling -> ReLU, twice
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        # Flatten the feature maps for the fully connected layers
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = F.relu(self.fc2(x))
        return F.log_softmax(x, dim=1)

It consists of two convolutional layers, two pooling layers and two fully connected layers. As the images in the four shapes dataset are relatively small, I kept the CNN model simple.
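
The value 3380 in fc1 comes from tracing the tensor shape for a 64x64 input: conv1 (kernel size 5) shrinks 64x64 to 60x60 and pooling halves it to 30x30; conv2 then shrinks it to 26x26 and pooling halves it to 13x13; with 20 output channels the flattened size is 20 × 13 × 13 = 3380. A quick dummy forward pass confirms this:

net = Net()
x = torch.randn(1, 3, 64, 64)                    # a dummy batch with one 64x64 RGB image
features = F.max_pool2d(net.conv1(x), 2)         # -> (1, 10, 30, 30)
features = F.max_pool2d(net.conv2(features), 2)  # -> (1, 20, 13, 13)
print(features.view(1, -1).shape)                # torch.Size([1, 3380])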

The following code starts the training and gives our CNN model the opportunity to learn the features of the images. The per-epoch work is done by a fit helper; a possible implementation is sketched after the loop.

model = Net()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
train_losses, train_accuracy = [], []
val_losses, val_accuracy = [], []
for epoch in range(1, 20):
    epoch_loss, epoch_accuracy = fit(epoch, model, train_data_loader, phase='training')
    val_epoch_loss, val_epoch_accuracy = fit(epoch, model, valid_data_loader, phase='validation')
    train_losses.append(epoch_loss)
    train_accuracy.append(epoch_accuracy)
    val_losses.append(val_epoch_loss)
    val_accuracy.append(val_epoch_accuracy)
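
The loop above calls a fit helper that this post does not show; it is defined in the linked repository. A minimal sketch of such a function, assuming it uses the global optimizer created above and prints one line per phase in the format of the log below (the repository version may differ), could look like this:

def fit(epoch, model, data_loader, phase='training'):
    # Switch between training mode (dropout active) and evaluation mode
    if phase == 'training':
        model.train()
    else:
        model.eval()
    running_loss = 0.0
    running_correct = 0
    for data, target in data_loader:
        # Move the batch to the GPU if the model lives there
        if next(model.parameters()).is_cuda:
            data, target = data.cuda(), target.cuda()
        if phase == 'training':
            optimizer.zero_grad()
        output = model(data)
        # Negative log-likelihood loss pairs with the model's log_softmax output
        loss = F.nll_loss(output, target)
        running_loss += F.nll_loss(output, target, reduction='sum').item()
        preds = output.data.max(dim=1, keepdim=True)[1]
        running_correct += preds.eq(target.data.view_as(preds)).cpu().sum().item()
        if phase == 'training':
            loss.backward()
            optimizer.step()
    loss = running_loss / len(data_loader.dataset)
    accuracy = 100. * running_correct / len(data_loader.dataset)
    print('Epoch: {} - {} loss is {:.2f} and {} accuracy is {:.2f}'.format(
        epoch, phase, loss, phase, accuracy))
    return loss, accuracy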

The training log will look like this:

Epoch: 1 - training loss is 0.38 and training accuracy is 84.00
Epoch: 1 - validation loss is 0.02 and validation accuracy is 99.00
Epoch: 2 - training loss is 0.05 and training accuracy is 98.00
Epoch: 2 - validation loss is 0.00 and validation accuracy is 99.00
Epoch: 3 - training loss is 0.03 and training accuracy is 98.00
Epoch: 3 - validation loss is 0.00 and validation accuracy is 99.00
Epoch: 4 - training loss is 0.02 and training accuracy is 99.00
Epoch: 4 - validation loss is 0.00 and validation accuracy is 100.00
Epoch: 5 - training loss is 0.02 and training accuracy is 99.00
Epoch: 5 - validation loss is 0.00 and validation accuracy is 100.00
Epoch: 6 - training loss is 0.01 and training accuracy is 99.00
Epoch: 6 - validation loss is 0.00 and validation accuracy is 99.00
Epoch: 7 - training loss is 0.01 and training accuracy is 99.00
Epoch: 7 - validation loss is 0.00 and validation accuracy is 100.00
Epoch: 8 - training loss is 0.01 and training accuracy is 99.00
Epoch: 8 - validation loss is 0.00 and validation accuracy is 99.00
Epoch: 9 - training loss is 0.01 and training accuracy is 99.00
Epoch: 9 - validation loss is 0.00 and validation accuracy is 100.00

Plotting these losses makes the trend easy to see: the x-axis represents the number of epochs and the y-axis represents the loss.
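
The plot can be recreated from the lists collected during training with a few lines of matplotlib:

plt.plot(range(1, len(train_losses) + 1), train_losses, 'bo-', label='training loss')
plt.plot(range(1, len(val_losses) + 1), val_losses, 'ro-', label='validation loss')
plt.xlabel('epochs')
plt.ylabel('loss')
plt.legend()
plt.show()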

The complete source code of this tutorial can be found in the GitHub repository.

Queries are welcome; you can also leave comments here.
