Introduction to CNN & Image Classification Using CNN in PyTorch

Design your first CNN architecture using the Fashion MNIST dataset.

Amey Band
The Startup
Sep 23, 2020


Source: Photo by Flixstock.com

Introduction

Image classification is the task of extracting features from images and using them to assign each image to one of a set of predefined classes. Recognizing what an image shows is effortless for humans, but it is definitely not for a machine. We first have to train a model on labeled images. Later, when we give the model a new image, it predicts the class based on what it learned during training.

In this article, we will see why convolutional neural networks (CNNs) are well suited to this task and how they can improve a model’s performance. We will also walk through an implementation of a CNN in PyTorch.

Table of Contents

  1. PyTorch Overview
  2. What is CNN?
  3. Implementation of CNNs in PyTorch.
  4. Conclusion

1. PyTorch Overview

Source: Photo via Venturebeat.com

PyTorch is an open-source machine learning library based on the Torch library. It is mainly used for applications such as computer vision and natural language processing. PyTorch is primarily developed and maintained by Facebook’s AI Research lab (FAIR), and it is free software released under the Modified BSD license. PyTorch is a Python package that provides two high-level features:

1. Tensor computation (like NumPy) with strong GPU acceleration

2. Deep neural networks built on a tape-based autograd system
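Both features fit in a few lines. The sketch below is a minimal illustration, not code from this article: it creates a tensor, places it on the GPU only if one is available, and lets autograd compute a gradient.

```python
import torch

# Tensor computation, much like NumPy, but able to run on a GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.tensor([2.0, 3.0], device=device, requires_grad=True)

# Autograd records the operations applied to x as they run ("tape-based")...
y = (x ** 2).sum()  # y = x0^2 + x1^2 = 13

# ...and replays them backwards to compute dy/dx = 2x
y.backward()
print(x.grad)  # gradient values 4 and 6
```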

Read here for more information about PyTorch.

2. What is CNN?

Source: Photo via FloydHub.com

A convolutional neural network (CNN) is a type of neural network most often applied to image processing problems. Its main application is identifying objects in images, but it can also be used for natural language processing. By the end of this article, you will see why CNNs are so effective in these fast-growing areas.

CNNs work differently from ordinary fully connected networks because they take the spatial structure of the data into account. In a fully connected network, the input layer has one neuron per pixel, and every neuron in a layer is connected to every neuron in the previous layer. In a CNN, each neuron is instead connected only to a small neighborhood of nearby neurons in the previous layer.
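To make the difference concrete, the sketch below compares the number of learnable parameters of a fully connected layer and a convolutional layer that both turn a single 28x28 image into 28x28 outputs. The layer sizes are illustrative choices, not part of this article’s model. Besides being connected only to a local 3x3 neighborhood, the convolution also reuses the same weights at every position, which is why it needs so few parameters.

```python
import torch
from torch import nn

# Fully connected: each of the 784 outputs is connected to all 784 input pixels
dense = nn.Linear(28 * 28, 28 * 28)

# Convolutional: each output sees only a 3x3 neighborhood,
# and the same 3x3 weights are shared across the whole image
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)

def count_params(layer):
    return sum(p.numel() for p in layer.parameters())

print(count_params(dense))  # 784 * 784 weights + 784 biases = 615440
print(count_params(conv))   # 3 * 3 weights + 1 bias = 10

# Both produce one value per pixel for a single 28x28 image
image = torch.randn(1, 1, 28, 28)
print(dense(image.flatten(1)).shape)  # torch.Size([1, 784])
print(conv(image).shape)              # torch.Size([1, 1, 28, 28])
```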

A CNN is typically built from three important types of layers:

1. Convolution Layer

2. Pooling Layer

3. Fully-connected Layer

Let’s look at the concepts behind each of these layers, starting with the two main ones.

(i) Convolution Layer

Simple Convolution of a (5x5) matrix with a (3x3) kernel

Convolution is the filtering operation that gives this type of neural network its name: a small kernel (filter) slides over the input and, at each position, computes a weighted sum of the values beneath it. Mathematically, convolution is an operation that combines two functions into a third one, defined by an integral, which expresses how the shape of one function is modified by the other.
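For reference, the continuous definition of convolution and the discrete 2D form that a convolution layer applies to an image I with a kernel K are shown below. Deep learning libraries, PyTorch included, actually compute the second form without flipping the kernel, which is technically cross-correlation, but they still call the operation convolution.

```latex
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau

S(i, j) = \sum_{m} \sum_{n} I(i + m,\ j + n)\, K(m, n)
```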

The convolution layer’s output shape is affected by:

  • Kernel size: the size of the filter.
  • Input dimensions: the size of the input image.
  • Padding: layers of zeros added around the border of the image so that the kernel passes properly over the edges.
  • Stride: the step with which the kernel moves across the input image. A stride of 2 moves the kernel in 2-pixel increments.

Read more about the convolution parameters here.

Consider the image above: the input is (5x5) and the filter is (3x3). After passing through the convolution layer (stride 1, no padding), the output has dimensions (3x3).

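The output size along each dimension is

$$n_{out} = \left\lfloor \frac{n_{in} - f}{s} \right\rfloor + 1$$

where n_in denotes the dimension of the input image, f denotes the window (kernel) size, and s denotes the stride. With p layers of zero padding on each side, n_in is replaced by n_in + 2p. For the example above, (5 - 3)/1 + 1 = 3.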
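The output-size formula, extended with zero padding p on each side, is easy to check in code. The helper name conv_output_size is hypothetical. The second part applies it to the network built later in this article (28x28 input, kernel size 2, stride 1, padding 1, each convolution followed by 2x2 max pooling with stride 2), which explains the 4 * 7 * 7 input size of its final linear layer.

```python
def conv_output_size(n_in, f, s=1, p=0):
    """Output size along one dimension for kernel size f, stride s and padding p."""
    return (n_in + 2 * p - f) // s + 1

# Example from the figure: 5x5 input, 3x3 kernel, stride 1, no padding
print(conv_output_size(5, 3))  # 3

# A 28x28 Fashion MNIST image passing through two conv + pool blocks
n = 28
for _ in range(2):
    n = conv_output_size(n, f=2, s=1, p=1)  # Conv2d(kernel_size=2, stride=1, padding=1)
    n = conv_output_size(n, f=2, s=2)       # MaxPool2d(kernel_size=2, stride=2)
print(n)  # 7  (28 -> 29 -> 14 -> 15 -> 7)
```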

In PyTorch, the convolution layer is torch.nn.Conv2d.
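The sliding-window computation for a 5x5 input and a 3x3 kernel can be written out directly. The NumPy sketch below uses a made-up binary input and kernel (not necessarily the values in the figure) and, like PyTorch, does not flip the kernel; for this symmetric kernel, flipping would not change the result anyway. The last lines check the result against PyTorch’s functional torch.nn.functional.conv2d.

```python
import numpy as np
import torch
import torch.nn.functional as F

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Made-up 5x5 binary input and 3x3 kernel
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

manual = conv2d_valid(image, kernel)
print(manual)  # 3x3 output: rows [4, 3, 4], [2, 4, 3], [2, 3, 4]

# F.conv2d expects (batch, channels, height, width) float tensors
torch_out = F.conv2d(torch.tensor(image, dtype=torch.float32).view(1, 1, 5, 5),
                     torch.tensor(kernel, dtype=torch.float32).view(1, 1, 3, 3))
print(torch_out.view(3, 3))  # same values as `manual`
```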

(ii) Pooling Layer

Source: Photo via bouvet.no

The pooling layer in a CNN progressively reduces the spatial size of the representation, which lowers the number of parameters and the amount of computation in the network.

As the image above shows, there are three common types of pooling:

  • Max Pooling: takes the maximum value in each patch of the feature map.
  • Average Pooling: takes the average of the values in each patch of the feature map.
  • Sum Pooling: takes the sum of the values in each patch of the feature map.

Max pooling lets the network concentrate on the most strongly activated neuron in each patch rather than on all of them. This has a regularizing effect on the network and makes it less likely to overfit the training data.
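The three pooling variants can be sketched in NumPy for non-overlapping 2x2 windows with stride 2, the setting used by MaxPool2d(kernel_size=2, stride=2) later in this article. The function name pool2d and the sample feature map are illustrative; in PyTorch itself you would use nn.MaxPool2d or nn.AvgPool2d.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window and stride equal to size."""
    h, w = feature_map.shape
    # Drop trailing rows/columns that do not fill a whole window
    fm = feature_map[: h - h % size, : w - w % size]
    # View as (window rows, size, window cols, size) so each window is one block
    blocks = fm.reshape(fm.shape[0] // size, size, fm.shape[1] // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    if mode == "sum":
        return blocks.sum(axis=(1, 3))
    raise ValueError(f"unknown mode: {mode}")

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 8, 1],
                        [0, 4, 3, 5]])

print(pool2d(feature_map, mode="max"))  # maxima of the four 2x2 patches: 6, 4, 7, 8
print(pool2d(feature_map, mode="avg"))  # averages: 3.75, 2.25, 3.25, 4.25
print(pool2d(feature_map, mode="sum"))  # sums: 15, 9, 13, 17
```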

In PyTorch, the max-pooling layer is torch.nn.MaxPool2d.

For detailed information, go through the articles mentioned in the reference section below.

(iii) Fully-connected Layer


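Fully connected layers are an essential component of convolutional neural networks (CNNs), which have proven very successful at recognizing and classifying images in computer vision. The CNN process begins with convolution and pooling, which break the image down into features and analyze them independently. The resulting feature maps are then flattened into a single vector and passed to fully connected layers, which combine these features to produce the final class scores.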

I recommend going through the article to get a better understanding of fully connected layers.
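In PyTorch, the hand-off from the convolutional part to the fully connected part is just a reshape. The sketch below assumes a batch of feature maps with the shape the network later in this article produces (4 channels of 7x7) and uses random values in place of real features.

```python
import torch
from torch import nn

# Stand-in for the output of the conv/pool layers: batch of 8, 4 channels, 7x7 each
features = torch.randn(8, 4, 7, 7)

# Flatten each example's feature maps into one vector of 4 * 7 * 7 = 196 values
flat = features.view(features.size(0), -1)
print(flat.shape)  # torch.Size([8, 196])

# A fully connected layer maps those 196 features to one score per class
fc = nn.Linear(4 * 7 * 7, 10)
scores = fc(flat)
print(scores.shape)  # torch.Size([8, 10])
```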

3. Implementation of CNNs in PyTorch.

We will work with the Fashion MNIST dataset. Our task is to identify the type of apparel shown in each image of the dataset.
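Fashion MNIST consists of 28x28 grayscale images from 10 clothing categories, and each label is an integer from 0 to 9. A small lookup table like the one below (CLASS_NAMES and label_to_name are illustrative names) makes labels and predictions easier to read.

```python
# Fashion MNIST label index -> clothing category
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_to_name(label):
    return CLASS_NAMES[int(label)]

print(label_to_name(9))  # Ankle boot
```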

# download the dataset from keras.datasets
from keras.datasets import fashion_mnist
((trainX, trainY), (testX, testY)) = fashion_mnist.load_data()
print(trainX.shape)  # (60000, 28, 28)
print(trainY.shape)  # (60000,)
print(testX.shape)   # (10000, 28, 28)
print(testY.shape)   # (10000,)

There are 60,000 images in the train set and 10,000 images in the test set.

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
# Jupyter notebook magic to show plots inline (remove when running as a plain script)
%matplotlib inline
# for creating the validation set
from sklearn.model_selection import train_test_split
# for evaluating the model
from sklearn.metrics import accuracy_score
# PyTorch libraries and modules
import torch
from torch.nn import Linear, ReLU, CrossEntropyLoss, Sequential, Conv2d, MaxPool2d, Module, BatchNorm2d
from torch.optim import Adam

We have imported the libraries needed for data visualization and for building the predictive model. Let’s visualize some of the images in the training dataset:

plt.figure(figsize=(10,10))
plt.subplot(221)
plt.imshow(trainX[10])
plt.subplot(222)
plt.imshow(trainX[20])
plt.show()

For preprocessing, we split off 10% of the training images as a validation set and convert the images and labels into PyTorch tensors.

# split off 10% of the training data as a validation set (54,000 train / 6,000 validation)
train_x, val_x, train_y, val_y = train_test_split(trainX, trainY, test_size = 0.1)
# converting training images into torch format: (N, channels, height, width), floats in [0, 1]
train_x = train_x.reshape(-1, 1, 28, 28).astype(np.float32) / 255.0
train_x = torch.from_numpy(train_x)
# converting the target into torch format (CrossEntropyLoss expects int64 class indices)
train_y = train_y.astype(np.int64)
train_y = torch.from_numpy(train_y)
# converting validation images into torch format
val_x = val_x.reshape(-1, 1, 28, 28).astype(np.float32) / 255.0
val_x = torch.from_numpy(val_x)
# converting the target into torch format
val_y = val_y.astype(np.int64)
val_y = torch.from_numpy(val_y)

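We create a simple CNN architecture with just two convolutional layers (kernel size 2, stride 1, padding 1) that learn filters to extract features from the images. Each convolution is followed by batch normalization, a ReLU activation, and 2x2 max pooling, and a final linear layer maps the 4 x 7 x 7 feature maps to the 10 classes.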

## Architecture
class Net(Module):
    def __init__(self):
        super(Net, self).__init__()
        self.cnn_layers = Sequential(
            # Defining a 2D convolution layer
            Conv2d(1, 4, kernel_size=2, stride=1, padding=1),
            BatchNorm2d(4),
            ReLU(inplace=True),
            MaxPool2d(kernel_size=2, stride=2),
            # Defining another 2D convolution layer
            Conv2d(4, 4, kernel_size=2, stride=1, padding=1),
            BatchNorm2d(4),
            ReLU(inplace=True),
            MaxPool2d(kernel_size=2, stride=2)
        )
        self.linear_layers = Sequential(
            Linear(4 * 7 * 7, 10)
        )

    # Defining the forward pass
    def forward(self, x):
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x
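A quick way to check the architecture is to push a dummy batch through the same layer stack and print the shape after each layer. The sketch below rebuilds the convolutional part as a standalone Sequential so that it runs on its own; the random input is only a placeholder for real images.

```python
import torch
from torch.nn import Sequential, Conv2d, BatchNorm2d, ReLU, MaxPool2d

# Same convolutional stack as Net.cnn_layers
cnn_layers = Sequential(
    Conv2d(1, 4, kernel_size=2, stride=1, padding=1),
    BatchNorm2d(4),
    ReLU(inplace=True),
    MaxPool2d(kernel_size=2, stride=2),
    Conv2d(4, 4, kernel_size=2, stride=1, padding=1),
    BatchNorm2d(4),
    ReLU(inplace=True),
    MaxPool2d(kernel_size=2, stride=2),
)

x = torch.randn(2, 1, 28, 28)  # dummy batch of two 28x28 grayscale images
for layer in cnn_layers:
    x = layer(x)
    print(f"{layer.__class__.__name__:<12} {tuple(x.shape)}")
# The last line shows (2, 4, 7, 7): 4 * 7 * 7 = 196 features per image
```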

Next, let’s define the model, the optimizer, and the loss function:

# define the model
model = Net()
# define the optimizer (0.07 is a high learning rate for Adam, whose default is 0.001; lower it if training is unstable)
optimizer = Adam(model.parameters(), lr=0.07)
# defining the loss function
criterion = CrossEntropyLoss()
# checking if GPU is available
if torch.cuda.is_available():
    model = model.cuda()
    criterion = criterion.cuda()
print(model)
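The network returns raw scores (logits) without a softmax layer. That is what CrossEntropyLoss expects: it applies log-softmax internally and takes the targets as integer class indices of dtype int64. The sketch below, with made-up logits for two examples and three classes, checks this equivalence by hand.

```python
import torch
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # 2 examples, 3 classes
targets = torch.tensor([0, 2])             # class indices, dtype torch.int64

loss = CrossEntropyLoss()(logits, targets)

# Same value by hand: mean of -log(softmax probability of the true class)
manual = -F.log_softmax(logits, dim=1)[torch.arange(2), targets].mean()
print(loss.item(), manual.item())  # the two values match
```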

I hope you now understand the architecture of the CNN we defined above. Next, we train the model for 25 epochs and track the training and validation losses.
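One way to write such a training loop is sketched below. It is an illustrative assumption, not the author’s exact code: the function name train_model and the batch size are hypothetical choices. It trains on shuffled mini-batches, records the average training loss and the validation loss for each epoch in the train_losses and val_losses lists, and switches between model.train() and model.eval() so that the BatchNorm layers behave correctly.

```python
import torch

def train_model(model, optimizer, criterion, train_x, train_y, val_x, val_y,
                epochs=25, batch_size=256):
    """Hypothetical training loop; returns per-epoch training and validation losses."""
    device = next(model.parameters()).device
    train_losses, val_losses = [], []
    for epoch in range(epochs):
        model.train()  # BatchNorm uses batch statistics while training
        permutation = torch.randperm(train_x.size(0))
        running_loss = 0.0
        for start in range(0, train_x.size(0), batch_size):
            idx = permutation[start:start + batch_size]
            inputs = train_x[idx].float().to(device)
            targets = train_y[idx].long().to(device)

            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * idx.size(0)
        train_losses.append(running_loss / train_x.size(0))

        model.eval()  # BatchNorm uses its running statistics for evaluation
        with torch.no_grad():
            val_out = model(val_x.float().to(device))
            val_losses.append(criterion(val_out, val_y.long().to(device)).item())
        print(f"Epoch {epoch + 1}: train loss {train_losses[-1]:.4f}, "
              f"val loss {val_losses[-1]:.4f}")
    return train_losses, val_losses
```

With the objects defined above, it would be called as `train_losses, val_losses = train_model(model, optimizer, criterion, train_x, train_y, val_x, val_y, epochs=25)`.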

In the first epochs the loss is high, but it decreases as training progresses. Let’s visualize the training loss and validation loss.

# plotting the training and validation loss
plt.plot(train_losses, label='Training loss')
plt.plot(val_losses, label='Validation loss')
plt.legend()
plt.show()

Now let’s check the accuracy on the training set:

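# prediction for training set
model.eval()  # BatchNorm must use its running statistics at inference time
device = next(model.parameters()).device  # CPU or GPU, wherever the model lives
with torch.no_grad():
    output = model(train_x.float().to(device))

# softmax turns the raw scores into class probabilities; argmax picks the most likely class
prob = torch.softmax(output, dim=1).cpu()
predictions = prob.argmax(dim=1).numpy()

# accuracy on training set
accuracy_score(train_y, predictions)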

Next, let’s check the accuracy on the validation set:

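# prediction for validation set
model.eval()  # BatchNorm must use its running statistics at inference time
device = next(model.parameters()).device  # CPU or GPU, wherever the model lives
with torch.no_grad():
    output = model(val_x.float().to(device))

# softmax turns the raw scores into class probabilities; argmax picks the most likely class
prob = torch.softmax(output, dim=1).cpu()
predictions = prob.argmax(dim=1).numpy()

# accuracy on validation set
accuracy_score(val_y, predictions)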
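Running all 54,000 training images through the network in a single call can take a lot of memory, especially on a GPU. A possible alternative, sketched here as a hypothetical helper (predict_in_batches is not part of the original code), predicts in chunks and concatenates the results. It takes the argmax of the raw scores directly, which gives the same class as the argmax of the softmax probabilities.

```python
import torch

def predict_in_batches(model, x, batch_size=1024):
    """Return the predicted class index for every example in x."""
    device = next(model.parameters()).device
    model.eval()  # BatchNorm switches to its running statistics
    predictions = []
    with torch.no_grad():
        for start in range(0, x.size(0), batch_size):
            batch = x[start:start + batch_size].float().to(device)
            predictions.append(model(batch).argmax(dim=1).cpu())
    return torch.cat(predictions).numpy()
```

For example, `accuracy_score(val_y, predict_in_batches(model, val_x))` computes the validation accuracy.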

The training and validation accuracies are roughly the same with this architecture, which suggests that the model is not overfitting the training data.

4. Conclusion

I hope this article helped you understand the convolution and pooling layers, why CNNs are useful for image classification and object detection tasks, and how to implement a CNN architecture in PyTorch.

References

The code is available on GitHub.
