
Exploring Multi-Class Classification using Deep Learning

Srija Neogi
8 min read · Jun 28, 2020

The idea behind this guide is to simplify the journey of Machine Learning enthusiasts across the world. Through it, I hope to help you work on Deep Learning problems and learn from experience. I provide a basic introduction to some PyTorch libraries, along with Python code that uses them. These should be sufficient to get your hands dirty.

Overview

What is classification?
Multi-Class Classification
Feed Forward Neural Network
Convolutional Neural Network

Classification is a subcategory of supervised learning where the goal is to predict the categorical class labels (discrete, unordered values, group membership) of new instances based on past observations.

There are perhaps four main types of classification tasks that you may encounter:

  • Binary Classification
  • Multi-Class Classification
  • Multi-Label Classification
  • Imbalanced Classification

Multi-class classification refers to those classification tasks that have more than two class labels. Consider an example: for any movie, the Central Board of Film Certification issues a certificate depending on the content of the movie. A movie may be rated ‘U/A’ (meaning ‘Parental Guidance for children below the age of 12 years’). There are other certificate classes, such as ‘A’ (Restricted to adults) and ‘U’ (Unrestricted Public Exhibition), but each movie is assigned exactly one of these three certificates.

In short, there are multiple categories but each instance is assigned only one; such problems are therefore known as multi-class classification problems.

Examples include:

  • Face classification.
  • Fruits and vegetables recognition.
  • Optical character recognition.
  • Handwritten digit recognition.

Examples are classified as belonging to one among a range of known classes. In this blog we will explore this problem with the help of two Deep Learning models, to understand how image classification is implemented.

I have used the Fruits 360 dataset from Kaggle, a dataset of images of fruits and vegetables (version 2020.05.18.0). The dataset can be accessed from https://www.kaggle.com/moltean/fruits/

Dataset properties:

  • Total number of images: 90483 (one fruit or vegetable per image).
  • Training set size: 67692 images. I divided this set into two parts: a training dataset (61692 images) and a validation dataset (6000 images).
  • Test set size: 22688 images (one fruit or vegetable per image).
  • Multi-fruits set size: 103 images (more than one fruit (or fruit class) per image)
  • Number of classes: 131 (fruits and vegetables).
  • Image size: 100x100 pixels
  • Filename format: image_index_100.jpg (e.g. 32_100.jpg)

The first layer of the neural network takes raw data as an input, processes it, extracts some information and passes it to the next layer as an output. Each layer then processes the information given by the previous one and repeats, until data reaches the final layer, which makes a prediction.
This prediction is compared with the known result. Backpropagation then computes how much each weight contributed to the error, and the weights are adjusted step by step so that the model learns values that yield accurate outputs.
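To make the loop of "predict, compare, adjust" concrete, here is a minimal sketch on made-up data. The tensor sizes, the single linear layer and the learning rate are my own assumptions for illustration, not the model used later in this post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy data (assumed): 8 samples with 4 features each, 3 possible classes.
inputs = torch.randn(8, 4)
targets = torch.randint(0, 3, (8,))

model = nn.Linear(4, 3)  # one layer is enough to show the idea
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(20):
    logits = model(inputs)                   # forward pass: data -> prediction
    loss = F.cross_entropy(logits, targets)  # compare prediction with known labels
    optimizer.zero_grad()                    # clear gradients from the previous step
    loss.backward()                          # backpropagation: compute gradients
    optimizer.step()                         # gradient descent: adjust the weights
    losses.append(loss.item())

print(f"loss went from {losses[0]:.3f} to {losses[-1]:.3f}")
```

The loss shrinks from step to step because each update nudges the weights in the direction that reduces the error on these samples.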

I’ll discuss two models for classifying images of fruits and vegetables into their respective classes using PyTorch. At the end of this post I have linked my previous blogs, which provide an introduction to PyTorch.

  1. A Feed Forward Neural Network consists of organised layers (input, hidden and output). Every unit in a layer is connected to all the units in the previous layer. These connections are not all equal: each connection may have a different strength or weight. The weights on these connections encode the knowledge of the network.
  2. A Convolutional Neural Network or CNN is an architecture designed to efficiently process, correlate and understand the large amount of data in high-resolution images. It builds on the Feed Forward Neural Network by adding concepts like filters, padding and stride.

Step 1 : Import libraries

We begin by importing the required modules and libraries. We’ll require the following libraries:

Pandas : A software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables.

NumPy : It provides fast and efficient operations on arrays of homogeneous data. NumPy extends Python into a high-level language for manipulating numerical data, similar to MATLAB.

PyTorch provides the elegantly designed modules and classes torch.nn, torch.optim, Dataset and DataLoader to help create and train neural networks.

To load the data we import ImageFolder from torchvision.datasets and ToTensor from torchvision.transforms. ImageFolder expects the data to be organised with one sub-folder per class under a root directory, here '../input/fruits/fruits-360/Training'. The images are .jpg files, so we use ToTensor() to convert them to torch tensors.

torch.nn.functional contains some useful functions, like activation functions and convolution operations. However, these are plain functions and not full layers, so if you want to define a layer that holds its own weights you should use the layer classes in torch.nn, which are built on torch.nn.Module.

Let me print the 131 classes of fruits and vegetables from the training data to get an overview of the dataset we are using to train and validate the models.

Now convert the images (in jpg) to tensors of pixel intensities in 3 channels.

Let’s print the length of the training dataset.

Similarly, convert the test dataset.

We can view a couple of images from the dataset using matplotlib. Let's create a helper function to display an image and its label.

Step 2 : Put Training, Validation and Testing Datasets into a DataLoader

While building real world machine learning models, it is quite common to split the dataset into 3 parts:

  1. Training set — used to train the model i.e. compute the loss and adjust the weights of the model using gradient descent.
  2. Validation set — used to evaluate the model while training, adjust hyperparameters (learning rate etc.) and pick the best version of the model.
  3. Test set — used to compare different models, or different types of modelling approaches, and report the final accuracy of the model.

Since there’s no predefined validation set, we can set aside a small portion (6000 images) of the training set to be used as the validation set. We’ll use the random_split helper method from PyTorch to do this. To ensure that we always create the same validation set, we'll also set a seed for the random number generator.
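A minimal sketch of such a split is below. It uses a small fake dataset in place of the real images, and the seed value 42 is an arbitrary choice of mine; any fixed number makes the split repeatable.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for the training set: 100 fake 3x8x8 "images" with labels.
full_dataset = TensorDataset(torch.zeros(100, 3, 8, 8), torch.arange(100) % 131)

val_size = 10  # the post holds out 6000 of the 67692 training images
train_size = len(full_dataset) - val_size

torch.manual_seed(42)  # fixed seed, so the same images land in the validation set
train_ds, val_ds = random_split(full_dataset, [train_size, val_size])

# Re-seeding and splitting again reproduces exactly the same validation set.
torch.manual_seed(42)
train_again, val_again = random_split(full_dataset, [train_size, val_size])

print(len(train_ds), len(val_ds))
print(val_ds.indices == val_again.indices)
```

Without the seed, each run would pick a different validation set, which makes results from different runs harder to compare.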

We can now create data loaders for training and validation, to load the data in batches of size 128.
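As a rough sketch, the loaders could be created like this. The datasets here are random tensors standing in for the real images, and using a doubled batch size for validation is my own assumption (no gradients are stored during validation, so larger batches usually fit).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Fake stand-ins for the training and validation datasets.
train_ds = TensorDataset(torch.rand(300, 3, 10, 10), torch.randint(0, 131, (300,)))
val_ds = TensorDataset(torch.rand(100, 3, 10, 10), torch.randint(0, 131, (100,)))

batch_size = 128

# shuffle=True gives a new random order of batches in every epoch.
train_loader = DataLoader(train_ds, batch_size, shuffle=True)
val_loader = DataLoader(val_ds, batch_size * 2)

images, labels = next(iter(train_loader))
print(images.shape, labels.shape)
print(len(train_loader), len(val_loader))  # number of batches per epoch
```

With 300 training items and a batch size of 128, the loader yields two full batches and one smaller final batch.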

We can look at batches of images from the dataset using the make_grid method from torchvision. Each time the code is run, we get a different batch, since the sampler shuffles the indices before creating batches.

Step 3 : Defining Base class and utility functions

Let’s define a base class for image classification to calculate accuracy and loss.
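One possible shape for such a base class is sketched below. The method names (`training_step`, `validation_step`, `validation_epoch_end`) are assumptions of mine, following a common pattern; the class in the original notebook may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def accuracy(outputs, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    """Shared loss/accuracy logic; subclasses only define the layers and forward()."""

    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # generate predictions
        return F.cross_entropy(out, labels)  # loss used for the weight update

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        loss = F.cross_entropy(out, labels)
        return {"val_loss": loss.detach(), "val_acc": accuracy(out, labels)}

    def validation_epoch_end(self, outputs):
        # Average the per-batch results into one number per epoch.
        losses = torch.stack([x["val_loss"] for x in outputs])
        accs = torch.stack([x["val_acc"] for x in outputs])
        return {"val_loss": losses.mean().item(), "val_acc": accs.mean().item()}

# Quick check of accuracy(): 3 of these 4 predictions match the labels.
scores = torch.tensor([[2., 1., 0.], [0., 3., 1.], [1., 0., 5.], [4., 0., 0.]])
labels = torch.tensor([0, 1, 2, 1])
acc = accuracy(scores, labels)
print(acc)
```

Keeping this logic in a base class means both the feed-forward model and the CNN can inherit it instead of repeating it.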

Let’s define a utility function fit, which trains the model for a given number of epochs, and an evaluate function to calculate the loss and accuracy after each epoch.
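Here is a self-contained sketch of what these two functions could look like. The signatures and the toy data are my assumptions; to keep it short, this version computes the loss directly instead of going through a base class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

@torch.no_grad()  # no gradients are needed while evaluating
def evaluate(model, loader):
    model.eval()
    losses, correct, total = [], 0, 0
    for images, labels in loader:
        out = model(images)
        losses.append(F.cross_entropy(out, labels).item())
        correct += (out.argmax(dim=1) == labels).sum().item()
        total += len(labels)
    return {"val_loss": sum(losses) / len(losses), "val_acc": correct / total}

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr=lr)
    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            loss = F.cross_entropy(model(images), labels)
            loss.backward()        # compute gradients
            optimizer.step()       # update the weights
            optimizer.zero_grad()  # reset gradients for the next batch
        history.append(evaluate(model, val_loader))  # one record per epoch
    return history

# Toy usage: two classes that can be separated by the first feature.
torch.manual_seed(0)
x = torch.randn(64, 4)
y = (x[:, 0] > 0).long()
loader = DataLoader(TensorDataset(x, y), batch_size=16)
model = nn.Linear(4, 2)

initial = evaluate(model, loader)
history = fit(5, 0.5, model, loader, loader)
print(initial["val_loss"], history[-1]["val_loss"])
```

The returned history is a list with one dictionary per epoch, which is what the plotting functions later in the post would consume.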

Here we are using CUDA. It enables developers to speed up compute-intensive applications by harnessing the power of GPUs for the parallelizable part of the computation. I hosted this project in a Kaggle notebook, which provides a free GPU quota of 30 hours per week.

We can now wrap our training, validation and testing data loaders in a DeviceDataLoader class, which automatically transfers batches of data to the GPU, and define a to_device() function to move our model to the GPU.

Now we need two functions to plot the losses and accuracy against the number of epochs.
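A sketch of these helpers is shown below. The function `get_default_device` and the exact class body are assumptions of mine; the code falls back to the CPU when no GPU is available, so it also runs outside Kaggle.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def get_default_device():
    """Pick the GPU if one is available, otherwise the CPU."""
    return torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

def to_device(data, device):
    """Move a tensor, a model, or a list/tuple of tensors to the chosen device."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader:
    """Wrap a DataLoader so every batch is moved to the device when it is requested."""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for batch in self.dl:
            yield to_device(batch, self.device)

    def __len__(self):
        return len(self.dl)

device = get_default_device()
plain_loader = DataLoader(
    TensorDataset(torch.rand(10, 3), torch.zeros(10, dtype=torch.long)), batch_size=4)
loader = DeviceDataLoader(plain_loader, device)
model = to_device(torch.nn.Linear(3, 2), device)

xb, yb = next(iter(loader))
print(device, xb.device, len(loader))
```

Batches are moved one at a time as they are needed, so the whole dataset never has to fit into GPU memory at once.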

Step 5.1: Define the Feed Forward Neural Network Model

Now we are ready to create our first model. This is a Feed Forward Neural Network model. For this, we are using 3 hidden layers of 512, 1024 and 256 nodes respectively. A hidden layer is located between the input and output layers of the model; it applies weights to its inputs and passes the result through an activation function to produce its output. In short, the hidden layers perform nonlinear transformations of the inputs entered into the network.

Here we need to understand two important classes of the torch.nn library. nn.Linear specifies the interaction between two layers. We give it 2 numbers, specifying the number of nodes in each of the two layers. It applies a linear transformation to the incoming data: $y = xA^T + b$, where $A$ is the weight matrix of the layer and $b$ is its bias.

nn.ReLU is an activation function for hidden layers. Activation functions help the model learn complex relationships between the input and the output. We use ReLU on all layers except the output layer.
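To see both classes in action, here is a small check with made-up sizes (4 inputs, 3 outputs). It compares the output of nn.Linear with the linear transformation written out by hand, and applies ReLU to a few sample values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(in_features=4, out_features=3)
x = torch.randn(2, 4)  # a batch of 2 samples with 4 features each

y = layer(x)
# The same computation by hand: the weight matrix has shape (3, 4),
# so it is transposed before multiplying.
manual = x @ layer.weight.T + layer.bias

relu = nn.ReLU()
activated = relu(torch.tensor([-1.0, 0.0, 2.0]))  # negatives become 0

print(y.shape)
print(torch.allclose(y, manual))
print(activated)
```

ReLU leaves positive values unchanged and replaces negative ones with zero, which is what makes the stacked layers nonlinear.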

In the __init__ function, you initialise the layers you want to use. Unlike Keras, PyTorch is more low-level: you have to specify the sizes of your layers so that everything matches.

In forward(), you specify how your layers are connected. You use the layers you already initialised in __init__, so the same layers (and therefore the same weights) are re-used on every forward pass.
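Putting this together, a feed-forward model with hidden layers of 512, 1024 and 256 nodes could be sketched as follows. The class name and attribute names are my own; the input size follows from 3 channels of 100x100 pixels and the output size from the 131 classes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FruitsFeedForward(nn.Module):
    def __init__(self, in_size=3 * 100 * 100, num_classes=131):
        super().__init__()
        # Layers are created once here; each output size must match the next input size.
        self.linear1 = nn.Linear(in_size, 512)
        self.linear2 = nn.Linear(512, 1024)
        self.linear3 = nn.Linear(1024, 256)
        self.output = nn.Linear(256, num_classes)

    def forward(self, xb):
        out = xb.view(xb.size(0), -1)    # flatten each image into one long vector
        out = F.relu(self.linear1(out))  # ReLU after every hidden layer
        out = F.relu(self.linear2(out))
        out = F.relu(self.linear3(out))
        return self.output(out)          # raw scores, no activation on the output

model = FruitsFeedForward()
scores = model(torch.rand(4, 3, 100, 100))  # a fake batch of 4 images
print(scores.shape)
```

The output has one score per class for every image in the batch; the class with the highest score is the prediction.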

Before we begin training, let’s instantiate the model once again.

Step 5.2: Train first Model

We’ll train the model for 5 epochs with a learning rate of 0.05.

So we achieved a fairly low loss on the training dataset. Let’s plot the loss vs. number of epochs.

Plotting accuracy vs. number of epochs.

Let’s find the accuracy of the model on the test dataset.

Step 5.3 : Predict using First Model

As you can see we achieved around 86% accuracy. So we can use this to predict a couple of images. For this purpose a predict_image function has been defined.
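A predict_image function could look roughly like the sketch below. The signature is an assumption, and to keep the example runnable without the trained network it uses a tiny stand-in model whose weights are set by hand so that it always favours the second class.

```python
import torch
import torch.nn as nn

def predict_image(img, model, classes):
    """Return the class name the model assigns to a single image tensor."""
    device = next(model.parameters()).device
    model.eval()
    with torch.no_grad():
        xb = img.unsqueeze(0).to(device)  # add a batch dimension: (C,H,W) -> (1,C,H,W)
        out = model(xb)
    _, pred = torch.max(out, dim=1)       # index of the highest score
    return classes[pred[0].item()]

# Stand-in model and class list (both invented for this example).
classes = ["Apple", "Banana", "Cherry"]
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 4 * 4, len(classes)))
with torch.no_grad():
    model[1].weight.zero_()
    model[1].bias.copy_(torch.tensor([0.0, 5.0, 0.0]))  # class 1 always scores highest

prediction = predict_image(torch.rand(3, 4, 4), model, classes)
print(prediction)
```

With a real dataset, `classes` would be the list of class names from the training ImageFolder, so the returned string is the name of a fruit or vegetable.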

Step 6.1 : Define Second model with Convolution Neural Network

Now it’s time to improve the model with a CNN, using the nn.Conv2d class from PyTorch. The 2D convolution is a fairly simple operation at heart: you start with a kernel, which is simply a small matrix of weights. This kernel “slides” over the 2D input data, performing an element-wise multiplication with the part of the input it is currently on, and then summing up the results into a single output pixel. (Source)
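The sliding-kernel idea can be written out in a few lines. This sketch uses a made-up 4x4 input and a 2x2 kernel, and compares a hand-written loop with PyTorch's built-in `F.conv2d`.

```python
import torch
import torch.nn.functional as F

image = torch.arange(16, dtype=torch.float32).reshape(4, 4)  # values 0..15
kernel = torch.tensor([[1.0, 0.0],
                       [0.0, -1.0]])

def conv2d_manual(image, kernel):
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]  # part of the input under the kernel
            out[i, j] = (patch * kernel).sum() # multiply element-wise, then sum
    return out

manual = conv2d_manual(image, kernel)

# F.conv2d expects (batch, channels, height, width), so add two leading dimensions.
builtin = F.conv2d(image[None, None], kernel[None, None])[0, 0]

print(manual)
print(torch.allclose(manual, builtin))
```

Every output pixel here equals the top-left value of the patch minus its bottom-right value, which is -5 everywhere for this particular input.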

There are certain advantages offered by convolutional layers when working with image data.

CNNs take advantage of the local spatial coherence of images. This means that they are able to dramatically reduce the number of operations needed to process an image by using convolution on patches of adjacent pixels.

There are also pooling layers, which downscale the image. This is possible because the features retained throughout the network are organised spatially like an image, so downscaling them makes sense in the same way as reducing the size of an image.

The kernel is a filter that is used to extract the features from the images. It’s a matrix that moves over the input data, performs the dot product with the sub-region of input data, and gets the output as the matrix of dot products. Padding is a term that refers to the amount of pixels added to an image when it is being processed by the kernel of a CNN.
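To see how kernel size, padding, stride and pooling change the size of the output, here is a short sketch. The helper `conv_output_size` is a hypothetical function of mine that applies the usual formula, output = floor((n + 2 * padding - kernel) / stride) + 1; the channel counts are arbitrary.

```python
import torch
import torch.nn as nn

def conv_output_size(n, kernel_size, padding=0, stride=1):
    """Width (or height) of the output for an input of width (or height) n."""
    return (n + 2 * padding - kernel_size) // stride + 1

x = torch.rand(1, 3, 100, 100)  # one fake 100x100 RGB image

padded = nn.Conv2d(3, 8, kernel_size=3, padding=1)(x)             # size is kept
unpadded = nn.Conv2d(3, 8, kernel_size=3, padding=0)(x)           # shrinks by 2
strided = nn.Conv2d(3, 8, kernel_size=3, padding=1, stride=2)(x)  # roughly halved
pooled = nn.MaxPool2d(2, 2)(padded)                               # halved by pooling

print(padded.shape, conv_output_size(100, 3, padding=1))
print(unpadded.shape, conv_output_size(100, 3))
print(strided.shape, conv_output_size(100, 3, padding=1, stride=2))
print(pooled.shape)
```

With a 3x3 kernel, a padding of 1 keeps the image at 100x100, which is why this combination is so common in CNNs.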

nn.Sequential helps us group multiple modules together. For each nn.Conv2d layer we specify 4 parameters: the number of input channels, the number of output channels, the kernel size and the padding.
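As an illustration, a small CNN built with nn.Sequential could look like this. The number of layers, the channel counts and the pooling sizes are assumptions chosen to keep the example short; they are not necessarily the architecture that produced the results in this post.

```python
import torch
import torch.nn as nn

class FruitsCnn(nn.Module):
    def __init__(self, num_classes=131):
        super().__init__()
        self.network = nn.Sequential(
            # Conv2d(input channels, output channels, kernel size, padding)
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),   # output: 16 x 50 x 50

            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),   # output: 32 x 25 x 25

            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(5, 5),   # output: 64 x 5 x 5

            nn.Flatten(),         # 64 * 5 * 5 = 1600 features per image
            nn.Linear(64 * 5 * 5, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, xb):
        return self.network(xb)

model = FruitsCnn()
scores = model(torch.rand(2, 3, 100, 100))  # a fake batch of 2 images
print(scores.shape)
```

The convolution and pooling layers shrink each 100x100 image to a small stack of feature maps, and the final linear layers turn those features into one score per class.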

Step 6.2 : Train the Second Model

We’ll use the same fit and evaluate functions as for the feed-forward network, initialise the model and start training.

Let’s change the hyperparameters.

As you can see, we achieved high accuracy in a small number of epochs. This is often a sign of overfitting. We will plot the loss vs. number of epochs.

So the model can now predict with an accuracy of around 89%. Again, let’s predict a couple of images with the CNN model.

  • Data normalization
  • Data augmentation
  • Residual connections
  • Batch normalization
  • Learning rate scheduling
  • Weight Decay
  • Gradient clipping

Applying the techniques listed above can lead to a better model and lessen the risk of overfitting, which is what happened with our CNN. You can try applying each technique independently and see how much each one affects the performance and training time. As you try different experiments, you will start to cultivate the intuition for picking the right architectures, data augmentation and regularization techniques.

References
