Image classification — CNN with PyTorch

5 min read · Nov 6, 2018

I recently finished work on a CNN image classifier built with the PyTorch library. According to Wikipedia, “PyTorch is an open source machine learning library for Python, based on Torch, used for applications such as natural language processing. It is primarily developed by Facebook’s artificial-intelligence research group, and Uber’s “Pyro” software for probabilistic programming is built on it.”

In my experience, PyTorch trains faster than Keras with a TensorFlow backend. Loading tensors onto the GPU is also easier and more intuitive in PyTorch, and I found the code simpler to write and easier to deploy.

I used the Cats and Dogs image dataset for the classification, with 8000 images in the training set and 2000 images in the test set. You will need a decent GPU to train this model in a reasonable time. I am currently using an old gaming laptop with an NVIDIA GTX 980M, which is not very fast. I normally train the model on a smaller subset first to check for errors and then train on the full dataset overnight. (I am planning to get a GTX 1080 Ti desktop built for 2019.)

To load the images, I keep each class in its own named folder, since the dataset already comes in that format.

Import the libraries (you will need OpenCV-Python, PyTorch, TorchVision and PIL (Python Imaging Library), apart from the Anaconda packages)
Define the transformations
Load the dataset with the transformations
Make the dataset iterable with a batch size; shuffle it to get a good mix of the different category labels
Check the images to see if the load is correct

Now check that all the data is properly loaded for training the model.

Now we get to defining the CNN class in PyTorch. A good grasp of CNN intuition is necessary to understand how to define it: concepts such as kernels, padding, batch normalization, max pooling and flattening are prerequisites. Check out this link for a great read on these concepts. Also read about dropout here.

Architecture of a CNN. — Source: https://www.mathworks.com/videos/introduction-to-deep-learning-what-are-convolutional-neural-networks--1489512765771.html

The class is defined in the gist linked below, and each setting is explained in its comments.

Please see the gist at — https://gist.github.com/viveksasikumar/a4c8a70c4ac2bd466dc6a54dcb09dbe6
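For readers who cannot open the gist, here is a minimal sketch of what such a class can look like. It is not the code from the gist: the class name `SimpleCNN`, the layer widths, the dropout rate and the 64x64 input size are assumptions chosen only to show kernels, padding, batch normalization, max pooling, flattening and dropout in one place.

```python
import torch
import torch.nn as nn


class SimpleCNN(nn.Module):
    """Two conv blocks followed by a small fully connected head (binary output)."""

    def __init__(self, image_size=64, dropout=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # padding=1 keeps H x W
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # halves H and W
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # halves H and W again
        )
        # Two 2x2 poolings shrink each side by a factor of 4.
        flat = 32 * (image_size // 4) * (image_size // 4)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout),
            nn.Linear(flat, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid(),  # probability of class index 1, suitable for BCELoss
        )

    def forward(self, x):
        # Output shape: (batch,)
        return self.classifier(self.features(x)).squeeze(1)
```

Passing a random batch such as `torch.randn(4, 3, 64, 64)` through the model should return four values between 0 and 1.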

Ideally, we should reduce the size of the image further by using more convolution and max-pooling layers before flattening it. Experiment with smaller batches and with different numbers of convolutional and fully connected layers to figure out what works for your images.
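To see how much each extra layer shrinks the image before flattening, the standard output-size formula, floor((size + 2 * padding - kernel) / stride) + 1, can be traced by hand. The helper names and the layer stack below are hypothetical, chosen only for illustration.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_sizes(size, layers):
    """Return the spatial size after each (kernel, stride, padding) layer."""
    sizes = [size]
    for kernel, stride, padding in layers:
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


# Hypothetical stack: 3x3 conv (padding 1) then 2x2 max-pool, repeated 3 times.
layers = [(3, 1, 1), (2, 2, 0)] * 3
print(trace_sizes(64, layers))  # [64, 64, 32, 32, 16, 16, 8]
```

With 32 output channels, stopping after two blocks would flatten to 32 * 16 * 16 = 8192 features, while a third block brings that down to 32 * 8 * 8 = 2048.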

Now we move on to the next stage: creating an instance of the class and moving it onto the GPU. I am using a regular Stochastic Gradient Descent (SGD) optimizer for the network. Definitely experiment with Adam, Adagrad, Adadelta, ASGD, Adamax, Rprop, etc.

The loss function is Binary Cross Entropy since this is Cats vs. Dogs. With more than two categories, we would use Categorical Cross Entropy. The learning rate is kept at 0.01 here. The lower it is, the slower the training will be; set it too high and the updates can overshoot the minimum.
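A minimal sketch of this setup, assuming a model whose last layer is a sigmoid. The tiny stand-in model and the random batch are placeholders so the snippet runs on its own; only the SGD optimizer, the learning rate of 0.01 and the binary cross entropy loss come from the text.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Use the GPU when one is visible, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model so the snippet is self-contained; swap in your own CNN class.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1), nn.Sigmoid()).to(device)

criterion = nn.BCELoss()                            # expects probabilities in [0, 1]
optimizer = optim.SGD(model.parameters(), lr=0.01)  # learning rate from the text

# One optimization step on a random batch (placeholder data).
images = torch.randn(8, 3, 64, 64, device=device)
labels = torch.randint(0, 2, (8,), device=device).float()

optimizer.zero_grad()             # clear gradients from any previous step
probs = model(images).squeeze(1)  # shape (8,)
loss = criterion(probs, labels)
loss.backward()                   # backpropagate
optimizer.step()                  # update the weights
print(f"loss after one step: {loss.item():.4f}")
```

For more than two classes, the usual PyTorch counterpart is `nn.CrossEntropyLoss`, which takes raw scores (no sigmoid or softmax layer) and integer class labels.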

Gradient descent with small (top) and large (bottom) learning rates. Source: Andrew Ng’s Machine Learning course on Coursera

Train the model and evaluate it on the test set after every epoch, recording the loss and accuracy for both datasets.
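One way to structure that loop is sketched below. The function names `run_epoch` and `fit` and the `history` dictionary are my own choices for illustration, and the sketch assumes the model outputs one sigmoid probability per image.

```python
import torch
import torch.nn as nn


def run_epoch(model, loader, criterion, device, optimizer=None):
    """One pass over loader; trains when an optimizer is given, else evaluates."""
    training = optimizer is not None
    model.train(training)  # toggles dropout / batch-norm behaviour
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for images, labels in loader:
            images = images.to(device)
            labels = labels.float().to(device)
            probs = model(images).reshape(-1)
            loss = criterion(probs, labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * labels.size(0)
            correct += ((probs >= 0.5).float() == labels).sum().item()
            seen += labels.size(0)
    return total_loss / seen, correct / seen


def fit(model, train_loader, test_loader, epochs=10, lr=0.01, device="cpu"):
    """Train with SGD and record loss and accuracy on both sets after each epoch."""
    model.to(device)
    criterion = nn.BCELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    history = {"train_loss": [], "train_acc": [], "test_loss": [], "test_acc": []}
    for epoch in range(epochs):
        tr_loss, tr_acc = run_epoch(model, train_loader, criterion, device, optimizer)
        te_loss, te_acc = run_epoch(model, test_loader, criterion, device)
        for key, value in zip(history, (tr_loss, tr_acc, te_loss, te_acc)):
            history[key].append(value)
        print(f"epoch {epoch + 1}: train loss {tr_loss:.3f} acc {tr_acc:.3f} | "
              f"test loss {te_loss:.3f} acc {te_acc:.3f}")
    return history
```

Evaluation runs under `torch.set_grad_enabled(False)`, so the test pass neither stores gradients nor updates weights.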

If you have a great GPU with good memory, this won’t be too painful. Otherwise, set it up for training and do other stuff. I have found that working out, binge watching Netflix or taking a nap helps.

After the model has run successfully, plot the loss and accuracy for the training and test sets to check for overfitting.
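A small plotting helper along these lines can do the job. It assumes the per-epoch metrics are stored in a dictionary with the hypothetical keys `train_loss`, `test_loss`, `train_acc` and `test_acc`; adapt the names to however you record them.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt


def plot_history(history, path="training_curves.png"):
    """Plot train/test loss and accuracy per epoch side by side and save to path."""
    epochs = range(1, len(history["train_loss"]) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    for ax, metric in ((ax_loss, "loss"), (ax_acc, "acc")):
        ax.plot(epochs, history[f"train_{metric}"], label="train")
        ax.plot(epochs, history[f"test_{metric}"], label="test")
        ax.set_xlabel("epoch")
        ax.set_ylabel(metric)
        ax.legend()
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path
```

A training loss that keeps falling while the test loss flattens or rises is the usual sign of overfitting.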

Training for more than 10 epochs may increase the accuracy further, as long as the test loss keeps falling.

You can save the trained model to a .pth or .pkl file (the extension is only a convention; PyTorch serialization is built on Python's pickle). The saved model can then be loaded in a Python file and deployed with the Flask framework in a Docker container on AWS, on a VM or on a local network.
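A minimal sketch of saving and restoring the weights with `torch.save` and `load_state_dict`. The helper names are hypothetical, and storing only the `state_dict` (rather than the whole model object) is my choice here, not necessarily what the original notebook did.

```python
import torch
import torch.nn as nn


def save_weights(model, path):
    """Write only the learned parameters to disk."""
    torch.save(model.state_dict(), path)


def load_weights(model, path, device="cpu"):
    """Load parameters into an already constructed model of the same architecture."""
    state = torch.load(path, map_location=device)
    model.load_state_dict(state)
    model.eval()  # switch dropout / batch norm to inference mode
    return model
```

At deployment time you would construct the same model class first and then call `load_weights` on it; `map_location` lets a model trained on a GPU be loaded on a CPU-only server.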

You can test the model on individual images in your Jupyter notebook by loading them as follows.

Feature extraction and classification are the core problems that we are solving with a CNN. The issue here is that we are building our own model from scratch, with no pretraining. Instead, we can use the pretrained models that are available, such as ResNet, AlexNet, SqueezeNet, VGG, DenseNet, etc., for feature extraction and better prediction accuracy.

# Note: torchvision 0.13 and later deprecate pretrained=True in favor of the weights= argument.
import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)
vgg16 = models.vgg16(pretrained=True)
densenet = models.densenet161(pretrained=True)
inception = models.inception_v3(pretrained=True)

Here I am using DenseNet-201 with the SGD optimizer and a learning-rate decay factor of 0.1 every 7 epochs to train the model on our images.

With this approach, our predictions are much more robust, thanks to pretrained models that have already learned to extract relevant features. Using the Adam optimizer and increasing the number of epochs might improve the model further, although one has to be wary of overfitting.

Let me know what you think! I have used a lot of PyTorch tutorials, GitHub repos, MOOCs and blogs to put together this article. Please feel free to comment and advise me on better ways to run these models.

Written by Vivek Sasikumar