Deep Learning: Creating an Image Classifier using PyTorch with CIFAR-10

Published in Analytics Vidhya · 7 min read · Mar 24, 2021

Image source: https://www.futura-sciences.com/tech/definitions/intelligence-artificielle-deep-learning-17262/

Please note that this article comes in two parts. The first part, the one you’re reading right now, is about creating the image classifier. Part two is about operationalizing and deploying the model. You can find it here. A bit more information is provided at the end of this article. Thanks!

At first, all this AI, machine learning and deep learning stuff sounded to me like some kind of terrifying machine code. If that’s your situation now, I was once in your shoes. After doing some work, I came to see that it isn’t that terrifying. Hopefully, by the end of this read, you will agree with me.

In this article, we will build an image classifier. Yeah! You read that right, an image classifier. Don’t be afraid, we will go through all the steps, and I’m sure it will be fun.

We will be using PyTorch’s nn module to train and test our model on the CIFAR-10 data. CIFAR stands for Canadian Institute For Advanced Research, and the 10 stands for the 10 classes of images included in the dataset, namely: plane, car, bird, cat, deer, dog, frog, horse, ship, truck. So our model will learn to tell these ten kinds of objects apart. You can download the dataset from Kaggle.

There are other datasets out there that you can take a look at. Feel free to check them out here.
There are also some pretrained models out there. At the time of writing, the best performing model trained and tested on the CIFAR-10 dataset is GPipe with 99.0% accuracy. The aim of this article is not to beat that accuracy; we just want to get our hands dirty building our own in-house neural network (nn).

There are three main steps involved in building a neural network:

Preparing and exploring our data,
Building and training the model, and
Testing it.

Prepare and Explore

It is important that we see what kind of data we are working with. Then we have to divide the data into two parts: one to be used for training and one for testing. An 80% to 20% ratio is common, but the choice is yours. CIFAR-10 already comes split into 50,000 training images and 10,000 test images, so here we simply load both parts.
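As a rough illustration of the 80/20 idea, here is a minimal sketch that shuffles sample indices and splits them into two groups. The helper name `split_indices` and the fixed seed are assumptions made for this example only. They are not part of the original workflow, which relies on the split that ships with CIFAR-10.

```python
import random

def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into train/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # local RNG, so the global random state is untouched
    cut = int(n_samples * train_fraction)
    return indices[:cut], indices[cut:]

train_idx, test_idx = split_indices(1000)
print(len(train_idx), len(test_idx))  # 800 200
```

For PyTorch Dataset objects, torch.utils.data.random_split does a similar job without you having to handle the indices yourself.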

Below are the necessary imports in order for us to load and divide our data.

import torch
import torchvision
import torchvision.transforms as transforms

Then we create the sets and loaders

data_dir = './data'  # directory where the CIFAR-10 data is (or will be) downloaded

# Random flips augment the training data; the test data is only converted to tensors
train_transform = transforms.Compose([transforms.RandomHorizontalFlip(), transforms.ToTensor()])
test_transform = transforms.ToTensor()

trainset = torchvision.datasets.CIFAR10(root=data_dir, train=True, download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(trainset, batch_size=5, shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root=data_dir, train=False, download=True, transform=test_transform)
test_loader = torch.utils.data.DataLoader(testset, batch_size=5, shuffle=False, num_workers=2)

# The 10 classes in the dataset
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Now let’s explore the data for the training and test sets…

# get the length of the training data
print(len(trainset))

# get a sample of the training data and see its length
sample = next(iter(trainset))
print(len(sample))

# get the image and its label
image, label = sample
print(type(image))
print(type(label))

# view the image shape
print(image.shape)

# length of the test data
print(len(testset))

Now that we have seen some basic stats, let us view some of the images from the training and test loaders

import matplotlib.pyplot as plt
import numpy as np

# train_loader images
dataiter = iter(train_loader)
batch = next(dataiter)
labels = batch[1][0:5]
images = batch[0][0:5]
for i in range(5):
    print(classes[labels[i]])
    image = images[i].numpy()
    # tensors are (channels, height, width); imshow expects (height, width, channels)
    plt.imshow(np.transpose(image, (1, 2, 0)))
    plt.show()

# test_loader images
dataiter = iter(test_loader)
batch = next(dataiter)
labels = batch[1][0:5]
images = batch[0][0:5]
for i in range(5):
    print(classes[labels[i]])
    image = images[i].numpy()
    plt.imshow(np.transpose(image, (1, 2, 0)))
    plt.show()

At this point, we know our data, at least to an extent. You can explore more if you like.

Build and Train

Now that we have explored our data, let us build our model. As a refresher, neural networks are built by combining different layers into an architecture. You can explore the different torch.nn layers here. Discussing the different layers and how they work would make this article very long, so I encourage you to take a look at this link to read more about them. In order not to complicate things, I will share my architecture (after some improvements) with you. Feel free to modify and explore.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 128, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(128 * 2 * 2, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 32)
        self.fc4 = nn.Linear(32, 10)
        self.dropout1 = nn.Dropout(p=0.2, inplace=False)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.dropout1(x)
        x = self.pool(F.relu(self.conv2(x)))
        x = self.dropout1(x)
        x = self.pool(F.relu(self.conv3(x)))
        x = self.dropout1(x)
        x = x.view(-1, 128 * 2 * 2)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x) #output layer
        
        return x
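If you are wondering where the `128 * 2 * 2` in `fc1` comes from, the arithmetic below traces the spatial size of a 32x32 CIFAR-10 image through the three conv + pool stages. It is only a sketch: `conv_out` and `pool_out` are helper names made up for this example, and they assume stride 1 and no padding for the convolutions and a 2x2 max-pool with stride 2, which is what the Net class above uses.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output width/height of a max-pool (floor mode, no padding)."""
    return (size - kernel) // stride + 1

size = 32  # CIFAR-10 images are 32x32
for channels in (32, 64, 128):          # output channels of conv1, conv2, conv3
    size = pool_out(conv_out(size, kernel=3))
    print(channels, size)               # 32 15, then 64 6, then 128 2

flattened = 128 * size * size
print(flattened)                        # 512, i.e. 128 * 2 * 2
```

If you change a kernel size, add padding, or remove a pooling layer, rerun this kind of calculation and update the input size of `fc1` and the `view` call accordingly.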

Now let’s instantiate the model class.

model = Net()

I’m sure you must have heard about GPUs. If not, that’s still cool. A GPU (Graphics Processing Unit) is best known from gaming computers, where it provides high video processing power. We want to leverage this powerful resource to train our model. Using a GPU is not a must. A CPU works too; it is just much slower, so training and testing take more time, which in practice limits how many epochs and experiments you can afford to run.

So we will check whether a GPU is available on the machine running the code; otherwise, we use the CPU.

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)

We now specify our loss function. Neural networks are trained using stochastic gradient descent, so we have to choose a loss function for the optimizer to minimize. Is that confusing? If so, I encourage you to take a look at this post where loss functions are discussed. Moreover, you can take a look at some other loss functions you can use here.

import torch.optim as optim

criterion = nn.CrossEntropyLoss()

# Stochastic gradient descent with momentum: updates the parameters after each mini-batch
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
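To make the loss a bit less abstract, here is a small pure-Python sketch of what cross-entropy computes for a single sample: a softmax over the raw scores (logits), followed by the negative log of the probability given to the true class. The function name and the numbers are made up for illustration. nn.CrossEntropyLoss does the same thing on tensors and, by default, averages the result over the batch.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

logits = [2.0, 0.5, -1.0]            # hypothetical scores for 3 classes
print(cross_entropy(logits, 0))      # small loss: class 0 has the highest score
print(cross_entropy(logits, 2))      # larger loss: class 2 has the lowest score
```

The loss is small when the model gives the true class a high score and grows as that score drops, which is exactly the signal the optimizer uses to adjust the weights.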

Now we can proceed with the training of the model.

epoch_losses = []  # average training loss per epoch, so that we can plot it against the epoch

model.train()
for epoch in range(20):
    running_loss = 0.0  # loss accumulated since the last printout
    epoch_loss = 0.0    # loss accumulated over the whole epoch

    for i, data in enumerate(train_loader, 0):
        # get inputs and labels and move them to the appropriate device
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward pass, backward pass, parameter update
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print stats
        running_loss += loss.item()
        epoch_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('%d, %5d| loss: %.3f' % (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
    epoch_losses.append(epoch_loss / len(train_loader))
print('Training done!')  # print when finished training

Please feel free to spend some time on the code above in case it doesn’t seem very clear at first sight.

Let us now plot a graph showing our training loss and epoch.

epochs = range(1, 21)
plt.plot(epochs, epoch_losses, 'g', label='Training loss')
plt.title('Training loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

Let me show you a sample of what the curve should look like:

Notice how the curve approaches zero. The lower the loss, the better the model fits the data. Normally, the lowest point on the curve is where the model predicts best. If the loss measured on data the model has not trained on (a validation set) starts increasing again, it means that our model is overfitting. Usually, in those cases, we adjust the number of epochs so that our model trains just up to the epoch where the curve is lowest. That way, we can get good performance (accuracy) from our model.

In my case, I trained for just 20 epochs. My curve suggests that it could still go lower (the loss could keep reducing), which would improve the model’s performance. You can try to increase the number of epochs on your own. What do you think?
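One way to choose the number of epochs is to record a loss on held-out data after every epoch and remember where it was lowest. The sketch below does that on a made-up list of losses. `best_epoch` and its `patience` parameter are hypothetical names for this example, and the numbers are not results from the model in this article.

```python
def best_epoch(val_losses, patience=3):
    """Return (1-based epoch with the lowest loss, epoch at which we would stop).

    Stops once the loss has failed to improve for `patience` epochs in a row.
    """
    best, best_at, waited = float('inf'), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_at, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_at, epoch
    return best_at, len(val_losses)

# Made-up losses: they fall, bottom out at epoch 5, then creep back up
losses = [1.9, 1.5, 1.2, 1.0, 0.9, 0.95, 1.0, 1.1, 1.2]
print(best_epoch(losses))  # (5, 8)
```

In a real run you would also save the model weights whenever a new best loss is reached, so that you can go back to them after stopping.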

Test the model

Now that the model is trained, it is time for us to test. We will be using our previously created test_loader. We want to measure the accuracy of our model.

total = 0
correct = 0

model.eval()  # put our model in evaluation mode
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy: %d %%' % (100 * correct / total))

For my part, after running the test, I got an accuracy of 75%. This is not a bad accuracy, but it can surely be improved.
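A single accuracy number hides which classes the model struggles with. As a hedged sketch, the helper below (a name invented for this example) computes per-class accuracy from plain lists of true and predicted label indices. In the test loop above you could collect such lists with `labels.tolist()` and `predicted.tolist()`. The labels in the demo are made up.

```python
from collections import defaultdict

def per_class_accuracy(true_labels, predicted_labels, class_names):
    """Return {class name: fraction of that class's samples predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {class_names[c]: correct[c] / total[c] for c in sorted(total)}

demo_classes = ('plane', 'car', 'bird')  # shortened class list for the demo
y_true = [0, 0, 1, 1, 2, 2]              # made-up ground truth
y_pred = [0, 1, 1, 1, 2, 0]              # made-up predictions
print(per_class_accuracy(y_true, y_pred, demo_classes))
# {'plane': 0.5, 'car': 1.0, 'bird': 0.5}
```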

If you want to test predictions on particular images, check out the snippet below.

dataiter = iter(test_loader)
images, labels = next(dataiter)
print('Truth: ', ' '.join('%5s' % classes[labels[j]] for j in range(5)))

# Output prediction (the images must be on the same device as the model)
with torch.no_grad():
    outputs = model(images.to(device))
_, predicted = torch.max(outputs.cpu(), 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(5)))

Now you can conveniently save the model.

checkpoint = {'model': model,
            'state_dict': model.state_dict(),
            'optimizer' : optimizer.state_dict()}
torch.save(checkpoint, 'checkpoint.pth')

If you want to load the model and reuse it somewhere else, you can use the torch.load function. It takes the path to the checkpoint file as a parameter.
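Here is a minimal sketch of the save-then-load round trip. To keep it runnable on its own, it uses a tiny nn.Linear layer as a stand-in for the Net class and a temporary file path, both of which are assumptions for this example. It also restores only the `state_dict` and `optimizer` entries, not the pickled `model` object.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

# Tiny stand-in model so the example runs on its own; in practice use the Net class
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

path = os.path.join(tempfile.mkdtemp(), 'checkpoint.pth')
torch.save({'state_dict': model.state_dict(),
            'optimizer': optimizer.state_dict()}, path)

# Later, possibly in another script: rebuild the objects, then restore their state
restored = nn.Linear(4, 2)
restored_optimizer = optim.SGD(restored.parameters(), lr=0.001, momentum=0.9)

checkpoint = torch.load(path, map_location='cpu')  # map_location lets a GPU-trained file load on CPU
restored.load_state_dict(checkpoint['state_dict'])
restored_optimizer.load_state_dict(checkpoint['optimizer'])
restored.eval()  # switch to evaluation mode before predicting
```

Loading the whole pickled model stored under the 'model' key is also possible, but it requires the Net class definition to be importable where you load it. Depending on your PyTorch version, you may also have to pass weights_only=False to torch.load for that, so restoring from the state dict is usually the more portable route.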

And that’s it…

Extra

I would just like to give you a hint in case you want to deploy the model so that it can be used as a service.

You can provision a cloud server on which you load your model and run a simple RESTful API, built with Flask or FastAPI, that receives an image, passes it to your model, gets the prediction, and sends the response back to the user.

Note that the images may not all be the same size, so you have to resize them to what your model expects (32x32 here); otherwise the model will not behave properly.
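As a rough sketch of that resizing step, the snippet below scales an arbitrary image tensor down to the 32x32 input the model was trained on. The function name `to_cifar_size` is made up for this example, and the "uploaded" image is just random noise standing in for a real one. It assumes the image has already been converted to a float tensor of shape (3, height, width).

```python
import torch
import torch.nn.functional as F

def to_cifar_size(image):
    """Resize a (3, H, W) float image tensor to a (1, 3, 32, 32) batch."""
    batch = image.unsqueeze(0)  # add the batch dimension the model expects
    return F.interpolate(batch, size=(32, 32), mode='bilinear', align_corners=False)

fake_upload = torch.rand(3, 200, 300)     # random stand-in for a user image
print(to_cifar_size(fake_upload).shape)   # torch.Size([1, 3, 32, 32])
```

If you work with PIL images instead, torchvision.transforms.Resize followed by ToTensor is an alternative way to get the same shape.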

Hopefully, deep learning no longer looks so weird to you.

Thank you very much for reading.

PART 2

You can read part two, Deep Learning: Loading and Operationalizing Our Model, here.