Solving the CIFAR10 dataset with a pre-trained VGG16 architecture using PyTorch: validation accuracy over 92%

Hien Bui
Apr 10, 2021 · 7 min read


CIFAR10 is a labeled subset of the 80 Million Tiny Images dataset, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

  1. CIFAR10 in the torch package has 60,000 images across 10 labels, each 32x32 pixels. By default, torchvision.datasets.CIFAR10 splits the dataset into 50,000 training images and 10,000 test images.
  2. VGG16 is a very deep convolutional neural network designed by Karen Simonyan and Andrew Zisserman. If you are interested in their work, I highly recommend clicking this link to read their paper.
  3. Transfer learning is a technique that reuses a pre-trained model for a new task. In this case, I reused the VGG16 model to classify the CIFAR10 dataset.

I used Google Colab as the main working environment for this project. The first step is to specify the device used to train the model, either cuda or cpu. Then I choose the number of epochs, the batch size, and the learning rate for training. As mentioned in the introduction, CIFAR10 has 10 labels; these labels are stored in the classes variable.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torchvision import models
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

num_epochs = 5
batch_size = 40
learning_rate = 0.001

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

Then I prepared the CIFAR10 dataset with transforms.Compose, a function that receives a list of transformation steps to apply to the input data. You can see it as a data pipeline. The pipeline first resizes every CIFAR10 image to 224x224, the input size the VGG16 model expects; then it converts the image to a tensor for the later steps; finally, it normalizes the pixel values using per-channel statistics (means around 0.47 and standard deviations around 0.2). Because the images have three color channels (Red, Green, Blue), transforms.Normalize receives two tuples of three floats, representing the mean and standard deviation of each color channel respectively.

After specifying the data-transforming pipeline, I loaded the CIFAR10 dataset from the torchvision package (the code below). I got the training dataset by setting the train argument to True and the testing dataset by setting it to False, and both apply the data pipeline above via the transform argument.

transform = transforms.Compose([
    transforms.Resize(size=(224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        (0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)
    )
])

train_dataset = torchvision.datasets.CIFAR10(
    root='./data', train=True,
    download=True, transform=transform)
test_dataset = torchvision.datasets.CIFAR10(
    root='./data', train=False,
    download=True, transform=transform)

The next step in preparing the dataset is to wrap it in a DataLoader. I set the batch_size argument of torch.utils.data.DataLoader to the batch size chosen in the first step, and I enable shuffling, which is especially helpful for the training dataset. The n_total_step in my case is 1,250 steps; it is calculated as <total records>/<batch size>, so 50,000/40 = 1,250. This means that during training, each epoch my code will execute a loop of 1,250 steps.

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    test_dataset, batch_size=batch_size, shuffle=True)

n_total_step = len(train_loader)
print(n_total_step)

Here is the important part of this project: I import the vgg16 model from torchvision.models and choose the pre-trained version. This model outputs 1,000 features by default, but in my case I only need 10. Those 10 output features are produced by an nn.Linear layer; you can take a more detailed look yourself by printing the model variable below. I also encourage you to try other pre-trained models and experience tuning them to suit your own problems. You can see more pre-trained models in PyTorch at this link.
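As a quick illustration (a minimal sketch; resnet18 here is just an example of another backbone you could try, not part of my code below), printing the classifier shows where the layer I replace lives, and other torchvision models expose their final layer under a different name:

vgg = models.vgg16(pretrained=True)
print(vgg.classifier)  # the last entry is (6): Linear(in_features=4096, out_features=1000)

# Hypothetical swap: ResNet keeps its final layer in `.fc` instead of `.classifier[6]`
resnet = models.resnet18(pretrained=True)
resnet.fc = nn.Linear(resnet.fc.in_features, 10)  # 10 CIFAR10 classes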

I used the CrossEntropyLoss function from torch to calculate the loss value. This function receives the predicted values over n features together with the labels and performs the softmax calculation internally; in my case, each image has a 10-feature predicted output. The softmax formula is softmax(x_i) = exp(x_i) / Σ_j exp(x_j), as illustrated below.

Softmax activation function
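To make this concrete, here is a minimal sketch (the logits are made up for illustration) showing that nn.CrossEntropyLoss is equivalent to applying log-softmax followed by the negative log-likelihood loss, which is why the model itself does not need a softmax layer:

logits = torch.tensor([[2.0, 0.5, 0.1]])  # hypothetical raw outputs for one sample, 3 classes
target = torch.tensor([0])                # the true class index

loss_ce = nn.CrossEntropyLoss()(logits, target)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(loss_ce.item(), loss_manual.item())  # both print the same value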

Finally, I chose SGD (Stochastic Gradient Descent) as my optimizer, passing the parameters I want to optimize, model.parameters(), and setting the learning rate, momentum, and weight_decay hyper-parameters to 0.001, 0.9, and 5e-4 respectively. Feel free to tune these parameters yourself.

Note: the modified VGG16 now has 10 linear output features, and we do not need to apply the softmax activation function as the last layer of the model, because softmax is already integrated into the nn.CrossEntropyLoss loss function.

model = models.vgg16(pretrained=True)
input_lastLayer = model.classifier[6].in_features
model.classifier[6] = nn.Linear(input_lastLayer, 10)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate,
                            momentum=0.9, weight_decay=5e-4)

Training the model: I pass each batch of images into the model, and the output has the shape (40, 10), where 40 is the batch size and 10 is the number of features. Calling .argmax(axis=1) on this output gives a tensor of shape (40,): for each image's 10-feature output, it returns the index of the largest value, i.e. the predicted label. I then compute the loss value with the nn.CrossEntropyLoss() function, call .backward() on the loss to compute the gradients, and update model.parameters() by triggering the optimizer's .step() method. Lastly, don't forget to reset the gradients after every iteration with the .zero_grad() method.

In my code, every 250 steps of each epoch I print the loss value and the accuracy on the current training batch. This step consumes a lot of time, about 150 minutes on a GPU runtime, so I strongly advise you to browse the torchvision.models documentation in the meantime, or do something useful rather than sitting in front of the PC staring at the screen.

for epoch in range(num_epochs):
    for i, (imgs, labels) in enumerate(train_loader):
        imgs = imgs.to(device)
        labels = labels.to(device)

        labels_hat = model(imgs)
        n_corrects = (labels_hat.argmax(axis=1) == labels).sum().item()
        loss_value = criterion(labels_hat, labels)
        loss_value.backward()
        optimizer.step()
        optimizer.zero_grad()
        if (i + 1) % 250 == 0:
            print(f'epoch {epoch+1}/{num_epochs}, step: {i+1}/{n_total_step}: loss = {loss_value:.5f}, acc = {100*(n_corrects/labels.size(0)):.2f}%')
    print()
training process
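As a side note, if 150 minutes is too long for you, a common transfer-learning shortcut (a sketch of a variant, not what I did above) is to freeze the pre-trained convolutional features and train only the classifier head:

# Hypothetical variant: freeze the feature extractor so only the classifier is trained
for param in model.features.parameters():
    param.requires_grad = False

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=learning_rate, momentum=0.9, weight_decay=5e-4)

This usually trades a little accuracy for a large speed-up, since the frozen layers still run forward but autograd skips the backward pass through them.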

The final step is to evaluate the trained model on the testing dataset. For each batch of images, we check how many classes were predicted correctly: we get labels_predicted by calling .argmax(axis=1) on y_predicted, then count the correct predictions with (labels_predicted==test_labels_set).sum().item(). The comparison labels_predicted==test_labels_set returns a tensor of True/False values; True counts as 1 and False as 0, so the .sum() method counts the correctly predicted labels, and the .item() method simply extracts the value from the 1-element tensor. Finally, the number of samples in each batch, test_labels_set.size(0), is just the batch_size value we specified at the beginning of this article.

with torch.no_grad():
    number_corrects = 0
    number_samples = 0
    for i, (test_images_set, test_labels_set) in enumerate(test_loader):
        test_images_set = test_images_set.to(device)
        test_labels_set = test_labels_set.to(device)

        y_predicted = model(test_images_set)
        labels_predicted = y_predicted.argmax(axis=1)
        number_corrects += (labels_predicted == test_labels_set).sum().item()
        number_samples += test_labels_set.size(0)
    print(f'Overall accuracy {(number_corrects / number_samples)*100}%')
testing accuracy.

Another way to visualize the evaluation on the test dataset is a heatmap, with the support of the seaborn package. In the code below, I build a (10, 10) data frame initialized to 0, where the vertical index represents the true labels and the horizontal index represents the predicted labels (in other words, a confusion matrix). For example, in the result below, on the dog row, 102 images were wrongly predicted as cat and 858 images were predicted correctly.

heatmap = pd.DataFrame(data=0, index=classes, columns=classes)
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        for i in range(labels.size(0)):  # actual batch length, in case the last batch is smaller
            true_label = labels[i].item()
            predicted_label = predicted[i].item()
            heatmap.iloc[true_label, predicted_label] += 1

_, ax = plt.subplots(figsize=(10, 8))
ax = sns.heatmap(heatmap, annot=True, fmt="d", cmap="YlGnBu")
plt.show()
Heatmap on testing data
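If you prefer not to fill the matrix by hand, here is a minimal sketch using scikit-learn (an extra dependency, not used elsewhere in this post) that produces the same (10, 10) confusion matrix from collected predictions:

from sklearn.metrics import confusion_matrix

all_labels, all_preds = [], []
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images.to(device))
        all_preds.extend(outputs.argmax(dim=1).cpu().tolist())
        all_labels.extend(labels.tolist())
cm = confusion_matrix(all_labels, all_preds)  # rows = true labels, columns = predicted labels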

Last but not least, don’t forget to save your model to reuse it later on.

torch.save(
    model,
    '/path/to/your/drive/or/local/directory/<name_of_the_model>.pth')
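And to reuse it later, a minimal sketch of the loading side (note that torch.save(model, ...) pickles the entire module, so the model class must be importable when you load it; saving model.state_dict() instead is the more portable convention):

model = torch.load(
    '/path/to/your/drive/or/local/directory/<name_of_the_model>.pth',
    map_location=device)
model.eval()  # switch to evaluation mode before running inference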

So… that is it for this project. I am also just a beginner who wants to gain and share knowledge, and I hope you found something useful here. Machine Learning is a very interesting field, full of powerful techniques and knowledge that reward the time learners invest in them. If this blog helps you with your studies in AI, or if you find any bug in my code or anything that needs improvement, you are always welcome to comment on this post; I would be glad to read your comments.
