WAI102 — Image Classification

O J
Warwick Artificial Intelligence
Nov 30, 2021

Hey everyone, welcome to the second part of WAI102! In this session, we’ll be using PyTorch to solve the Image Classification task we discussed in the first session.

We’ll start off by implementing a basic architecture, then we’ll take steps to analyse the accuracy of our model and finally we will try and understand it.

The finished code for this guide can be found here.

Requirements 📰

All that you need to complete this article is:

  • A basic understanding of Python
  • 1 hour of your time

Setting up your environment 🌐

Step 1: This project uses Python 3.7. If you have already installed Python 3.7 and are able to use Python and pip from the command line/terminal then please skip this step.

Install Python by clicking here, scrolling all the way down to the Files section, and downloading the installer appropriate for your system, as shown below.

Python 3.7.4 Installation Files

In the installation wizard, make sure “Install launcher for all users” and “Add Python 3.7 to PATH” are both checked. Follow the instructions on the wizard and once complete, we can check the installation was successful by running the following 2 commands in the command prompt/terminal:

Check python version and pip version

If you get the same output as above (except for the path in the pip output), then you’re good to go to the next step.

Step 2: Now that we have Python installed, we'll install the required libraries. Luckily, we can install all the libraries we need with a single line in the command prompt. Navigate to this website, scroll down and you should be presented with a table of options. Select the following options:

  • Your OS ← [Whatever operating system you have]
  • Package ← Pip
  • Compute Platform ← CPU
Example options for Windows user

After selecting your options, you’ll be given a command in the Run this Command section. Copy and paste this into the terminal and the required libraries will be installed.

Step 3: We have one more library to install called matplotlib. This will be used to plot our results on a nice-looking graph. To install this library, copy and paste the following command into the terminal:

pip3 install matplotlib

Creating Our Model

We talked about the Image Classification problem last session – given a set of labels and images, assign each image the correct label.

Solution to an image classification task

We looked at the MNIST dataset - a dataset of 28x28 grayscale images of handwritten digits.

Examples of images from the dataset, each shown with its label (the actual number the drawing represents)

We’re going to look at how we can accurately classify these images with the correct labels by applying our knowledge of neural networks that we learnt in the first session.

Creating The Dataset 🎥

Let’s begin by importing some of the required libraries:

import torch
import torchvision

import matplotlib.pyplot as plt
import numpy as np

Next, we want to download the MNIST dataset to our computer so we can work with it. There are two parts to the dataset:

  • Train - this part of the dataset is used to train our model to make it more accurate at solving our problem
  • Test - we use this section of the dataset to evaluate how accurate our model is. You may be asking, “Why don’t we just evaluate the model using the same data we used to train it?” We want to test our model on new data that it has not seen before; otherwise it might just “memorise” the solution

To download these parts, insert the following lines into your code.

from torchvision.transforms import ToTensor

train_dataset = torchvision.datasets.MNIST(root='./', train=True, download=True, transform=ToTensor())
test_dataset = torchvision.datasets.MNIST(root='./', train=False, download=True, transform=ToTensor())
  • We use root='./' to specify that we want the datasets downloaded to the same folder as our code.
  • We use train=True and train=False to specify which part is the train set and which is the test set
  • download=True tells our code to download the dataset from the internet. Note that if the dataset has already been downloaded, it will not be downloaded again every time the code is re-run
  • transform=ToTensor() tells our code to transform our data by converting it into tensors – knowing what tensors are isn’t required for this, but if you’d like to read more about them click here

Note that on specific versions of PyTorch, downloading the dataset will raise a warning shown below (fixed in this PR). This warning is harmless and doesn’t affect us, so we can ignore it.

Useful warning
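If you’d like to check that the download worked, you can print the size of each part of the dataset and inspect a single example. This is just an optional sanity check (the index 0 below is arbitrary) and can be deleted afterwards:

# Optional sanity check: inspect the downloaded dataset
print(len(train_dataset))  # 60000 training examples
print(len(test_dataset))   # 10000 test examples

# Each example is an (image, label) pair; the image is a 1x28x28 tensor of floats
img, label = train_dataset[0]
print(img.shape, img.dtype, label)  # torch.Size([1, 28, 28]) torch.float32 and the image's label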

Now that we have the data downloaded, we’d like to modify it slightly to allow our model to train more efficiently.

With the structure we have right now, training our model would work as follows: we’d pass each image to our model individually, calculate how “wrong” the model is by computing the loss, and then update all the parameters of the network to make the model more accurate.

However, this approach is inefficient as we’re having to repeat this entire process individually for every example in our dataset. If we have hundreds of thousands of examples in our dataset, this would take a very long time.

What we can do instead is specify a batch size equal to some number n. The batch size is the number of samples that will be passed through the network at one time.

Increasing the batch size decreases the time it takes to train our model, as our machine can process many samples at once; however, it will likely slightly decrease the accuracy, so there is a tradeoff to be made.

Inserting the below code will create two variables that split our data into these batches of size 64 for both parts of the dataset.

batch_size = 64

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size)
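With 60,000 training images and a batch size of 64, the train loader contains 938 batches per epoch (60000 / 64, rounded up). If you’d like to confirm this and see the shape of one batch, you can add the following optional check:

# Optional: confirm the number of batches and the shape of a single batch
print(len(train_loader))  # 938 batches per epoch

images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([64, 1, 28, 28]) - a batch of 64 single-channel 28x28 images
print(labels.shape)  # torch.Size([64]) - one label per image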

Before we train our model, let’s take a look at some of the images in the dataset. The code below visualises 12 images from our dataset. Don’t worry about exactly how the code works; it’s not too important.

# Create a grid of 12 examples
figure = plt.figure(figsize=(8, 8))
cols, rows = 4, 3
for i in range(1, cols * rows + 1):
    # Pick a random example and get its image and respective label
    sample_idx = torch.randint(len(train_dataset), size=(1,)).item()
    img, label = train_dataset[sample_idx]

    # Plot each image captioned by its respective label
    figure.add_subplot(rows, cols, i)
    plt.title(label)
    plt.axis("off")
    plt.imshow(img.squeeze(0), cmap="gray")

plt.show()

Defining our Model 💽

Now we’re getting to the fun bit. We’re going to define our neural network: its architecture and how it processes inputs.

Below is a visualisation of the architecture of our model. There are 28 x 28 = 784 neurons in the input layer, since the size of the images is 28x28 and we’re resizing this to a single column. Since we’re predicting 10 labels (numbers 0–9), we’ll have 10 neurons in our output layer.

Architecture of our neural network

Below is the code which implements the above architecture:

from torch import nn


# Define 1-layer network
class OneLayerNN(nn.Module):
    def __init__(self):
        super(OneLayerNN, self).__init__()
        self.flatten = nn.Flatten()  # Flatten 28x28 image to one-dimensional input
        self.linear = nn.Linear(28 * 28, 10)

    def forward(self, input):
        x = self.flatten(input)
        y = self.linear(x)
        return y

Our code has two main parts to it:

  • In the __init__ function, we initialise the model and define its functionality, what it can do
  • In the forward function, we tell our model what to do with its inputs

Let’s break down each part of the above:

__init__ function: we first call super(...) to initialise the model. We then define the functions the model is capable of performing. First, we define self.flatten = nn.Flatten(), which flattens the input 28x28 image into a one-dimensional tensor. We then define self.linear = nn.Linear(28*28, 10), which we use for the forward pass between the input layer and the output layer. Note the arguments are the sizes of the input and output layers respectively.

forward function: given the parameter input, which is the input to the neural network, we define what is done to it to produce the output(s). Since this input will be one of our 28x28 images, we first use the flatten function to reshape it to be one-dimensional so it matches our input layer. We then pass it through the linear layer and return the output.
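To convince yourself the shapes line up, you can pass a batch of random numbers through the model and check the output shape. This is an optional sketch (the variable names below are throwaway) that you can delete afterwards:

# Optional: check the model maps a batch of 28x28 "images" to 10 outputs each
sanity_model = OneLayerNN()
dummy_batch = torch.rand(64, 1, 28, 28)  # a fake batch of 64 images
print(sanity_model(dummy_batch).shape)   # torch.Size([64, 10])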

Defining some variables 🖱

We’ve got a few more things to define that relate to our model.

model = OneLayerNN()

lr = 1e-3
epochs = 5
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

Let’s look at what each of the above lines does:

  • model = OneLayerNN() sets our variable model to be our model we defined earlier
  • lr = 1e-3 defines what is known as our learning rate, which specifies how significantly our model should update its weights and biases. We’ll be covering this in our next session, but if you’d like to read more about it this is a good article
  • epochs = 5 specifies how many times we should train our model over our dataset. With 5, that means our model will be exposed to each example in the dataset 5 times.
  • loss_fn = nn.CrossEntropyLoss() defines our loss function - how we measure how “wrong” our model is
  • optimizer = torch.optim.SGD(model.parameters(), lr=lr) specifies which optimisation algorithm we should use to update the model’s parameters

Training our Model 💡

Now we’ve got everything set up, let’s write a function which will train our model.

# Function to train our model
def train_model(dataloader, model, epochs, loss_fn, optimizer):
    for epoch in range(epochs):
        for (images, labels) in dataloader:
            # Compute prediction and loss
            pred = model(images)
            loss = loss_fn(pred, labels)

            # Backpropagation
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()


train_model(train_loader, model, epochs, loss_fn, optimizer)

Let’s break down what happens in this function.

We iterate over the whole dataset epochs number of times.

In each epoch, we iterate over our dataloader - each iteration over dataloader gives us a batch of 64 images and labels. We pass these images into our model as input and get the model’s predictions. We then compute the loss by comparing the predictions against the actual labels.

Now we have the loss, we can backpropagate to update the parameters of the network. This section may not make 100% sense initially; we’re going to cover backpropagation in a lot more detail in session 3. We first call optimizer.zero_grad() to reset the gradients - if we didn’t call this, the parameters would be influenced by the loss from other batches. Next, calling loss.backward() calculates the gradient dloss/dx for every parameter x in the network. We then call optimizer.step() to update all the parameters.
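To build some intuition for what these calls do, below is a rough sketch of what optimizer.zero_grad() and optimizer.step() boil down to for plain SGD. This is a simplified illustration (assuming the model and lr defined earlier), not PyTorch’s actual implementation, and you don’t need to add it to your code:

# Roughly what optimizer.zero_grad() does: clear the gradient stored on each parameter
for param in model.parameters():
    if param.grad is not None:
        param.grad.zero_()

# (in the real training loop, loss.backward() runs between these two steps)

# Roughly what optimizer.step() does for plain SGD: nudge each parameter a small
# amount (scaled by the learning rate) in the direction that reduces the loss
with torch.no_grad():
    for param in model.parameters():
        if param.grad is not None:
            param -= lr * param.grad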

Finally, we call the train_model function we just made to train the model.

Congratulations! You’ve just implemented and trained your first neural network!

Analysing Our Model

We’ve implemented our model, but at this point we really have no idea how well it works.

Ideally we’d like to know how accurate the model is overall, which numbers it is good or bad at classifying, and which numbers it frequently mixes up.

Looking at the Loss 🔬

First of all, we’re going to visualise how the accuracy of our model changes over time by plotting the model’s loss as it trains.

Below, I’ve rewritten our train_model function:

  • At the start of every epoch, the number of the epoch is printed
  • Every 100 batches, we print that batch’s loss and store it in a list
  • After our model has finished training, we visualise the losses we stored earlier to show how the loss changed over time

# Function to train our model
def train_model(dataloader, model, epochs, loss_fn, optimizer):
    # Variables for visualising loss over time
    y_loss = []
    size = len(dataloader.dataset)

    for t in range(epochs):
        print(f"Epoch {t}\n-------------------------------")
        for batch, (images, labels) in enumerate(dataloader):
            # Compute prediction and loss
            pred = model(images)
            loss = loss_fn(pred, labels)

            # Backpropagation
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Every 100 batches, print and store the current loss
            if batch % 100 == 0:
                loss, current = loss.item(), batch * len(images)
                print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
                y_loss.append(loss)

    # Plot loss over time
    plt.plot(range(len(y_loss)), y_loss)
    plt.title('Loss over time')
    plt.xlabel('Epoch_batch')
    plt.ylabel('Loss')
    ax = plt.gca()
    ax.axes.xaxis.set_visible(False)
    plt.show()

What this will give us is a graph that looks like the below.

Loss over time for 5 epochs

We can see that the model rapidly decreases its loss at first; however, as time goes on, the rate at which it improves decreases. If we increase the number of epochs to 20, we get a graph like the below:

Loss over time for 20 epochs

Notice how each epoch is easily identified, as they all have roughly the same shape. This is because the model sees the same examples, in the same order, every epoch when calculating these losses.

If we want our model to be exposed to batches in a random order rather than the same order every time, we can modify our train_loader and test_loader variables we defined earlier by adding shuffle=True to both:

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=True)

Adding shuffle will make our model more robust and generalised, as the order of the batches won’t be the same every time. Below is a visualisation of the loss over 20 epochs with shuffle:

Loss over time for 20 epochs with shuffle=True

Visualising Predictions 🧪

We might also be interested in seeing some examples of what our model predicts for a set of given images.

In the below code, we get random images from our test dataset. We pass each image into our model to get its prediction, then we plot each image with the model’s prediction.

# Test model on some examples
figure = plt.figure(figsize=(8, 8))
cols, rows = 4, 3
for i in range(1, cols * rows + 1):
    # Pick a random example from the test dataset
    sample_idx = torch.randint(len(test_dataset), size=(1,)).item()
    img, label = test_dataset[sample_idx]
    figure.add_subplot(rows, cols, i)

    # Plot the image captioned by the model's prediction
    pred = model(img)
    plt.title(torch.argmax(pred).numpy())
    plt.axis("off")
    plt.imshow(img.squeeze(0), cmap="gray")

plt.show()

This will give us a graph something like the one below. Viewing your model’s predictions can sometimes help you understand its reasoning better. For example, in the figure below we can see that it incorrectly predicts 4 when shown a picture of a 9.

Images from dataset with respective predictions from model

Test Dataset 🥼

Earlier, we discussed how we split our dataset into two parts such that we can evaluate our model on data that it hasn’t seen before.

A big problem in machine learning is overfitting - when our model fits its training data too much, such that its accuracy on new data is low.

For example, let’s say we want to train a model to classify whether a given point on a 2D plane is more likely to be a red or blue dot. Below is a visualisation of two trained models - one’s decision boundary (the line which divides its red/blue prediction) is shown with the green line, the other’s with the black line.

The green line represents an overfitted model, the black line a generalised model

Whilst the green line is more accurate for the given data, it is more likely to have a higher error rate for unseen data.

How do we tell whether our model is overfitting? A good way is to regularly evaluate it on our test dataset. Recall that we don’t train our model on the test dataset, so our model will have never seen any of the examples in it. Ideally, the accuracy on the train and test datasets would be similar. If the accuracy of the model on the train dataset is much higher than on the test dataset, it is likely that our model is overfitting.

Let’s first create a function to evaluate our model over the entire test dataset. This function will iterate over the entire test dataset, sum up the cumulative loss and correct predictions, and calculate the average loss and accuracy. Finally, we’ll return the average loss. Make sure to place this function above your train_model function.

Note that the with torch.no_grad() block tells PyTorch not to calculate the gradients required for backpropagation. We don’t need these gradients because we’re only evaluating the model, not updating its parameters.

def test_loop(test_dataloader, model, loss_fn):
    size = len(test_dataloader.dataset)
    num_batches = len(test_dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for images, labels in test_dataloader:
            pred = model(images)
            test_loss += loss_fn(pred, labels).item()
            correct += (pred.argmax(1) == labels).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

    return test_loss

Now let’s modify our train_model(...) function to call our test_loop(...) function at the end of every epoch, then we’ll store the test loss and plot it against the train loss.

import math


# Function to train our model
def train_model(dataloader, model, epochs, loss_fn, optimizer):
    # Variables for visualising loss over time
    y_loss = []
    y_test_loss = []

    size = len(dataloader.dataset)
    for t in range(epochs):
        print(f"Epoch {t}\n-------------------------------")
        for batch, (images, labels) in enumerate(dataloader):
            # Compute prediction and loss
            pred = model(images)
            loss = loss_fn(pred, labels)

            # Backpropagation
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            if batch % 100 == 0:
                loss, current = loss.item(), batch * len(images)
                y_loss.append(loss)
                print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")

        # Evaluate on the test dataset at the end of every epoch
        test_loss = test_loop(test_loader, model, loss_fn)
        y_test_loss.append(test_loss)

    # Plot train loss and test loss over time
    plt.plot(range(len(y_loss)), y_loss)
    plt.plot([x * (math.floor(len(dataloader) / 100) + 1) for x in range(epochs)], y_test_loss)
    plt.title('Loss over time')
    plt.xlabel('Epoch_batch')
    plt.ylabel('Loss')
    ax = plt.gca()
    ax.axes.xaxis.set_visible(False)
    plt.show()

Running this code will give us a graph like the below, visualising the train loss against the test loss over 20 epochs:

Train (blue) vs. test (orange) loss over time

We can see that, apart from the very first point, the test and train loss are very similar, which implies that our model isn’t overfitting.

If the train loss were significantly lower than the test loss, then our model is clearly not generalising to unseen examples and is likely overfitting. For example, in the figure below the loss on the training set continues to decrease whilst the loss on the test set doesn’t - it actually increases.

Example train and test loss for training process

Confusion Matrix 👩‍🔬

Despite having evaluated our model in a few different ways, we still find it difficult to truly understand our model and its shortcomings. We don’t know which numbers it is very good at predicting, and which numbers it often gets mixed up.

A confusion matrix is a useful tool to visualise performance for classification problems. Each row of the matrix represents the instances in the actual class while each column represents the instances in a predicted class.

For example, let’s say we trained a model to predict between four animals: a lion, fish, monkey or elephant. The below confusion matrix visualises the number of times the model predicted each animal, against what animal the model should have predicted. We can see that the model incorrectly predicted “Monkey” when it should have predicted “Elephant” 3 times. However the model correctly predicted “Elephant” when it should have been “Elephant” 25 times.

Confusion matrix for animal classification model

Let’s say we trained our model again in a different way, and get a confusion matrix like the below. Here, we can see that the model is classifying lions as monkeys:

Confusion matrix for (incorrect) animal classification model

To visualise this matrix, we first need to calculate the predictions. The below function iterates through our test dataset, and for every example computes the prediction and stores it in the appropriate cell in the 2D matrix. We then call the plot_confusion_matrix(...) function with our created matrix, which we’ll define next.

def generate_confusion_matrix(test_dataloader, model):
    confusion_matrix = np.zeros((10, 10))

    with torch.no_grad():
        for images, labels in test_dataloader:
            preds = model(images)

            # For each example in the batch, increment the cell [actual label][predicted label]
            for x in range(len(preds)):
                pred = torch.argmax(preds[x]).numpy()
                confusion_matrix[labels[x]][pred] += 1

    plot_confusion_matrix(confusion_matrix)

Given some matrix, we first set up the figure in the correct format. We then plot a standard confusion matrix showing the number of predictions for each class for every true label. Finally, we plot a second graph, this time showing the proportion of predictions rather than the raw counts.

def plot_confusion_matrix(confusion_matrix):
    # Absolute predictions
    fig, ax = plt.subplots(1)

    ax.set_title('No. Predictions Confusion Matrix')
    ax.matshow(confusion_matrix)
    ax.set_xticks(np.arange(10))
    ax.set_yticks(np.arange(10))
    plt.show()

    # Percentage predictions - normalise each column by the total number
    # of times that class was predicted
    for i in range(10):
        totalPredicted = sum(confusion_matrix[:, i])
        if totalPredicted == 0:
            continue

        confusion_matrix[:, i] = confusion_matrix[:, i] / totalPredicted

    fig, ax = plt.subplots(1)

    ax.set_title('Percentage Predictions Confusion Matrix')
    ax.matshow(confusion_matrix)
    ax.set_xticks(np.arange(10))
    ax.set_yticks(np.arange(10))

    plt.show()


generate_confusion_matrix(test_loader, model)

Using this, we’ll get graphs like the ones below. From hovering over the graph in the pop-up (which you can’t see here), we can see that our model is best at predicting 1, with 93% accuracy, and worst at predicting 8, with 82% accuracy. We can also see that our model often confuses 4 and 9, likely due to their similar shapes.

Confusion matrix using our model
Confusion matrix using our model, this time displaying the proportion of predictions per cell
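If you’d rather read the exact numbers than hover over the plot, a small helper like the one below can print a per-digit breakdown. This is a hypothetical addition, not part of the original code - it expects the raw counts matrix, so call it before plot_confusion_matrix(...) normalises the columns:

def print_per_digit_accuracy(confusion_matrix):
    # For each digit, show what fraction of the model's predictions of that digit
    # were actually correct (mirroring the column-wise normalisation used above)
    for digit in range(10):
        total_predicted = confusion_matrix[:, digit].sum()
        if total_predicted == 0:
            continue
        accuracy = confusion_matrix[digit, digit] / total_predicted
        print(f"Digit {digit}: {100 * accuracy:.1f}% of predictions correct")

For example, you could call print_per_digit_accuracy(confusion_matrix) inside generate_confusion_matrix(...), just before plot_confusion_matrix(...) is called.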

Saving + Loading Models 👩‍🔬

By now, you’re probably bored of having to wait ~5 minutes to train your model every time you want to analyse its outputs. Luckily, PyTorch makes it possible to save your trained model and load it later, so you don’t need to go through the entire training process again.

First, create a folder within your working directory called models. Now, let’s modify the code where we call the train_model(...) function. We need to specify a path where we’ll save and load our models from, and a variable to specify whether we want to train our model or load it.

# Load model if chosen, otherwise train
load = False
model_path = './models/model_weights.pth'

if not load:
    # Train and save model
    train_model(train_loader, model, epochs, loss_fn, optimizer)
    torch.save(model.state_dict(), model_path)
else:
    # Load the previously saved model weights
    model.load_state_dict(torch.load(model_path))

Better Architectures

Of course, we can change the architecture of our model to achieve a more accurate classification score.

Modifying the architecture of our model is simple: all we have to do in this case is change our OneLayerNN(...) class. We add any additional layers in the __init__(...) function, then update the forward(...) function to state how the outputs are computed.

Below is a modified version of the class, where we’ve added a hidden layer with 40 neurons. Note that the arguments a, b in nn.Linear(a, b) match the number of neurons in the input and output respectively, so when adding linear layers make sure consecutive layers’ inputs and outputs match in size.

class OneLayerNN(nn.Module):
    def __init__(self):
        super(OneLayerNN, self).__init__()
        self.flatten = nn.Flatten()  # Flatten 28x28 image to one-dimensional input
        self.linear_a = nn.Linear(28 * 28, 40)  # Input layer -> hidden layer
        self.linear_b = nn.Linear(40, 10)       # Hidden layer -> output layer

    def forward(self, input):
        x = self.flatten(input)
        x = self.linear_a(x)
        x = self.linear_b(x)
        return x

As long as consecutive layers match each other, we could add as many hidden layers as we want. Experiment with different architectures and see what works well.
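As one starting point, here is a possible sketch with two hidden layers. The sizes 40 and 20 are arbitrary choices for illustration, not values from the session - the only requirement is that consecutive sizes line up:

class TwoHiddenLayerNN(nn.Module):
    def __init__(self):
        super(TwoHiddenLayerNN, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_a = nn.Linear(28 * 28, 40)  # Input layer -> first hidden layer
        self.linear_b = nn.Linear(40, 20)       # First hidden layer -> second hidden layer
        self.linear_c = nn.Linear(20, 10)       # Second hidden layer -> output layer

    def forward(self, input):
        x = self.flatten(input)
        x = self.linear_a(x)
        x = self.linear_b(x)
        x = self.linear_c(x)
        return x

Remember to update model = OneLayerNN() to construct whichever class you want to train.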

Conclusion

Hopefully by now you have a better understanding of neural networks and implementing them using PyTorch. We have one more session in this course which will be in-person in OC1.05 on December 7th at 6pm.

Remember we’ll be giving away up to ten WAI t-shirts to the most committed to this course!

If you’re not already involved in our society, check out the following links:
