Pipeline for Every PyTorch Image Classification Problem: Training the Model

→ Guide to Training PyTorch Image Classification Models

siromer
5 min read · Apr 6, 2024
  • In the first part of this series (link), I discussed how to process image data and convert it into the format that PyTorch expects.
    In this part, I will train a custom image classification model.

I am not going to talk about how to obtain, process, and analyze the image data here. If you are interested in those steps, I strongly recommend reading the first part of this pipeline (link).

For every image classification task, I follow these six main steps; in this article, I am going to discuss the last three.

  1. Creating the dataset (first part)
  2. Visualizing example images (first part)
  3. Visualizing the class distribution (first part)
  4. Creating functions for training the model (this article)
  5. Creating the model (this article)
  6. Training the model (this article)

Important Note: train_set and validation_set are data loaders that wrap the iterable datasets I created in the previous article. I will use these two data loaders for training.
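For reference, here is a minimal sketch of how such loaders can be built with torchvision. This is only an illustration: the paths, transforms, and batch size below are placeholder assumptions, not the exact values from the first article.

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# NOTE: placeholder paths and values for illustration; see the first article for the real setup
transform = transforms.Compose([
    transforms.Resize((180, 180)),  # the model in this article expects 180x180 inputs
    transforms.ToTensor(),
])

train_data = datasets.ImageFolder("data/train", transform=transform)            # hypothetical path
validation_data = datasets.ImageFolder("data/validation", transform=transform)  # hypothetical path

# DataLoaders wrap the datasets and yield (X, y) batches when iterated
train_set = DataLoader(train_data, batch_size=32, shuffle=True)
validation_set = DataLoader(validation_data, batch_size=32, shuffle=False)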

4. Create Functions for Training Model

To train a model with PyTorch, you need to write your own training loop; PyTorch does not provide a function similar to ".fit()" in TensorFlow. I am going to write two functions for training the model: one for training and one for validation.

In these functions :

  • the loss is calculated as the average loss per batch.
  • the accuracy is calculated as the percentage of correct predictions out of the total number of samples (see the toy example below).
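Here is a standalone toy example (not part of the training code) showing how both quantities come out of raw model outputs; the numbers are made up for illustration.

import torch
import torch.nn as nn

pred = torch.tensor([[2.0, 0.5, 0.1],   # argmax -> class 0
                     [0.1, 3.0, 0.2],   # argmax -> class 1
                     [1.5, 0.3, 0.9]])  # argmax -> class 0
y = torch.tensor([0, 1, 2])             # true labels; only the third prediction is wrong

loss = nn.CrossEntropyLoss()(pred, y)                           # average loss over the batch
correct = (pred.argmax(1) == y).type(torch.float).sum().item()  # 2.0
print(f"loss: {loss.item():.4f}, accuracy: {100 * correct / len(y):.2f}%")  # accuracy: 66.67%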
# I am going to append accuracies to these lists and use them outside of these functions
train_accuracies = []
validation_accuracies = []

# Function for training
def train(dataloader, model, loss_fn, optimizer, epoch):

    size = len(dataloader.dataset)  # total number of images inside the loader
    num_batches = len(dataloader)   # number of batches

    model.train()  # set the model to training mode (enables dropout, etc.)

    train_loss, correct = 0, 0

    for batch, (X, y) in enumerate(dataloader):
        # move X and y to the GPU for faster training
        X, y = X.to(device), y.to(device)

        # make predictions
        pred = model(X)
        # calculate the loss
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()        # compute the parameter gradients
        optimizer.step()       # update the parameters
        optimizer.zero_grad()  # reset the gradients of all parameters

        # Update the training loss
        train_loss += loss.item()  # item() extracts the loss value as a Python float

        # Count correct predictions for training accuracy
        correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    # loss and accuracy
    train_loss = train_loss / num_batches
    accuracy = 100 * correct / size

    # use this accuracy list for plotting accuracy with matplotlib
    train_accuracies.append(accuracy)

    # Print training accuracy and loss at the end of the epoch
    print(f" Training Accuracy: {accuracy:.2f}%, Training Loss: {train_loss:.4f}")
# function for validation
def validation(dataloader, model, loss_fn, t):

    size = len(dataloader.dataset)  # total number of images inside the loader
    num_batches = len(dataloader)   # number of batches

    validation_loss, correct = 0, 0

    # set the model to evaluation mode; this disables the dropout layer
    model.eval()

    with torch.no_grad():  # disable gradient calculation
        for X, y in dataloader:

            # move X and y to the GPU
            X, y = X.to(device), y.to(device)
            pred = model(X)  # make predictions
            validation_loss += loss_fn(pred, y).item()

            # if a prediction is correct, add 1 to the correct variable
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    # loss and accuracy
    validation_loss /= num_batches
    accuracy = 100 * correct / size

    validation_accuracies.append(accuracy)

    # Print validation accuracy and loss at the end of the epoch
    print(f" Validation Accuracy: {accuracy:.2f}%, Validation Loss: {validation_loss:.4f}")

I will use these two functions when training the model.

5. Create Model

  • Below, I explain what each layer does, the output dimensions, and how the model works. I strongly recommend reading the comment blocks.
import torch

# if a GPU is available, use it for training
device = "cuda" if torch.cuda.is_available() else "cpu"
device
import torch
import torch.nn as nn


class SimpleCNN(nn.Module):
    def __init__(self, num_classes=9):
        super(SimpleCNN, self).__init__()

        # input image size is --> (3, 180, 180)

        # convolutional layer with 32 filters; the input dimension is 3 because the image has 3 channels
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        # activation function; it introduces non-linearity to the model, which helps it learn complex functions
        self.act1 = nn.ReLU()
        # max pooling halves the spatial dimensions --> (90, 90)
        self.pool1 = nn.MaxPool2d(2)

        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.act2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(2)  # --> (45, 45)

        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.act3 = nn.ReLU()
        self.pool3 = nn.MaxPool2d(2)  # --> (22, 22)

        # first flatten the channels, then feed them into the fully connected layer.
        # Given the shape (128, 22, 22), flattening results in 128 * 22 * 22 features.
        self.fc1 = nn.Linear(128 * 22 * 22, 256)
        self.act4 = nn.ReLU()

        # dropout randomly drops neurons; here 20% of neurons are dropped.
        # It helps to prevent overfitting.
        self.dropout = nn.Dropout(p=0.2)

        # The nn.Linear layer with input size 256 and output size num_classes is the output layer.
        # Since we have 9 classes, the raw outputs (logits) of this layer are converted into
        # probabilities by a softmax activation, representing the likelihood of each class.
        # (nn.CrossEntropyLoss applies softmax internally, so you don't need to add it here.)
        # These probabilities are then used to calculate the error during training.
        self.fc2 = nn.Linear(256, num_classes)


    def forward(self, x):

        # pass the input through each layer in turn and return the output at the end
        out = self.pool1(self.act1(self.conv1(x)))
        out = self.pool2(self.act2(self.conv2(out)))
        out = self.pool3(self.act3(self.conv3(out)))

        out = out.view(out.size(0), -1)  # flatten to (batch_size, 128 * 22 * 22)

        out = self.act4(self.fc1(out))
        out = self.dropout(out)
        out = self.fc2(out)

        return out

# create the model
model = SimpleCNN()

model.to(device)
(Output: the model architecture printed by model.to(device).)
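As a quick sanity check of the 128 * 22 * 22 flatten size (this check is my addition, not part of the original pipeline), you can push a dummy batch through the model:

# a fake batch of one 180x180 RGB image; shape is (batch, channels, height, width)
dummy = torch.zeros(1, 3, 180, 180).to(device)
print(model(dummy).shape)  # torch.Size([1, 9]) -> one logit per class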

6. Train Model

  • Training may take a long time depending on your GPU.
# Loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# epoch number
epochs = 32

# loop for training the model
for t in range(epochs):
    print(f"Epoch {t+1}")
    train(train_set, model, loss_fn, optimizer, t)
    validation(validation_set, model, loss_fn, t)
    print("----------------------------")
print("Done!")

Visualization of Accuracy

import matplotlib.pyplot as plt

def visualize(train_accuracies, validation_accuracies):
    epoch_number = len(train_accuracies)

    plt.plot(range(1, epoch_number + 1), train_accuracies, 'r', label='Training accuracy')
    plt.plot(range(1, epoch_number + 1), validation_accuracies, 'b', label='Validation accuracy')
    plt.legend()
    plt.xlabel("Epoch Number")
    plt.ylabel("Accuracy (%)")
    plt.grid()
    plt.show()

# Remember, these two parameters are the lists I created above and appended values to every epoch
visualize(train_accuracies, validation_accuracies)

The result looks good: 86% accuracy without overfitting.
