Image classification with PyTorch

Arun Purakkatt · Analytics Vidhya · Aug 8, 2020

CIFAR-10 data set using logistic regression

CIFAR-10 data set (source: https://www.cs.toronto.edu/~kriz/cifar.html)

In my previous posts we have gone through:

  1. Deep Learning — Artificial Neural Networks (ANN)
  2. Tensors — Basics of PyTorch programming
  3. Linear Regression with PyTorch

Let us try to solve the image classification problem on the CIFAR-10 data set with logistic regression.

Step 1 : Import necessary libraries & Explore the data set

We import the necessary libraries: pandas, numpy, matplotlib, torch, and torchvision. With basic EDA we can see that the CIFAR-10 data set contains 10 classes of images, with a training set of 50,000 images and a test set of 10,000 images. Each image has shape [3 x 32 x 32], i.e., 3 RGB channels and a 32 x 32 pixel size.

#Import the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import CIFAR10
from torchvision.transforms import ToTensor
from torchvision.utils import make_grid
#Explore CIFAR data set
dataset = CIFAR10(root='data/', download=True, transform=ToTensor())
test_dataset = CIFAR10(root='data/', train=False, transform=ToTensor())
#size of training data
dataset_size = len(dataset)
dataset_size
#size of test data
test_dataset_size = len(test_dataset)
test_dataset_size
#number of classes in the data set
classes = dataset.classes
classes

Let us visualize a sample image and check its size.

#Let us understand the size of one image
img, label = dataset[0]
img_shape = img.shape
img_shape
#Let us look at a sample image
img, label = dataset[0]
plt.imshow(img.permute((1, 2, 0)))
print('Label (numeric):', label)
print('Label (textual):', classes[label])

As this is a 3-channel RGB image, PyTorch expects the channels as the first dimension, whereas matplotlib expects them as the last dimension. The .permute tensor method is used to shift the channels to the last dimension.

Label (numeric): 6
Label (textual): frog

Step 2 : Prepare data for training

We use a training set, a validation set, and a test set. Why do we need them?

Training set : used to train the model, i.e., compute the loss and adjust the weights.
Validation set : used to evaluate the model while tuning hyperparameters and to pick the best model during training. We use 10% of the training data as the validation set.
Test set : used to compare different models and report the final accuracy.

We use random_split from PyTorch to create train_ds and val_ds. torch.manual_seed(43) is set so that the results are reproducible.

#validation set size 5000, i.e., 10% of the training data
torch.manual_seed(43)
val_size = 5000
train_size = len(dataset) - val_size
#creating training & validation set using random_split
train_ds, val_ds = random_split(dataset, [train_size, val_size])
len(train_ds), len(val_ds)

We use a DataLoader, as in our previous example, with a batch size of 128. To visualize the data we use the make_grid helper function from torchvision.

#Creating data loaders to load the data in batches
batch_size = 128
train_loader = DataLoader(train_ds, batch_size, shuffle=True, num_workers=4, pin_memory=True)
val_loader = DataLoader(val_ds, batch_size*2, num_workers=4, pin_memory=True)
test_loader = DataLoader(test_dataset, batch_size*2, num_workers=4, pin_memory=True)
#visualize a batch using the make_grid helper function from torchvision
for images, _ in train_loader:
    print('images.shape:', images.shape)
    plt.figure(figsize=(16,8))
    plt.axis('off')
    plt.imshow(make_grid(images, nrow=16).permute((1, 2, 0)))
    break

images.shape: torch.Size([128, 3, 32, 32])

Step 3 : Creating a base model class & Training on GPU

We create a class ImageClassificationBase that inherits from nn.Module. It does not contain the model architecture (the __init__ and forward methods); it only defines the training, validation, and logging steps that any model can reuse.

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                  # Generate predictions
        loss = F.cross_entropy(out, labels) # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                  # Generate predictions
        loss = F.cross_entropy(out, labels) # Calculate loss
        acc = accuracy(out, labels)         # Calculate accuracy
        return {'val_loss': loss, 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()      # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['val_loss'], result['val_acc']))

We have all heard that a CPU is not enough to train deep learning models and that we need a GPU. How can we train our model on a GPU?

We check the device availability; it will report a GPU or a CPU based on your system setup. We create a helper function to_device to move tensors to the chosen device, and a class DeviceDataLoader to move each batch of data to the device. We will also need helper functions to plot the losses and accuracy, shown later with the training results.
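
The get_default_device helper used below is also not defined in this post; a minimal version, assuming we simply pick the GPU whenever CUDA is available, would be:

def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    return torch.device('cpu')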

device = get_default_device()
device

def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)

We now wrap the data loaders so that each batch is moved to the appropriate device.

train_loader = DeviceDataLoader(train_loader, device)
val_loader = DeviceDataLoader(val_loader, device)
test_loader = DeviceDataLoader(test_loader, device)

Step 4 : Training the model

This is similar to linear regression; the difference is that we also have a validation phase. The input size is 3x32x32 = 3072 and the output size is 10. We use 3 hidden layers, so the neural network architecture looks like 3072 x 1650 x 512 x 138 x 10. Images are flattened into vectors, the layers and activation functions are applied in turn, and predictions come from the output layer. ReLU is used as the activation function.

input_size = 3*32*32    # 3072 inputs after flattening each image
output_size = 10        # one output per class

class CIFAR10Model(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        # hidden layers
        self.linear1 = nn.Linear(input_size, 1650)
        self.linear2 = nn.Linear(1650, 512)
        self.linear3 = nn.Linear(512, 138)
        # output layer
        self.linear4 = nn.Linear(138, output_size)

    def forward(self, xb):
        # Flatten images into vectors
        out = xb.view(xb.size(0), -1)
        # Hidden layer 1 with ReLU activation
        out = F.relu(self.linear1(out))
        # Hidden layer 2 with ReLU activation
        out = F.relu(self.linear2(out))
        # Hidden layer 3 with ReLU activation
        out = F.relu(self.linear3(out))
        # Output layer returns raw logits: no activation here,
        # since F.cross_entropy applies log-softmax internally
        out = self.linear4(out)
        return out

We train the model using the fit function to reduce the loss and improve accuracy. Here we try out different learning rates and numbers of epochs; a learning rate of 0.001 over 25 epochs gives the best accuracy.
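
The fit and evaluate functions themselves are not shown in this post. A minimal sketch that is consistent with the ImageClassificationBase methods defined earlier, assuming a plain SGD optimizer (the opt_func parameter is an assumption, not from the original post), could look like this:

def evaluate(model, val_loader):
    # Run the validation step on every batch and aggregate the results
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training phase: compute loss, backpropagate, update weights
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase: evaluate and log at the end of each epoch
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result)
        history.append(result)
    return history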

#Instantiate the model, move it to the device, and record an initial evaluation
model = to_device(CIFAR10Model(), device)
history = [evaluate(model, val_loader)]
#Try different learning rates and epoch counts
history += fit(10, 0.05, model, train_loader, val_loader)
history += fit(8, 0.005, model, train_loader, val_loader)
history += fit(7, 0.01, model, train_loader, val_loader)
history += fit(4, 0.001, model, train_loader, val_loader)
history += fit(10, 0.0001, model, train_loader, val_loader)
#since 0.001 gives the best accuracy, we will go with that
history += fit(25, 0.001, model, train_loader, val_loader)
Accuracy & loss vs. number of epochs
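
These plots can be produced from the recorded history with small matplotlib helpers; a minimal sketch (the names plot_accuracies and plot_losses are illustrative, not from the original post):

def plot_accuracies(history):
    # Validation accuracy recorded at the end of each epoch
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs')

def plot_losses(history):
    # Validation loss recorded at the end of each epoch
    losses = [x['val_loss'] for x in history]
    plt.plot(losses, '-x')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.title('Loss vs. No. of epochs')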

Step 5 : Recording results & saving the model

We evaluate the model on the test set, record the hyperparameters and results, and save the model with the code below; the saved weights can later be used for prediction.

#Evaluate the final model on the test set
evaluate(model, test_loader)
#Record the architecture, hyperparameters & results (matching the fit calls above)
arch = '4 layers (1650, 512, 138, 10)'
lrs = [0.05, 0.005, 0.01, 0.001, 0.0001, 0.001]
epochs = [10, 8, 7, 4, 10, 25]
test_acc = 0.54
test_loss = 1.30
#Save the trained weights
torch.save(model.state_dict(), 'cifar10-feedforward.pth')
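
Since the weights were saved to 'cifar10-feedforward.pth', they can be loaded back into a fresh model for prediction. A short sketch (model2 and the single-image prediction are illustrative, not from the original post):

#Recreate the model and load the saved weights
model2 = to_device(CIFAR10Model(), device)
model2.load_state_dict(torch.load('cifar10-feedforward.pth'))
#Predict the class of a single test image
img, label = test_dataset[0]
with torch.no_grad():
    out = model2(to_device(img.unsqueeze(0), device))
    _, pred = torch.max(out, dim=1)
print('Label:', classes[label], '| Predicted:', classes[pred.item()])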
