We will use PyTorch to build a CNN model step by step, then train it on the training data and evaluate it on the test data.
Import libraries
import torch
Check available device
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device
Output: device(type='cpu')
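The device object only takes effect once tensors and modules are moved to it with .to(device). A minimal sketch, assuming a toy nn.Linear layer and a random batch (both hypothetical, used here only to illustrate device placement):

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hypothetical toy layer and batch, not part of the tutorial's model.
layer = nn.Linear(4, 2).to(device)      # moves the layer's parameters
batch = torch.randn(8, 4).to(device)    # moves the input tensor

out = layer(batch)                      # computed on `device`
print(out.device.type, tuple(out.shape))
```

The model and its inputs must live on the same device, otherwise the forward pass raises an error.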
Download the MNIST dataset
What is the MNIST dataset?
The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning.
The MNIST database contains 60,000 training images and 10,000 testing images.
PyTorch domain libraries provide a number of pre-loaded datasets (such as FashionMNIST and MNIST) that subclass torch.utils.data.Dataset and implement functions specific to the particular data. They can be used to prototype and benchmark your model. In this example we use the MNIST dataset.
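To make the Dataset interface concrete, here is a rough sketch of a custom subclass. The class name RandomDigits and its random contents are hypothetical. The point is only that a map-style dataset implements __len__ and __getitem__, which is what datasets.MNIST does for the real images.

```python
import torch
from torch.utils.data import Dataset

class RandomDigits(Dataset):
    """Hypothetical dataset of random 1x28x28 'images' with labels 0-9."""

    def __init__(self, n_samples=100):
        self.images = torch.rand(n_samples, 1, 28, 28)
        self.labels = torch.randint(0, 10, (n_samples,))

    def __len__(self):
        # number of samples in the dataset
        return len(self.labels)

    def __getitem__(self, idx):
        # one (image, label) pair, the same layout MNIST samples use
        return self.images[idx], int(self.labels[idx])

toy_data = RandomDigits()
img, label = toy_data[0]
print(len(toy_data), tuple(img.shape), label)
```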
Download the MNIST dataset to the local system
from torchvision import datasets
from torchvision.transforms import ToTensor

train_data = datasets.MNIST(
    root='data',
    train=True,
    transform=ToTensor(),
    download=True,
)
test_data = datasets.MNIST(
    root='data',
    train=False,
    transform=ToTensor(),
)
Print train_data and test_data size
print(train_data)
Output:
print(test_data)
Output:
print(train_data.data.size())
Output: torch.Size([60000, 28, 28])
print(train_data.targets.size())
Output: torch.Size([60000])
Visualization of MNIST dataset
Plot one training sample
import matplotlib.pyplot as plt

plt.imshow(train_data.data[0], cmap='gray')
plt.title('%i' % train_data.targets[0])
plt.show()
Output:
Plot multiple training samples
figure = plt.figure(figsize=(10, 8))
cols, rows = 5, 5
for i in range(1, cols * rows + 1):
    # pick a random sample from the training set
    sample_idx = torch.randint(len(train_data), size=(1,)).item()
    img, label = train_data[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(label)
    plt.axis("off")
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()
Output:
Preparing data for training with DataLoaders
The Dataset retrieves our dataset’s features and labels one sample at a time. While training a model, we typically want to pass samples in “minibatches”, reshuffle the data at every epoch to reduce model overfitting, and use Python’s multiprocessing to speed up data retrieval.
DataLoader is an iterable that abstracts this complexity for us in an easy API.
from torch.utils.data import DataLoader

loaders = {
    'train': DataLoader(train_data,
                        batch_size=100,
                        shuffle=True,
                        num_workers=1),
    'test': DataLoader(test_data,
                       batch_size=100,
                       shuffle=True,
                       num_workers=1),
}
loaders
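To see what a loader yields without downloading anything, the sketch below iterates over a small stand-in dataset. The names fake_data and fake_loader and the random tensors are assumptions for illustration, and the shapes mirror MNIST batches.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 250 random images with random labels.
fake_data = TensorDataset(torch.rand(250, 1, 28, 28),
                          torch.randint(0, 10, (250,)))
fake_loader = DataLoader(fake_data, batch_size=100, shuffle=True)

# Each iteration yields one (images, labels) minibatch.
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in fake_loader]
for shapes in batch_shapes:
    print(shapes)
```

The last batch is smaller when the dataset size is not a multiple of batch_size. With the 60,000 MNIST training images and batch_size=100, the train loader has exactly 600 batches.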
Define the Convolutional Neural Network model
A Convolutional Neural Network is a type of neural network that is used mainly in image processing applications.
Let us create a convolutional neural network using torch.nn.Module, the base class for all neural network modules. We will use two convolutional layers, each followed by a ReLU activation function and max pooling, and then a fully connected output layer.
Conv2d: Applies a 2D convolution over an input signal composed of several input planes.
Parameters
in_channels (int) — Number of channels in the input image
out_channels (int) — Number of channels produced by the convolution
kernel_size (int or tuple) — Size of the convolving kernel
stride (int or tuple, optional) — Stride of the convolution. Default: 1
padding (int or tuple, optional) — Zero-padding added to both sides of the input. Default: 0
padding_mode (string, optional) — ‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’. Default: ‘zeros’
dilation (int or tuple, optional) — Spacing between kernel elements. Default: 1
groups (int, optional) — Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) — If True, adds a learnable bias to the output. Default: True
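The kernel_size, stride, padding and dilation parameters together determine the spatial size of the output. The sketch below applies the shape formula from the Conv2d documentation along one dimension. The helper name conv_out_size is made up for this illustration.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension, per the Conv2d shape formula."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# A 28-pixel side with a 5x5 kernel, stride 1:
same = conv_out_size(28, kernel_size=5, stride=1, padding=2)   # padding keeps 28
shrunk = conv_out_size(28, kernel_size=5)                      # no padding gives 24

# MaxPool2d(2) follows the same formula with kernel_size=2 and stride=2.
pooled = conv_out_size(same, kernel_size=2, stride=2)          # 14
pooled_twice = conv_out_size(
    conv_out_size(pooled, kernel_size=5, stride=1, padding=2),
    kernel_size=2, stride=2)                                   # 7
print(same, shrunk, pooled, pooled_twice)
```

Two rounds of size-preserving convolution followed by 2 x 2 pooling take a 28 x 28 image down to 7 x 7, which is where the 32 * 7 * 7 input size of the model's linear layer comes from.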
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(
                in_channels=1,
                out_channels=16,
                kernel_size=5,
                stride=1,
                padding=2,
            ),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 32, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # fully connected layer, output 10 classes
        self.out = nn.Linear(32 * 7 * 7, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        # flatten the output of conv2 to (batch_size, 32 * 7 * 7)
        x = x.view(x.size(0), -1)
        output = self.out(x)
        return output, x    # return x for visualization
in_channels=1: the input is a grayscale image, so it has a single channel.
stride: the number of pixels the kernel moves at each step as it slides across the image.
padding: zeros added around the border of the image. With a 5 x 5 kernel and a stride of 1, a padding of 2 keeps the output exactly the same size as the input.
kernel_size: the kernel is a small matrix of learnable weights, here of size 5 x 5. To perform the convolution, we slide the kernel across the image horizontally and vertically and, at each position, take the dot product of the kernel with the patch of the image beneath it.
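The sliding dot product can be written out by hand. The following is a plain-Python illustration for one channel with stride 1 and no padding, not how PyTorch implements it. The function name conv2d_naive and the tiny 4 x 4 image are made up for the example. Like nn.Conv2d, it does not flip the kernel.

```python
def conv2d_naive(image, kernel):
    """Slide a square `kernel` over `image` (lists of lists), stride 1, no padding."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Dot product of the kernel with the k x k patch whose corner is (r, c).
            total = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(k) for j in range(k))
            row.append(total)
        output.append(row)
    return output

image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [0, 1, 2, 3]]
kernel = [[1, 0],
          [0, -1]]
result = conv2d_naive(image, kernel)
print(result)  # [[-4, -4, 2], [-4, -4, 4], [6, 6, 6]]
```

A 4 x 4 input and a 2 x 2 kernel give a 3 x 3 output, since the kernel fits in three positions along each axis.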
The forward() pass defines the way we compute our output using the given layers and functions.
cnn = CNN()
print(cnn)
Output:
Define loss function
loss_func = nn.CrossEntropyLoss()
loss_func
Output: CrossEntropyLoss()
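nn.CrossEntropyLoss takes the raw scores (logits) from the model and an integer class label, which is why the network has no softmax layer at the end. For a single sample the loss is the negative log of the softmax probability assigned to the correct class. A plain-Python sketch of that calculation, where the function name cross_entropy and the example logits are illustrative only:

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

confident = cross_entropy([5.0, 0.0, 0.0], target=0)  # correct class scores highest
wrong = cross_entropy([5.0, 0.0, 0.0], target=1)      # correct class scores low
uniform = cross_entropy([0.0, 0.0, 0.0], target=2)    # no preference: log(3)
print(confident, wrong, uniform)
```

The loss is small when the correct class has the highest score and grows as the model puts its confidence elsewhere. By default PyTorch averages this value over the batch.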
Define an optimization function
lr (learning rate): the step size that scales each weight update the optimizer applies after back-propagation.
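To get a feel for the learning rate, the sketch below runs plain gradient descent on the one-variable function f(w) = w**2. This is a deliberately simplified stand-in: Adam adapts the step per parameter, so it is not what the optimizer in this tutorial does internally. The function name gradient_descent is hypothetical.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; returns the final w."""
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2
        w = w - lr * grad     # the update is scaled by the learning rate
    return w

for lr in (0.01, 0.1, 1.1):
    print(lr, gradient_descent(lr))
```

A very small rate moves slowly toward the minimum at 0, a moderate rate gets close within 20 steps, and a rate that is too large overshoots further on every step and diverges.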
from torch import optim

optimizer = optim.Adam(cnn.parameters(), lr=0.01)
optimizer
Output:
Train the model
Create a function called train() that takes the number of epochs, the model, and the data loaders as input parameters.
num_epochs: the number of times the model goes through the entire training dataset.
num_epochs = 10

def train(num_epochs, cnn, loaders):
    cnn.train()
    # Train the model
    total_step = len(loaders['train'])
    for epoch in range(num_epochs):
        for i, (images, labels) in enumerate(loaders['train']):
            # forward pass on one batch of images
            output = cnn(images)[0]
            loss = loss_func(output, labels)
            # clear gradients for this training step
            optimizer.zero_grad()
            # backpropagation, compute gradients
            loss.backward()
            # apply gradients
            optimizer.step()
            if (i + 1) % 100 == 0:
                print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                      .format(epoch + 1, num_epochs, i + 1, total_step, loss.item()))

train(num_epochs, cnn, loaders)
Output:
Epoch [1/10], Step [100/600], Loss: 0.0725
Epoch [1/10], Step [200/600], Loss: 0.2758
Epoch [1/10], Step [300/600], Loss: 0.0742
Epoch [1/10], Step [400/600], Loss: 0.0744
Epoch [1/10], Step [500/600], Loss: 0.0035
Epoch [1/10], Step [600/600], Loss: 0.1458
Epoch [2/10], Step [100/600], Loss: 0.0281
Epoch [2/10], Step [200/600], Loss: 0.0584
Epoch [2/10], Step [300/600], Loss: 0.0605
Epoch [2/10], Step [400/600], Loss: 0.1782
Epoch [2/10], Step [500/600], Loss: 0.0324
Epoch [2/10], Step [600/600], Loss: 0.0918
Epoch [3/10], Step [100/600], Loss: 0.0430
Epoch [3/10], Step [200/600], Loss: 0.0368
Epoch [3/10], Step [300/600], Loss: 0.0009
Epoch [3/10], Step [400/600], Loss: 0.0647
Epoch [3/10], Step [500/600], Loss: 0.0370
Epoch [3/10], Step [600/600], Loss: 0.0286
Epoch [4/10], Step [100/600], Loss: 0.0905
Epoch [4/10], Step [200/600], Loss: 0.0638
Epoch [4/10], Step [300/600], Loss: 0.0238
Epoch [4/10], Step [400/600], Loss: 0.0564
Epoch [4/10], Step [500/600], Loss: 0.0117
Epoch [4/10], Step [600/600], Loss: 0.0069
Epoch [5/10], Step [100/600], Loss: 0.0014
Epoch [5/10], Step [200/600], Loss: 0.0449
Epoch [5/10], Step [300/600], Loss: 0.0050
Epoch [5/10], Step [400/600], Loss: 0.0534
Epoch [5/10], Step [500/600], Loss: 0.0100
Epoch [5/10], Step [600/600], Loss: 0.1055
Epoch [6/10], Step [100/600], Loss: 0.1483
Epoch [6/10], Step [200/600], Loss: 0.0018
Epoch [6/10], Step [300/600], Loss: 0.0101
Epoch [6/10], Step [400/600], Loss: 0.0188
Epoch [6/10], Step [500/600], Loss: 0.0239
Epoch [6/10], Step [600/600], Loss: 0.0323
Epoch [7/10], Step [100/600], Loss: 0.0085
Epoch [7/10], Step [200/600], Loss: 0.0767
Epoch [7/10], Step [300/600], Loss: 0.0313
Epoch [7/10], Step [400/600], Loss: 0.0518
Epoch [7/10], Step [500/600], Loss: 0.0098
Epoch [7/10], Step [600/600], Loss: 0.1183
Epoch [8/10], Step [100/600], Loss: 0.1086
Epoch [8/10], Step [200/600], Loss: 0.0024
Epoch [8/10], Step [300/600], Loss: 0.0949
Epoch [8/10], Step [400/600], Loss: 0.0502
Epoch [8/10], Step [500/600], Loss: 0.0689
Epoch [8/10], Step [600/600], Loss: 0.0637
Epoch [9/10], Step [100/600], Loss: 0.0540
Epoch [9/10], Step [200/600], Loss: 0.0826
Epoch [9/10], Step [300/600], Loss: 0.0013
Epoch [9/10], Step [400/600], Loss: 0.0168
Epoch [9/10], Step [500/600], Loss: 0.0046
Epoch [9/10], Step [600/600], Loss: 0.0419
Epoch [10/10], Step [100/600], Loss: 0.0141
Epoch [10/10], Step [200/600], Loss: 0.1218
Epoch [10/10], Step [300/600], Loss: 0.0629
Epoch [10/10], Step [400/600], Loss: 0.0056
Epoch [10/10], Step [500/600], Loss: 0.0661
Epoch [10/10], Step [600/600], Loss: 0.0096
Evaluate the model on test data
Call model.eval() before running inference to put layers such as dropout and batch normalization into evaluation mode.
model.train() puts the model in training mode. Layers such as dropout and batch normalization behave differently during training and testing, and this flag tells them which behavior to use.
model.eval() is equivalent to model.train(mode=False).
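The difference between the two modes is easy to observe on a single dropout layer. This is a small sketch using a standalone nn.Dropout, which is hypothetical here since the CNN in this tutorial contains no dropout layer.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # standalone layer, for illustration only
x = torch.ones(1000)

drop.train()
train_out = drop(x)        # entries are randomly zeroed, survivors scaled to 2.0

drop.eval()
eval_out = drop(x)         # acts as the identity in evaluation mode

print(drop.training, train_out.unique().tolist(), eval_out.unique().tolist())
```

In training mode the surviving entries are scaled by 1 / (1 - p) so the expected sum is unchanged. In evaluation mode the input passes through untouched.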
def test():
    # Test the model
    cnn.eval()
    with torch.no_grad():
        correct = 0
        total = 0
        for images, labels in loaders['test']:
            test_output, last_layer = cnn(images)
            pred_y = torch.max(test_output, 1)[1]
            # accumulate correct predictions across all batches
            correct += (pred_y == labels).sum().item()
            total += labels.size(0)
    accuracy = correct / total
    print('Test Accuracy of the model on the 10000 test images: %.2f' % accuracy)

test()
Output: Test Accuracy of the model on the 10000 test images: 0.97
Print 10 predictions from test data
sample = next(iter(loaders['test']))
imgs, lbls = sample
…
actual_number = lbls[:10].numpy()
actual_number
Output: array([3, 7, 3, 2, 4, 0, 3, 9, 6, 0])
test_output, last_layer = cnn(imgs[:10])
pred_y = torch.max(test_output, 1)[1].data.numpy().squeeze()
print(f'Prediction number: {pred_y}')
print(f'Actual number: {actual_number}')
Output:
Prediction number: [3 7 3 2 4 0 3 9 6 0]
Actual number: [3 7 3 2 4 0 3 9 6 0]