Beginner’s Guide to Building Neural Networks Using PyTorch

Published in

FSE.ai

Feb 3, 2020

This blog post helps beginners get started with PyTorch. It gives a brief introduction to tensors and basic torch operations, and then builds a neural network model from scratch. Let’s dive right into it!

What is PyTorch and Why PyTorch?

PyTorch is an open-source deep learning framework for Python, primarily developed by Facebook’s AI Research lab. In simple terms, PyTorch is a library for processing tensors. So, what are tensors? Tensors are multidimensional arrays that contain your data. If you are familiar with NumPy, the idea is the same: NumPy calls them ‘arrays’, while PyTorch calls them ‘tensors’. Most importantly, PyTorch has become popular as an alternative to NumPy that can run these computations on GPUs. Deep learning computations consist mostly of matrix multiplications and convolutions. GPUs are preferred for this work because they perform these highly parallel operations much faster than a CPU.
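Here is a minimal sketch that makes the NumPy comparison concrete. The array values are arbitrary. It shows that you can build a tensor from a NumPy array, convert it back, and run the same matrix operation in both libraries. Note that torch.from_numpy shares memory with the source array instead of copying it.

```python
import numpy as np
import torch

# Build a NumPy array and wrap it as a tensor; from_numpy shares memory (no copy)
arr = np.array([[1.0, 2.0], [3.0, 4.0]])
t = torch.from_numpy(arr)

# Because memory is shared, changing the array is visible through the tensor
arr[0, 0] = 10.0
print(t[0, 0].item())  # 10.0

# The same matrix product in both libraries
np_result = arr @ arr.T
torch_result = t @ t.T

# .numpy() converts a CPU tensor back into a NumPy array
print(np.allclose(np_result, torch_result.numpy()))  # True
print(t.dtype)  # torch.float64, inherited from the NumPy array
```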

Installation of PyTorch

The official PyTorch website, https://pytorch.org/, provides installation commands for various system configurations. Select your preferences and run the generated install command. For example, if you are using Anaconda on Windows with CUDA 10.1, the command at the time of writing was:

conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

# Import PyTorch with the command
import torch
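After installing, you can quickly confirm that everything works by printing the version and checking whether PyTorch can see a GPU. This is a minimal sketch. The device-selection line is a common idiom, not something the install command sets up for you.

```python
import torch

# Confirm the install: print the version and check for a CUDA-capable GPU
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

# Pick the GPU when present, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.ones(2, 3).to(device)
print(x.device)
```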

Tensors in PyTorch

Tensors can be a number, a vector, a matrix, or an n-dimensional array. So let’s get started by creating some tensors. You can use torch.tensor() to create tensors of any dimension. Note that all the rows must have the same length to form a tensor. You can inspect a tensor’s attributes, such as its shape and data type, with .shape and .dtype.

import torch

# Creating tensors of different dimensions:
# Tensor with a single element
a = torch.tensor([1])
# Vector tensor: 1 dimensional (integer values)
b = torch.tensor([1, 2, 3, 4, 5])
# Matrix tensor: 2 dimensional (floating-point values)
# Note: only one element is written as a float here, but PyTorch
# converts the rest for you (1. = 1.0)
c = torch.tensor([[1., 2, 3], [6, 7, 8]])
# 3 dimensional tensor (integer values)
d = torch.tensor([[[1, 2, 3, 4], [11, 12, 13, 14]], [[1, 2, 3, 4], [6, 7, 8, 9]]])

print('Single element tensor, a \n %s \n' % a)
print('Vector tensor, b \n %s \n' % b)
print('Matrix tensor, c \n %s \n' % c)
print('3 dimensional tensor, d \n %s \n' % d)
print('Size of tensor c = %s \n Datatype = %s \n' % (c.shape, c.dtype))
print('Size of tensor d = %s \n Datatype = %s \n' % (d.shape, d.dtype))

Results:

Single element tensor, a
 tensor([1])

Vector tensor, b
 tensor([1, 2, 3, 4, 5])

Matrix tensor, c
 tensor([[1., 2., 3.],
        [6., 7., 8.]])

3 dimensional tensor, d
 tensor([[[ 1,  2,  3,  4],
         [11, 12, 13, 14]],

        [[ 1,  2,  3,  4],
         [ 6,  7,  8,  9]]])

Size of tensor c = torch.Size([2, 3])
 Datatype = torch.float32

Size of tensor d = torch.Size([2, 2, 4])
 Datatype = torch.int64

Now let’s look at some commonly used ways of creating tensors. As in NumPy, you can create uninitialized (empty) tensors, as well as tensors filled with ones, zeros, or random numbers.
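Here is a minimal sketch of these creation functions. The shapes and the seed are chosen arbitrarily.

```python
import torch

torch.manual_seed(0)  # make the random tensors reproducible

e = torch.empty(2, 3)       # uninitialized memory: values are arbitrary
o = torch.ones(2, 3)        # all 1s (float32 by default)
z = torch.zeros(2, 3)       # all 0s
r = torch.rand(2, 3)        # uniform samples from [0, 1)
n = torch.randn(2, 3)       # samples from a standard normal distribution
s = torch.arange(0, 10, 2)  # like np.arange: tensor([0, 2, 4, 6, 8])
zl = torch.zeros_like(r)    # zeros with the same shape and dtype as `r`
print(o, z, r, n, s, zl, sep="\n")
```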

The table gives a quick look at some of the commonly used tensor operations.
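As a rough companion to that table, here is a short sketch of a few everyday tensor operations. This selection is ours and may not match the table exactly.

```python
import torch

x = torch.tensor([[1., 2.], [3., 4.]])
y = torch.tensor([[5., 6.], [7., 8.]])

elementwise = x * y                 # tensor([[ 5., 12.], [21., 32.]])
matmul = x @ y                      # same as torch.matmul(x, y): tensor([[19., 22.], [43., 50.]])
reshaped = x.reshape(4)             # tensor([1., 2., 3., 4.])
transposed = x.T                    # tensor([[1., 3.], [2., 4.]])
first_col = x[:, 0]                 # NumPy-style indexing: tensor([1., 3.])
total = x.sum()                     # tensor(10.)
col_means = x.mean(dim=0)           # mean over rows: tensor([2., 3.])
stacked = torch.cat([x, y], dim=0)  # shape [4, 2]
```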

Defining Our Dataset

In this article, we will design a neural network that recognizes handwritten digits, using the MNIST dataset. So let’s first try to understand our dataset. MNIST is a collection of 70,000 grayscale images of handwritten digits from 0 to 9. It is split into 60,000 training images and 10,000 test images, and each image is 28×28 pixels.

This image is a sample from our dataset, and I’m sure you already guessed it’s an 8. But you didn’t even notice how quickly your brain made that decision. Now we need to design a machine that mimics the same brain activity, and thankfully we have neural networks to do that. So let’s jump in!

The dataset is downloaded through torchvision. Using torchvision.transforms, we can also define the transformation we want to apply to each image. Here we use ToTensor(), which converts each image into a tensor and scales its pixel values from the range [0, 255] to [0, 1]. Normalizing pixel values like this is a common preprocessing step. After downloading the dataset, we use a DataLoader to split it into mini-batches for training. We set the batch size to 64 and turn on shuffling for the training data.

import torch
from torchvision import transforms, datasets
from torch.utils.data import DataLoader

# Download MNIST into the current directory; ToTensor() turns each image into a [1, 28, 28] tensor in [0, 1]
trainset = datasets.MNIST('', download=True, train=True, transform=transforms.ToTensor())
testset = datasets.MNIST('', download=True, train=False, transform=transforms.ToTensor())

# Shuffle the training data every epoch; evaluation does not need shuffling
train_loader = DataLoader(trainset, batch_size=64, shuffle=True)
test_loader = DataLoader(testset, batch_size=64, shuffle=False)
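To see what the DataLoader actually yields, the sketch below builds a small, hypothetical stand-in for MNIST from random data, so it runs without downloading anything. It mimics ToTensor() by adding a channel dimension and dividing uint8 pixel values by 255. The dataset size of 1,000 is an arbitrary assumption, but the per-batch shapes match what train_loader produces for MNIST.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 1,000 random 28x28 "images" with uint8 pixels
# in [0, 255] and random labels 0-9 (no download needed)
torch.manual_seed(0)
raw = torch.randint(0, 256, (1000, 28, 28), dtype=torch.uint8)
labels = torch.randint(0, 10, (1000,))

# Roughly what ToTensor() does to each image: add a channel dimension
# ([28, 28] -> [1, 28, 28]) and scale pixels from [0, 255] to [0.0, 1.0]
images = raw.unsqueeze(1).float() / 255.0

loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

batch_images, batch_labels = next(iter(loader))
print(batch_images.shape)  # torch.Size([64, 1, 28, 28])
print(batch_labels.shape)  # torch.Size([64])
print(len(loader))         # 16: fifteen full batches plus a final batch of 40
```

With the real training set of 60,000 images and a batch size of 64, len(train_loader) is 938: 937 full batches plus a final batch of 32 images.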

So what’s next? Here is what we’ll be doing:

Define our neural network
Define our loss function and optimizer
Feed-forward the network
Compute the loss (cost/error)
Backpropagate the gradient of the loss
Update the parameters (weights and biases)

Building the Neural Network

Now we get to the fun part: building our neural network. We will build a network with an input layer of 784 neurons, two hidden layers of 128 and 64 neurons, and an output layer of 10 neurons (one for each digit from 0 to 9). So how do we feed our images into the network? We flatten each 28×28 image into a vector of 784 (28 × 28) values before feeding it into the network.

Reshaping Images of size [28,28] into tensors [784,1]
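In code, flattening is a single reshape. The sketch below uses a random dummy batch in place of real MNIST images. Each image becomes one row of 784 values rather than a [784, 1] column, because nn.Linear expects input shaped [batch_size, features].

```python
import torch
import torch.nn as nn

# A dummy batch shaped like one from train_loader: [batch, channels, height, width]
batch = torch.rand(64, 1, 28, 28)

# reshape(-1, 784) keeps the batch dimension (inferred from -1) and flattens each image
flat = batch.reshape(-1, 784)
print(flat.shape)  # torch.Size([64, 784])

# nn.Flatten() does the same as a layer: it flattens every dimension after the first
flat2 = nn.Flatten()(batch)
print(torch.equal(flat, flat2))  # True
```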

Building a network in PyTorch is simple with the torch.nn module, which provides a higher-level API for building and training networks. To define the model, we subclass nn.Module and implement two methods: __init__() and forward(). In __init__() we set up all our layers and parameters. In forward() we compute the output by applying the layers and their corresponding activation functions in order. That’s it! We have created our neural network model!

import torch.nn as nn

input_size = 784
hidden_size = [128, 64]
output_size = 10
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(NeuralNet, self).__init__()

        # Inputs to hidden layer linear transformation
        self.layer1 = nn.Linear(input_size, hidden_size[0])
        # Hidden layer 1 to HL2 linear transformation
        self.layer2 = nn.Linear(hidden_size[0], hidden_size[1])
        # HL2 to output linear transformation
        self.layer3 = nn.Linear(hidden_size[1], output_size)

        # Define relu activation and LogSoftmax output
        self.relu = nn.ReLU()
        self.LogSoftmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        # HL1 with relu activation
        out = self.relu(self.layer1(x))
        # HL2 with relu activation
        out = self.relu(self.layer2(out))
        # Output layer with LogSoftmax activation
        out = self.LogSoftmax(self.layer3(out))
        return out
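For a plain stack of layers like this one, nn.Sequential is a more compact alternative. The sketch below builds the same 784 → 128 → 64 → 10 architecture, counts its learnable parameters, and runs a random dummy batch of 8 flattened images through it as a shape check.

```python
import torch
import torch.nn as nn

# The same 784 -> 128 -> 64 -> 10 architecture written with nn.Sequential
seq_model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
    nn.LogSoftmax(dim=1),
)

# Weights + biases per layer: (784*128 + 128) + (128*64 + 64) + (64*10 + 10)
n_params = sum(p.numel() for p in seq_model.parameters())
print(n_params)  # 109386

out = seq_model(torch.rand(8, 784))  # dummy batch of 8 flattened images
print(out.shape)  # torch.Size([8, 10])
```

Subclassing nn.Module, as the article does, is more flexible when forward() needs custom logic. nn.Sequential is convenient when data simply flows through each layer in order. The parameter count is the same for the article’s NeuralNet, since ReLU and LogSoftmax have no learnable parameters.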

Let’s create an instance of our model.

model = NeuralNet(input_size, hidden_size, output_size)

Note that we have used the ReLU activation function for the two hidden layers. In the simplest terms, it is a piecewise linear function that outputs the input directly if it is positive, and zero otherwise: ReLU(x) = max(0, x). In the output layer, we have used LogSoftmax as the activation function. It simply takes the log of the softmax output. Softmax is typically the final layer of a neural network that performs multi-class classification, since it turns the raw outputs into probabilities over the classes.
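Here is a small numeric sketch of both activations on an arbitrary input. It shows that ReLU zeroes out non-positive values. It also checks that LogSoftmax matches both log(softmax(x)) and the formula LogSoftmax(x)_i = x_i − log Σ_j exp(x_j).

```python
import torch
import torch.nn as nn

x = torch.tensor([[-2.0, 0.0, 3.0]])

# ReLU(x) = max(0, x): negatives become 0, positives pass through unchanged
relu_out = nn.ReLU()(x)
print(relu_out)  # tensor([[0., 0., 3.]])

# LogSoftmax along the class dimension (dim=1)
log_sm = nn.LogSoftmax(dim=1)(x)
manual = torch.log(torch.softmax(x, dim=1))              # log of softmax
formula = x - torch.logsumexp(x, dim=1, keepdim=True)    # x_i - log(sum_j exp(x_j))
print(torch.allclose(log_sm, manual), torch.allclose(log_sm, formula))  # True True

# Exponentiating recovers probabilities that sum to 1 across the classes
print(log_sm.exp().sum(dim=1))
```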

Moving on, we need to define our loss function and optimizer. We will use the negative log-likelihood loss (NLLLoss) to measure how far our predictions are from the target values. The optimizer we have chosen is stochastic gradient descent (SGD) with momentum.

from torch import optim

lossFunction = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
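Here is a minimal sketch of what NLLLoss computes, using random scores. The batch size, seed, and targets are arbitrary. For each example, the loss takes the log-probability assigned to the correct class, negates it, and averages over the batch. The sketch also shows why the network ends in LogSoftmax: nn.CrossEntropyLoss on raw outputs gives the same value as LogSoftmax followed by NLLLoss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw scores for a batch of 4 examples, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # true class index for each example

log_probs = nn.LogSoftmax(dim=1)(logits)

# NLLLoss: pick the correct class's log-probability per example, negate, and average
nll = nn.NLLLoss()(log_probs, targets)
manual = -log_probs[torch.arange(4), targets].mean()

# CrossEntropyLoss fuses LogSoftmax + NLLLoss, so it takes the raw logits directly
ce = nn.CrossEntropyLoss()(logits, targets)

print(torch.allclose(nll, manual), torch.allclose(nll, ce))  # True True
```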

Now we need to train our model on the 60,000 training images. num_epochs sets how many times the model iterates over the entire dataset. We calculate the loss for each mini-batch and accumulate it in ‘loss_’. At the end of each epoch, we divide by the number of batches to report the average training loss.

num_epochs = 10
for epoch in range(num_epochs):
    loss_ = 0
    for images, labels in train_loader:
        # Flatten the input images from [batch, 1, 28, 28] to [batch, 784]
        images = images.reshape(-1, 784)

        # Forward pass
        output = model(images)
        # Loss at each iteration, comparing the output to the target (label)
        loss = lossFunction(output, labels)

        # Backpropagate the gradient of the loss
        optimizer.zero_grad()
        loss.backward()

        # Update the parameters (weights and biases)
        optimizer.step()

        loss_ += loss.item()
    print("Epoch {}, Training loss: {}".format(epoch, loss_ / len(train_loader)))

With each epoch, the training loss decreases as the model’s parameters are optimized. Result:

Epoch 0, Training loss: 0.07328846121724443
Epoch 1, Training loss: 0.06914612076239912
Epoch 2, Training loss: 0.06573875527534741
Epoch 3, Training loss: 0.06282463806762825
Epoch 4, Training loss: 0.05949374682531197
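One detail of the training loop is worth seeing in isolation. PyTorch accumulates gradients in each parameter’s .grad across calls to backward(), which is why optimizer.zero_grad() is called on every mini-batch. The sketch below demonstrates the accumulation. It uses a tiny hypothetical linear model with an MSE loss, chosen only for simplicity rather than taken from the article.

```python
import torch
import torch.nn as nn

# Tiny hypothetical model and data, used only to inspect gradients
torch.manual_seed(0)
layer = nn.Linear(3, 1)
x = torch.randn(5, 3)
y = torch.randn(5, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)

def backward_and_get_grad():
    loss = loss_fn(layer(x), y)
    loss.backward()  # adds this batch's gradients into layer.weight.grad
    return layer.weight.grad.clone()

g1 = backward_and_get_grad()
g2 = backward_and_get_grad()  # no zero_grad() in between: the gradients add up
print(torch.allclose(g2, 2 * g1))  # True

optimizer.zero_grad()  # clear the accumulated gradients
g3 = backward_and_get_grad()
print(torch.allclose(g3, g1))  # True
```

No optimizer.step() is called here, so the weights never change, and each backward() pass computes the same gradient.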

We have now successfully trained our network. Next, we need to test the performance of our model on the 10,000 images in the test set.

with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 784)
        out = model(images)
        _, predicted = torch.max(out, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    print('Testing accuracy: {} %'.format(100 * correct / total))

Result:

Testing accuracy: 97.42 %
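Because the network outputs log-probabilities, torch.max(out, 1) in the test loop returns two things: the largest log-probability in each row and its index. That index is the predicted digit. Here is a minimal sketch using hypothetical LogSoftmax outputs in place of model(images). It shows that exponentiating recovers class probabilities and that argmax gives the same prediction.

```python
import torch

# Hypothetical LogSoftmax outputs for 2 images over 10 classes (stand-in for model(images))
torch.manual_seed(0)
out = torch.log_softmax(torch.randn(2, 10), dim=1)

probs = out.exp()                            # back to probabilities; each row sums to 1
max_log_prob, predicted = torch.max(out, 1)  # same call as in the test loop
print(predicted)                             # predicted digit for each image
print(probs[torch.arange(2), predicted])     # probability assigned to each prediction
print(out.argmax(dim=1))                     # argmax gives the same indices
```

For a single test image, the same idea applies after flattening it to shape [1, 784], e.g. model(image.reshape(1, 784)).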

Our model achieved a test accuracy of 97.4% 🙌 🙌 🙌 So let’s go ahead and save our trained model, so that we don’t have to retrain it the next time we use it for prediction.

torch.save(model, 'mnist_model.pt')
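torch.save(model, ...) pickles the entire model object, so loading it later requires the NeuralNet class definition to be available. In recent PyTorch releases, torch.load also defaults to weights_only=True, so loading a fully pickled model needs weights_only=False. A common alternative, recommended in the PyTorch docs, is to save only the model’s state_dict. Here is a minimal sketch of that approach. The make_model helper and the temporary file path are hypothetical stand-ins for the article’s model and filename.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the trained model (same layer sizes as NeuralNet)
def make_model():
    return nn.Sequential(
        nn.Linear(784, 128), nn.ReLU(),
        nn.Linear(128, 64), nn.ReLU(),
        nn.Linear(64, 10), nn.LogSoftmax(dim=1),
    )

model = make_model()
path = os.path.join(tempfile.mkdtemp(), "mnist_weights.pt")

# Save only the learned parameters (a dict of tensors), not the pickled object
torch.save(model.state_dict(), path)

# To load: rebuild the architecture, then copy the saved parameters into it
restored = make_model()
restored.load_state_dict(torch.load(path, weights_only=True))
restored.eval()  # switch to inference mode before predicting

x = torch.rand(1, 784)
with torch.no_grad():
    print(torch.allclose(model(x), restored(x)))  # True
```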

With that, we have come to the end of the article. You read this far, you go! I am excited to know how it worked for you. The entire notebook is available here. And don’t forget to give your 👏! More cool articles are lined up. Coming soon!