PyTorch: A Step-by-Step Approach to Building a Basic Neural Network

Ritesh Sinha · Published in The Startup · Mar 16, 2020 · 5 min read

Building a basic neural network with PyTorch

PyTorch is fast becoming the framework of choice for deep learning applications. In this notebook, I am going to demonstrate how to build a neural network using the PyTorch APIs.

About the MNIST Dataset

Here we build a basic neural network to perform digit classification on the MNIST dataset. The dataset contains images of handwritten digits from 0 to 9, and its training set has 50,000 observations.

The following is a step-by-step approach to building a neural network.

Loading required libraries

Note that MNIST_URL, defined in the snippet below, will be used to download the data for this exercise.
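A minimal sketch of the setup is below. The exact value of MNIST_URL is an assumption on my part (the pickled MNIST archive that was historically hosted at deeplearning.net); substitute whichever mirror you prefer.

```python
# Imports used throughout this notebook.
from pathlib import Path
import gzip
import pickle

import torch
from torch import nn, optim
import torch.nn.functional as F

# Assumed URL for the pickled MNIST archive; any mirror of mnist.pkl.gz works.
MNIST_URL = "http://deeplearning.net/data/mnist/mnist.pkl.gz"
```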

Downloading and Extracting Data

The data comes already split into training and validation sets.
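A sketch of the download-and-extract step, assuming the standard mnist.pkl.gz layout of (training, validation, test) tuples:

```python
from urllib.request import urlretrieve

# Download the archive once and cache it locally.
path = Path("data")
path.mkdir(exist_ok=True)
dest = path / "mnist.pkl.gz"
if not dest.exists():
    urlretrieve(MNIST_URL, dest)

# Unpickle the (training, validation, test) splits; the test set is unused here.
with gzip.open(dest, "rb") as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")

# Convert the NumPy arrays to PyTorch tensors.
x_train, y_train, x_valid, y_valid = map(
    torch.tensor, (x_train, y_train, x_valid, y_valid)
)
```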

Checking the shape of Data

The training data has 50,000 samples containing digits from 0 to 9. Each image is flattened to 784 pixels (28 × 28).
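For example:

```python
# 50,000 flattened 28 x 28 images, with labels from 0 to 9.
print(x_train.shape, y_train.shape)  # torch.Size([50000, 784]) torch.Size([50000])
print(y_train.min(), y_train.max())  # tensor(0) tensor(9)
```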

Looking at one of the images and its corresponding label
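For instance, reshaping the first flattened row back to 28 × 28 and plotting it:

```python
import matplotlib.pyplot as plt

# Reshape one flattened image back into a 28 x 28 grid and show its label.
plt.imshow(x_train[0].reshape(28, 28), cmap="gray")
plt.title(f"label: {y_train[0].item()}")
plt.show()
```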

Building the Architecture

PyTorch has various utility functions and classes for building neural networks. The basic linear layer is available as nn.Linear. It takes three parameters: the number of inputs, the number of outputs, and an optional flag for whether to include a bias, which defaults to True. nn.Linear performs a linear transformation: it does a matrix multiplication of the weights and the inputs and adds a bias term.

Here, in this example, we have two linear transformations with a ReLU transformation in between to introduce non-linearity. A ReLU transformation changes negative values to zero and leaves positive values unchanged.
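As a quick illustration (not from the original notebook):

```python
# nn.Linear computes weights @ x + bias; F.relu clamps negatives to zero.
lin = nn.Linear(784, 84)         # in_features=784, out_features=84, bias=True
out = lin(torch.randn(1, 784))   # shape [1, 84], a mix of signs
print(F.relu(out).min())         # >= 0: all negative values are now zero
```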

We will also use the other PyTorch utilities nn.functional and optim.

We are building a simple network here: its input will be 784 (28 × 28) units, there will be a hidden layer of 84 units, and the final layer will have 10 units, equal to the number of output classes.

It's time to build the neural network model, which will have the following architecture:

- Linear layer with an input of 784 and an output of 84
- ReLU activation
- Linear layer with an input of 84 and an output of 10 classes

Note: the hidden layer has 84 units.
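A sketch of such a model follows; the class name Mnist_NN is my own choice, not necessarily the one used in the original notebook.

```python
class Mnist_NN(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(784, 84)  # input layer -> 84-unit hidden layer
        self.lin2 = nn.Linear(84, 10)   # hidden layer -> 10 output classes

    def forward(self, xb):
        xb = F.relu(self.lin1(xb))      # non-linearity between the two layers
        return self.lin2(xb)
```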

Building the model
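Putting the pieces together might look like this. The values of epochs and bs are assumptions for illustration; lr = 0.01 matches the starting learning rate mentioned in the experiment section below.

```python
model = Mnist_NN()
lr = 0.01                                   # learning rate (changed to 0.03 later)
opt = optim.SGD(model.parameters(), lr=lr)  # optimizer
loss_func = F.cross_entropy                 # loss function
epochs = 5                                  # assumed number of passes over the data
bs = 64                                     # assumed batch size
```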

A few things to talk about here:

  • Model: A model is a neural network (normally deep) that is trained with the help of data. It contains a number of nodes (also called perceptrons) that are associated with weights. These weights are the ones tweaked during training, with the objective of finding the best weights. This is achieved by an operation called backpropagation.
  • Optimizer (opt): An optimizer is what makes the training of a neural network efficient. It works with the help of a loss function.
  • Loss function (loss_func): A loss function measures how bad our neural network's predictions are. The loss function acts as a guide for the optimizer.
  • Epochs: This is the number of times the model sees all of the observations given for training.
  • Batch size (bs): This is important to understand. When doing deep learning, it is not feasible to process all observations at the same time, for two reasons: the training dataset is too large, and the model contains a huge number of parameters. To overcome this, a neural network is trained in batches.
  • Learning rate (lr): The learning rate is the parameter that defines how quickly (or slowly) a model's weights are updated. It is a very important parameter for training neural networks.

Model Training

The following is where the training of the model takes place. You will notice two for loops: one over the epochs, and one over the number of iterations per epoch, which is determined by the batch size. Note that backpropagation happens at the loss.backward() call, and the subsequent opt.step() call updates the weights of the network.
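A sketch of that training loop:

```python
n = x_train.shape[0]

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):   # one iteration per mini-batch
        start = i * bs
        xb = x_train[start:start + bs]
        yb = y_train[start:start + bs]

        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()                  # backpropagation
        opt.step()                       # update the weights
        opt.zero_grad()                  # reset gradients for the next batch
```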

Checking Accuracy

Let us check the accuracy of the model we just trained.
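A simple accuracy check might look like this, comparing the argmax of the model's outputs against the validation labels:

```python
def accuracy(out, yb):
    # The predicted class is the index of the highest output score.
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()

with torch.no_grad():
    print(accuracy(model(x_valid), y_valid))
```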

We get fairly high accuracy for such a simple model architecture. Notice that this is achieved with only one hidden layer; modern neural network architectures typically contain many hidden layers.

Let us experiment a bit

Let us see how the network performs if we add an extra layer of 28 units and change the learning rate from 0.01 to 0.03.
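A sketch of the modified network; the class name Mnist_NN_v2 is mine.

```python
class Mnist_NN_v2(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(784, 84)
        self.lin2 = nn.Linear(84, 28)   # the extra 28-unit hidden layer
        self.lin3 = nn.Linear(28, 10)

    def forward(self, xb):
        xb = F.relu(self.lin1(xb))
        xb = F.relu(self.lin2(xb))
        return self.lin3(xb)

model = Mnist_NN_v2()
opt = optim.SGD(model.parameters(), lr=0.03)  # learning rate raised from 0.01
```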

That's it. We have successfully built and trained a neural network using PyTorch's nn, nn.functional, and optim modules, and experimented a bit as well. I hope you liked this article. Please feel free to send any comments that could be useful. If you want to run the code, the entire notebook is available here.

Ritesh Sinha is a Principal Data Scientist with experience delivering end-to-end AI projects in the manufacturing and healthcare domains.