PyTorch for Beginners 💫

Part 2: Neural network basics and implementation from scratch

Akash Joshi
7 min read · Feb 2, 2022

This article will show you how to create a simple neural network from scratch using PyTorch. If you don’t know how to perform basic operations in PyTorch, I would like you to check out the first part of this series.

Before implementing a neural network from scratch, you should know about autograd.

What is autograd?

torch.autograd is PyTorch’s automatic differentiation engine that powers neural network training.

Training a Neural network (NN) happens in two steps:

Forward Propagation: In forward prop, the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess.

Backward Propagation: In backpropagation, the NN adjusts its parameters proportionate to the error in its guess. It does this by traversing backward from the output, collecting the derivatives of the error with respect to the parameters of the functions (gradients), and optimizing the parameters using gradient descent.

import torch
from torch.autograd import grad
import torch.nn.functional as F

We create two tensors with values 5 and 6 respectively, passing one additional parameter, requires_grad, while creating each tensor. This parameter tells PyTorch to track operations on the tensor and build a computation graph for it. In simple words, if you want to differentiate with respect to a tensor, you have to pass this parameter when creating it.

a = torch.tensor([5.0], requires_grad=True)
b = torch.tensor([6.0], requires_grad=True)
print(a, b)

y = a**3 - b**2
y

To read the gradients, access the variable_name.grad attribute. At first the output will be None; gradients are only populated after you call the .backward() method on the result.

print(a.grad), print(b.grad)   # both print None: backward has not been called yet
y.backward()
a.grad, b.grad                 # gradients are now populated

We call y.backward() here because we want the derivatives of y with respect to a and b.
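As a quick sanity check on those numbers: since y = a³ − b², the derivatives are dy/da = 3a² and dy/db = −2b, so with a = 5 and b = 6 we should see the following values (shown here as comments):

print(a.grad)   # tensor([75.])   -> dy/da = 3 * a**2 = 3 * 25
print(b.grad)   # tensor([-12.])  -> dy/db = -2 * b   = -12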

Now you are familiar with autograd, but to make things clearer I want to give you one more example:

x = torch.tensor([3.])
w = torch.tensor([2.], requires_grad=True)
b = torch.tensor([1.], requires_grad=True)

Here we have created an x tensor which we will give as input. w is for weight and b is for bias, and we want to track w and b because we want to calculate the gradients with respect to these two variables.

import torch.nn.functional as F
out = F.relu(w*x + b)
out

If you don’t know what ReLU is, don’t worry, we will look at it properly in the future. For now: ReLU stands for Rectified Linear Unit, a piecewise linear activation function that outputs the input directly if it is positive and outputs zero otherwise.

In simple language, it returns the value itself if it is greater than zero, and returns 0 if the value is less than or equal to zero.

grad(out, w, retain_graph=True)

This is also a way to calculate the gradient: it computes the gradient of out with respect to w. Since w*x + b = 2·3 + 1 = 7 is positive, ReLU acts as the identity here, so this returns (tensor([3.]),), i.e. d(out)/dw = x. We pass retain_graph=True so the computation graph is kept around for the next grad call.

grad(out, b)   # (tensor([1.]),) -> d(out)/db = 1

I hope autograd is clear to you now; if not, feel free to ask queries in the comment section.

Now you are all set!

Let’s get started with implementing a neural network from scratch. I will try to explain all of these concepts as if you are a complete beginner, but if you already know them, jump directly to the coding part.

fig 1

fig 1 shows what actually happens in a single neuron of a neural network. Here we have multiple inputs x1, x2, …, xn, and each input is associated with a synaptic weight w1, w2, …, wn respectively. These values are passed to the adder function (σ), where each input is multiplied by its weight, the products are summed, and then the bias is added to the result.

So if we represent all of these operations with vectors, the whole thing can be written as weightᵀ · inputs + bias (the transposed weight vector times the inputs, plus the bias).

To find the values of the weights and bias, we start with some initial values and then, on each iteration, compute the new value by subtracting (learning rate) × (gradient of the loss with respect to w or b) from the current weight or bias; a sketch of this update step is shown below. The resulting weighted sum is then passed to some activation function, and that gives us the output.
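A rough, self-contained sketch of that update step; the toy target, the squared-error loss, and the learning rate here are purely illustrative, not taken from the original example:

import torch
from torch.autograd import grad

x = torch.tensor([3.])
target = torch.tensor([10.])                 # made-up target, only for illustration
w = torch.tensor([2.], requires_grad=True)
b = torch.tensor([1.], requires_grad=True)
lr = 0.01                                    # example learning rate

loss = (w * x + b - target) ** 2             # squared error for one sample
dw, db = grad(loss, [w, b])                  # dLoss/dw and dLoss/db via autograd

with torch.no_grad():                        # update without tracking the update itself
    w -= lr * dw                             # new w = old w - lr * dLoss/dw
    b -= lr * db                             # new b = old b - lr * dLoss/db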

We will use torch.manual_seed to reproduce the same output on each run.

Here we create classification data with 1000 samples, 4 features, and 2 classes in the target using sklearn’s make_classification method, and convert X and y from plain NumPy arrays to tensors.
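A minimal sketch of that step (the seed value and the dtypes are my assumptions):

import torch
from sklearn.datasets import make_classification

torch.manual_seed(42)                        # assumed seed, for reproducibility

# 1000 samples, 4 features, 2 target classes
X, y = make_classification(n_samples=1000, n_features=4,
                           n_classes=2, random_state=42)

# convert the NumPy arrays to tensors
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32)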

Look, now we will shuffle the data and split it into training and testing sets. You could easily perform this step using:

sklearn.model_selection.train_test_split()

But I thought, since we are doing everything from scratch, why not do this from scratch too!
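A from-scratch sketch of that shuffle and split, assuming an 80/20 train/test ratio:

torch.manual_seed(42)                        # assumed seed

shuffle_idx = torch.randperm(X.size(0))      # random permutation of the row indices
X, y = X[shuffle_idx], y[shuffle_idx]

split = int(0.8 * X.size(0))                 # 80% train / 20% test (assumed ratio)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]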

Now we will bring the values of the numeric columns in the dataset onto a common scale, without distorting the differences in their ranges of values. This is commonly known as normalization.

Again, you could use sklearn’s StandardScaler to scale your values, but we will write the code for this FROM SCRATCH.
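A from-scratch sketch of that scaling step, standardizing each feature with the training set’s mean and standard deviation (the variable names are mine):

# compute the statistics on the training data only
mu = X_train.mean(dim=0)
sigma = X_train.std(dim=0)

# standardize: zero mean and unit variance per feature
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma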

Now we are going to implement a class NN that performs a forward pass in which we multiply the inputs by the weights and then add the bias (fig 1). We are implementing everything from scratch, but for the backward pass we will take the help of autograd, because implementing autograd from scratch would probably give me a headache, so we will simply use the built-in autograd 🙂.

Here we create a simple class whose constructor takes the number of inputs as an argument, and instead of initializing the weights and bias with random numbers, we initialize them with zeros.

Note: the loss function is defined outside the class.

For calculating the loss we use Mean Squared Error (MSE).
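A minimal sketch of such a class and loss function, following the description above; the names NN, num_features, and loss_fn are my choices, while the zero initialization and the MSE formula are as described:

class NN:
    def __init__(self, num_features):
        # weights and bias start at zero and are tracked by autograd
        self.weights = torch.zeros(num_features, 1, dtype=torch.float32, requires_grad=True)
        self.bias = torch.zeros(1, dtype=torch.float32, requires_grad=True)

    def forward(self, x):
        # forward pass: inputs times weights, plus bias (fig 1)
        return (x @ self.weights + self.bias).view(-1)

# loss function defined outside the class: Mean Squared Error
def loss_fn(yhat, y):
    return ((yhat - y) ** 2).mean()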

If you don’t know what a loss is: the loss is just a way to measure the error between the predicted and the actual output. In simple terms, we have to decrease the value of this loss in order to get better accuracy, but don’t worry, the neural network will do this job for you by adjusting the weights and bias.

Now It’s time to define a Model ⌚

Chill!

I will try to explain everything line by line. Here we define a function train that takes: model; X (the inputs); Y (the target); epochs (how many times we want to show the whole dataset to the model); lr, the learning rate; seed, for the manual seed; and bsz, the batch size (batches are just small chunks of the whole dataset, and here 50 means we want to divide the dataset into chunks of size 50).

The first for loop runs over the epochs. Inside that loop, we shuffle the indices and divide the whole dataset into batches, and for each individual batch we perform a forward pass and calculate the loss. Then we calculate the derivatives of the loss with respect to the weights and bias; we need these derivatives to update the weights and bias.

Using the update formula from earlier, we then update the weights and bias. Finally we calculate yhat (the predicted Y) and print the current loss for that particular epoch. A sketch of the whole training loop is given below.
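Putting those pieces together, here is one way the train function could look; this is a sketch built on the NN class and loss_fn assumed above, not the exact original code:

def train(model, X, y, epochs, lr, seed, bsz):
    losses = []
    torch.manual_seed(seed)
    for epoch in range(epochs):
        shuffle_idx = torch.randperm(X.size(0))          # reshuffle every epoch
        for start in range(0, X.size(0), bsz):
            idx = shuffle_idx[start:start + bsz]         # indices of this batch

            yhat = model.forward(X[idx])                 # forward pass
            loss = loss_fn(yhat, y[idx])                 # MSE for this batch

            # derivatives of the loss w.r.t. weights and bias, via autograd
            dw, db = grad(loss, [model.weights, model.bias])

            # gradient-descent update: new = old - lr * gradient
            with torch.no_grad():
                model.weights -= lr * dw
                model.bias -= lr * db

        # loss over the whole dataset at the end of the epoch
        with torch.no_grad():
            epoch_loss = loss_fn(model.forward(X), y).item()
        losses.append(epoch_loss)
        print(f"Epoch {epoch + 1:3d} | loss {epoch_loss:.5f}")
    return losses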

Let’s train our NN model
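For example, with placeholder values for the number of epochs and the learning rate:

model = NN(num_features=4)
losses = train(model, X_train, y_train, epochs=100, lr=0.01, seed=42, bsz=50)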

Let’s plot our graph
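Assuming the graph in question is the training loss per epoch (the losses list returned by the sketch above), matplotlib can draw it:

import matplotlib.pyplot as plt

plt.plot(range(1, len(losses) + 1), losses)
plt.xlabel("Epoch")
plt.ylabel("MSE loss")
plt.title("Training loss per epoch")
plt.show()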

For checking accuracy
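One way to check accuracy here, assuming raw outputs above 0.5 are counted as class 1 (a threshold I am assuming, since no activation function is applied):

def accuracy(model, X, y):
    with torch.no_grad():
        yhat = model.forward(X)
        preds = (yhat > 0.5).float()         # threshold the raw output at 0.5
        return (preds == y).float().mean().item()

print(f"Train accuracy: {accuracy(model, X_train, y_train):.2%}")
print(f"Test  accuracy: {accuracy(model, X_test, y_test):.2%}")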

Now, if you remember, I told you that after combining the inputs, weights, and bias we send that output to some activation function, but here I haven’t implemented any. That’s a topic for another article; for now, just assume that an activation function was not required in this case.

If you have doubts or queries, just ask!

Like, Share, and Follow ❤
