Building a Feedforward Neural Network using PyTorch NN Module

Niranjan Kumar
7 min read · Jun 30, 2019


Photo by Clint Adair on Unsplash

Feedforward neural networks are also known as Multi-layered Networks of Neurons (MLN). These models are called feedforward because information travels only forward through the network: in through the input nodes, through the hidden layer(s), and finally out through the output nodes.

Traditional models such as the McCulloch-Pitts neuron, the Perceptron and the sigmoid neuron are limited in capacity to linear functions. To handle a complex non-linear decision boundary between input and output, we use a multi-layered network of neurons.

Citation Note: The content and structure of this article are based on the deep learning lectures from One-Fourth Labs — PadhAI.

Outline

In this post, we will discuss how to build a feedforward neural network using PyTorch, building it up incrementally with the PyTorch nn module. First, we will generate non-linearly separable data with four classes. Then we will build a simple feedforward neural network using plain PyTorch tensor functionality. After that, we will use the abstractions available in the nn module, such as Functional, Sequential, Linear and Optim, to make our network concise, flexible and efficient. Finally, we will move the network to CUDA and see how fast it performs.

Note: This tutorial assumes you already have PyTorch installed on your local machine or know how to use PyTorch in Google Colab with CUDA support, and that you are familiar with the basics of tensor operations. If you are not familiar with these concepts, kindly refer to my previous post linked below.

The rest of the article is structured as follows:

  • Import libraries
  • Generate non-linearly separable data
  • Feedforward network using tensors and auto-grad
  • Train our feedforward network
  • nn.functional
  • nn.Parameter
  • nn.Linear and optim
  • nn.Sequential
  • Moving the Network to GPU

If you want to skip the theory and get straight into the code, the complete notebook is in the GitHub repository linked at the end of this article.

Import libraries

Before we start building our network, we first need to import the required libraries. We import NumPy for matrix multiplication and dot products between vectors, Matplotlib to visualize the data, and, from the scikit-learn package, functions to generate the data and evaluate network performance. We import torch for all things related to PyTorch.

#required libraries
import numpy as np
import math
import matplotlib.pyplot as plt
import matplotlib.colors
import time
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_squared_error, log_loss
from tqdm import tqdm_notebook

from IPython.display import HTML
import warnings
from sklearn.preprocessing import OneHotEncoder
from sklearn.datasets import make_blobs

import torch
warnings.filterwarnings('ignore')

Generate non-linearly separable data

In this section, we will see how to randomly generate non-linearly separable data using sklearn.

#generate data using make_blobs function from sklearn.
#centers = 4 indicates different types of classes
data, labels = make_blobs(n_samples=1000, centers=4, n_features=2, random_state=0)
print(data.shape, labels.shape)

#visualize the data with one color per class
my_cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", ["red", "yellow", "green", "blue"])
plt.scatter(data[:,0], data[:,1], c=labels, cmap=my_cmap)
plt.show()

#splitting the data into train and test
X_train, X_val, Y_train, Y_val = train_test_split(data, labels, stratify=labels, random_state=0)
print(X_train.shape, X_val.shape, Y_train.shape, Y_val.shape)

To generate the data randomly, we use make_blobs, which creates blobs of points with a Gaussian distribution. I have generated 1000 data points in 2D space grouped into four blobs (centers=4), making this a multi-class classification problem. Each data point has two input features and a class label of 0, 1, 2 or 3.

Once we have our data ready, I have used the train_test_split function to split the data into training and validation sets in the ratio of 75:25.

Feedforward network using tensors and auto-grad

In this section, we will see how to build and train a simple neural network using PyTorch tensors and autograd. The network has six neurons in total: two in the first hidden layer and four in the output layer. For each of these neurons, the pre-activation is represented by 'a' and the post-activation by 'h'. In total, the network has 18 parameters: 12 weight parameters and 6 bias terms.
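As a quick sanity check on that count (my own arithmetic, not from the original lectures), it follows directly from the layer shapes:

#parameter count for the 2 -> 2 -> 4 network
w1, b1 = 2 * 2, 2   #first layer: 4 weights, 2 biases
w2, b2 = 2 * 4, 4   #output layer: 8 weights, 4 biases
print(w1 + w2, b1 + b2, w1 + b1 + w2 + b2)   #12 weights, 6 biases, 18 total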

We will use the map function for a concise conversion of the NumPy arrays to PyTorch tensors.

#converting the numpy array to torch tensors
X_train, Y_train, X_val, Y_val = map(torch.tensor, (X_train, Y_train, X_val, Y_val))
print(X_train.shape, Y_train.shape)
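Note that torch.tensor preserves NumPy's float64 dtype, which is why we cast the inputs to float32 and the labels to int64 just before the training loop. If you prefer, the same casts can be done right after conversion; this is a small variant of my own, not in the original code:

#optional: cast inputs to float32 and labels to int64 right away
X_train, X_val = X_train.float(), X_val.float()
Y_train, Y_val = Y_train.long(), Y_val.long()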

After converting the data to tensors, we need to write a function that helps us to compute the forward pass for the network.

#function for computing the forward pass in the network
def model(x):
    A1 = torch.matmul(x, weights1) + bias1    #(N, 2) x (2, 2) -> (N, 2)
    H1 = A1.sigmoid()                         #(N, 2)
    A2 = torch.matmul(H1, weights2) + bias2   #(N, 2) x (2, 4) -> (N, 4)
    H2 = A2.exp()/A2.exp().sum(-1).unsqueeze(-1)   #(N, 4) softmax
    return H2

We define a function model which characterizes the forward pass. For each neuron in the network, the forward pass involves two steps:

  1. Pre-activation, represented by 'a': a weighted sum of the inputs plus the bias.
  2. Activation, represented by 'h': the activation function applied to 'a', here the sigmoid function.

Since the network has a multi-class output, we use a Softmax activation instead of Sigmoid at the output layer (the second layer), written using PyTorch's chaining mechanism. The activation output of the final layer is the predicted value of our network, and the function returns it so that we can use it to compute the loss.
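One aside of my own, not from the original lecture code: PyTorch ships a built-in softmax that is numerically more stable than the hand-rolled exp-and-normalize above, so the last two lines of model could equivalently be written as:

H2 = torch.softmax(A2, dim=-1)   #(N, 4), same output, numerically stabler
return H2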

#function to calculate the loss of the model
#y_hat -> predicted & y -> actual
def loss_fn(y_hat, y):
    return -(y_hat[range(y.shape[0]), y].log()).mean()

#function to calculate the accuracy of the model
def accuracy(y_hat, y):
    pred = torch.argmax(y_hat, dim=1)
    return (pred == y).float().mean()

Next, we have our loss function. In this case, instead of mean squared error, we use the cross-entropy loss. Cross-entropy measures the difference between the predicted probability distribution and the actual distribution, and we use it to compute the loss of the network.
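To convince yourself that the indexing trick in loss_fn really computes cross-entropy, here is a quick check (my own sketch, not from the original post) against PyTorch's built-in negative log-likelihood loss, which expects log-probabilities:

#check loss_fn against the built-in nll_loss
import torch.nn.functional as F

y_hat = torch.tensor([[0.7, 0.1, 0.1, 0.1],
                      [0.2, 0.5, 0.2, 0.1]])   #predicted probabilities
y = torch.tensor([0, 1])                       #actual labels

manual = -(y_hat[range(y.shape[0]), y].log()).mean()
builtin = F.nll_loss(y_hat.log(), y)
print(manual.item(), builtin.item())   #both ~0.5249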

Train our feed-forward network

We will now train the feedforward network we created on our data. First, we initialize all the weights in the network using Xavier initialization: each weight is drawn from a distribution with zero mean and a variance scaled by the number of inputs to the neuron, implemented by multiplying a standard normal draw by 1/sqrt(n).

Since each neuron has two inputs, we divide the weights by sqrt(2). We then train the model on the training data for 10000 epochs with the learning rate set to 0.2.

#set the seed
torch.manual_seed(0)

#initialize the weights and biases using Xavier Initialization
weights1 = torch.randn(2, 2) / math.sqrt(2)
weights1.requires_grad_()
bias1 = torch.zeros(2, requires_grad=True)

weights2 = torch.randn(2, 4) / math.sqrt(2)
weights2.requires_grad_()
bias2 = torch.zeros(4, requires_grad=True)

#set the parameters for training the model
learning_rate = 0.2
epochs = 10000
X_train = X_train.float()
Y_train = Y_train.long()
loss_arr = []
acc_arr = []

#training the network
for epoch in range(epochs):
    y_hat = model(X_train)           #compute the predicted distribution
    loss = loss_fn(y_hat, Y_train)   #compute the loss of the network
    loss.backward()                  #backpropagate the gradients
    loss_arr.append(loss.item())
    acc_arr.append(accuracy(y_hat, Y_train).item())

    with torch.no_grad():            #update the weights and biases
        weights1 -= weights1.grad * learning_rate
        bias1 -= bias1.grad * learning_rate
        weights2 -= weights2.grad * learning_rate
        bias2 -= bias2.grad * learning_rate
        weights1.grad.zero_()
        bias1.grad.zero_()
        weights2.grad.zero_()
        bias2.grad.zero_()
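Since the loop records the loss and accuracy at every epoch, a quick way to verify that training converges is to plot both curves; here is a minimal sketch using the arrays built above:

#plot the loss and accuracy curves collected during training
plt.plot(loss_arr, label='loss')
plt.plot(acc_arr, label='accuracy')
plt.xlabel('epochs')
plt.legend()
plt.show()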

Continue reading this article at the source: MarkTechPost (no paywall).

None of the blogs I publish, whether on Medium or on third-party websites like MarkTechPost, will be kept behind a paywall. If you like my content, please consider supporting what I do. You can find all of my blogs here.

The entire code discussed in this article is available in this GitHub repository. Feel free to fork it or download it. In my next article, we will discuss how to use Matplotlib and Seaborn to create awesome visualizations for exploratory data analysis. It's going to be a beginner-friendly post, so make sure you follow me on Medium to get notified as soon as it drops.

Learn More

If you want to learn more about data science and machine learning, check out the Machine Learning Basics and Advanced Machine Learning courses by Abhishek and Pukhraj from Starttechacademy. One of the good points about these courses is that they teach in both Python and R, so the choice is yours.

Until then Peace :)

NK.

Author Bio

Niranjan Kumar is a Retail Risk Analyst Intern in the HSBC Analytics division. He is passionate about deep learning and artificial intelligence, and was one of the top writers on Medium in Artificial Intelligence for 2.5 months. You can find all of Niranjan's blogs here, and connect with him on LinkedIn, Twitter and GitHub to stay up to date with his latest posts.

Disclaimer — There might be some affiliate links in this post to relevant resources. You can purchase the bundle at the lowest price possible. I will receive a small commission if you purchase the course.

Originally published at https://www.marktechpost.com on June 30, 2019.
