# Building a Feedforward Neural Network using the PyTorch NN Module

Feedforward neural networks are also known as **Multi-layered Networks of Neurons** (MLN). These networks are called feedforward because information travels only forward through the network: from the input nodes, through the hidden layers (one or many), and finally to the output nodes.

The capacity of traditional models such as the McCulloch-Pitts neuron, the Perceptron and the Sigmoid neuron is limited to linear decision boundaries. To handle a complex non-linear decision boundary between the input and the output, we use a Multi-layered Network of Neurons.

Citation Note: The content and the structure of this article is based on the deep learning lectures from One-Fourth Labs — Padhai.

# Outline

In this post, we will discuss how to build a feedforward neural network using PyTorch, and we will do it incrementally. First, we will generate non-linearly separable data with four classes. Then we will build a simple feedforward neural network using only PyTorch tensor functionality. After that, we will use the abstractions available in PyTorch, such as Functional, Sequential, Linear and Optim, to make our neural network concise, flexible and efficient. Finally, we will move our network to CUDA and see how fast it performs.

**Note: This tutorial assumes you already have PyTorch installed on your local machine or know how to use PyTorch in Google Colab with CUDA support, and that you are familiar with the basics of tensor operations.** If you are not familiar with these concepts, kindly refer to my previous post.

**Rest of the article is structured as follows:**

- Import libraries
- Generate non-linearly separable data
- Feedforward network using tensors and auto-grad
- Train our feedforward network
- NN.Functional
- NN.Parameter
- NN.Linear and Optim
- NN.Sequential
- Moving the Network to GPU


# Import libraries

Before we start building our network, we need to import the required libraries. We import `numpy` for matrix multiplication and dot products between vectors, `matplotlib` to visualize the data, and functions from the `sklearn` package to generate data and evaluate the network's performance. We import `torch` for everything related to PyTorch.

```python
# required libraries
import numpy as np
import math
import matplotlib.pyplot as plt
import matplotlib.colors
import time
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_squared_error, log_loss
from tqdm import tqdm_notebook
from IPython.display import HTML
import warnings
from sklearn.preprocessing import OneHotEncoder
from sklearn.datasets import make_blobs
import torch

warnings.filterwarnings('ignore')
```

# Generate non-linearly separable data

In this section, we will see how to randomly generate non-linearly separable data using `sklearn`.

```python
# generate data using the make_blobs function from sklearn
# centers = 4 indicates the number of different classes
data, labels = make_blobs(n_samples=1000, centers=4, n_features=2, random_state=0)
print(data.shape, labels.shape)

# my_cmap is not defined in the text; this colormap is an assumed choice
my_cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", ["red", "yellow", "green"])

# visualize the data
plt.scatter(data[:, 0], data[:, 1], c=labels, cmap=my_cmap)
plt.show()

# splitting the data into train and validation sets
X_train, X_val, Y_train, Y_val = train_test_split(data, labels, stratify=labels, random_state=0)
print(X_train.shape, X_val.shape, labels.shape)
```

To generate the data, we use `make_blobs`, which produces blobs of points drawn from a Gaussian distribution. I have generated 1000 data points in 2D space with four blobs (`centers=4`), framed as a multi-class classification problem. Each data point has two input features and a class label of 0, 1, 2 or 3.

Once the data is ready, I use the `train_test_split` function to split it into `training` and `validation` sets in the ratio of 75:25.
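As a quick sanity check on the split described above, the following sketch regenerates the same blobs and confirms the 75:25 ratio and the roughly even class balance that `stratify=labels` is meant to give. It relies on `train_test_split` falling back to its default `test_size` of 0.25 when none is passed; the variable names `train_counts` and `val_counts` are introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# Same data and split settings as in the article
data, labels = make_blobs(n_samples=1000, centers=4, n_features=2, random_state=0)
X_train, X_val, Y_train, Y_val = train_test_split(
    data, labels, stratify=labels, random_state=0
)

# Shapes of the two splits (no test_size given, so the default 0.25 applies)
print(X_train.shape, X_val.shape)

# Number of points per class in each split; stratification keeps these balanced
train_counts = np.bincount(Y_train)
val_counts = np.bincount(Y_val)
print(train_counts, val_counts)
```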

# Feedforward network using tensors and auto-grad

In this section, we will see how to build and train a simple neural network using PyTorch tensors and auto-grad. The network has six neurons in total: two in the first hidden layer and four in the output layer. For each of these neurons, the pre-activation is represented by **a** and the post-activation by **h**. The network has a total of 18 parameters: 12 weights and 6 bias terms.
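The parameter count can be checked with a few lines of arithmetic. This is only a sketch: `layer_sizes` is a name introduced here for illustration, listing the two input features, the two hidden neurons and the four output neurons.

```python
# Layer widths: 2 inputs -> 2 hidden neurons -> 4 output neurons
layer_sizes = [2, 2, 4]

# One weight per (input, output) pair between consecutive layers
n_weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# One bias per neuron in every layer except the input layer
n_biases = sum(layer_sizes[1:])

print(n_weights, n_biases, n_weights + n_biases)
```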

We will use the `map` function to convert the NumPy arrays to PyTorch tensors in a single step.

```python
# converting the numpy arrays to torch tensors
X_train, Y_train, X_val, Y_val = map(torch.tensor, (X_train, Y_train, X_val, Y_val))
print(X_train.shape, Y_train.shape)
```
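One detail worth keeping in mind is that `torch.tensor` preserves the NumPy dtype, so `float64` inputs become `float64` tensors, while freshly created PyTorch weights are `float32` by default. The sketch below uses small made-up arrays (`x_np` and `y_np` are hypothetical stand-ins for the real data) to show the dtypes before and after an explicit cast.

```python
import numpy as np
import torch

# Hypothetical stand-ins for the real features and labels
x_np = np.random.rand(5, 2)                        # NumPy floats default to float64
y_np = np.array([0, 1, 2, 3, 0], dtype=np.int64)   # integer class labels

x_t, y_t = map(torch.tensor, (x_np, y_np))
print(x_t.dtype, y_t.dtype)

# Cast the features to float32 so they match randomly initialised weights
x_f = x_t.float()
print(x_f.dtype)
```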

After converting the data to tensors, we need to write a function that helps us to compute the forward pass for the network.

```python
# function for computing the forward pass in the network
def model(x):
    A1 = torch.matmul(x, weights1) + bias1  # (N, 2) x (2, 2) -> (N, 2)
    H1 = A1.sigmoid()  # (N, 2)
    A2 = torch.matmul(H1, weights2) + bias2  # (N, 2) x (2, 4) -> (N, 4)
    H2 = A2.exp() / A2.exp().sum(-1).unsqueeze(-1)  # (N, 4), softmax
    return H2
```

The `model` function defines the forward pass. For each neuron in the network, the forward pass involves two steps:

- Pre-activation, represented by 'a': the weighted sum of the inputs plus the bias.
- Activation, represented by 'h': the Sigmoid function applied to the pre-activation.

Since the network has a multi-class output, we use Softmax activation instead of Sigmoid at the output layer (the second layer), written with PyTorch's method chaining. The activation output of the final layer is the prediction of our network. The function returns this value so that we can use it to calculate the loss of the network.
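To make the softmax step concrete, the sketch below compares the hand-written expression from the forward pass with PyTorch's built-in `torch.softmax`. It also shows a common variant, not used in the article, that subtracts the row maximum before exponentiating to avoid overflow. The function names `manual_softmax` and `stable_softmax` are introduced here for illustration only.

```python
import torch

def manual_softmax(a):
    # Same expression as in the forward pass: exp(a) normalised per row
    return a.exp() / a.exp().sum(-1).unsqueeze(-1)

def stable_softmax(a):
    # Subtracting the row maximum leaves the result unchanged mathematically
    # but keeps exp() from overflowing for large pre-activations
    shifted = a - a.max(dim=-1, keepdim=True).values
    e = shifted.exp()
    return e / e.sum(dim=-1, keepdim=True)

torch.manual_seed(0)
a = torch.randn(3, 4)
print(torch.allclose(manual_softmax(a), torch.softmax(a, dim=-1)))

# With very large pre-activations, exp() overflows to inf in float32
large = torch.tensor([[1000.0, 1001.0, 1002.0, 1003.0]])
print(manual_softmax(large))   # inf / inf produces nan
print(stable_softmax(large))   # finite probabilities
```

For the small network in this post the plain version is usually fine, but the shifted form is the safer habit once pre-activations can grow large.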

```python
# function to calculate the loss
# y_hat -> predicted & y -> actual
def loss_fn(y_hat, y):
    return -(y_hat[range(y.shape[0]), y].log()).mean()

# function to calculate the accuracy of the model
def accuracy(y_hat, y):
    pred = torch.argmax(y_hat, dim=1)
    return (pred == y).float().mean()
```

Next, we have our loss function. In this case, instead of the mean square error, we are using the cross-entropy loss function. By using the cross-entropy loss we can find the difference between the predicted probability distribution and actual probability distribution to compute the loss of the network.
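The hand-written cross-entropy can be checked against PyTorch's built-in version. In this sketch, `loss_fn` is copied from the article, while the random `logits` and labels `y` are made up for illustration. Note that `torch.nn.functional.cross_entropy` expects raw pre-activations (logits) rather than softmax probabilities, because it applies log-softmax internally.

```python
import torch
import torch.nn.functional as F

def loss_fn(y_hat, y):
    # Mean negative log-probability assigned to the true class
    return -(y_hat[range(y.shape[0]), y].log()).mean()

torch.manual_seed(0)
logits = torch.randn(6, 4)              # made-up pre-activations for 6 samples
y = torch.tensor([0, 1, 2, 3, 1, 0])    # made-up true class labels

probs = torch.softmax(logits, dim=1)    # what the forward pass would return
manual = loss_fn(probs, y)
builtin = F.cross_entropy(logits, y)    # takes logits, not probabilities

print(manual.item(), builtin.item())
```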

# Train our feed-forward network

We will now train the feedforward network we created on our data. First, we initialize all the weights in the network using Xavier initialization, which draws the weights from a distribution with zero mean and a specific variance (here, by scaling standard normal samples by 1/sqrt(n), where n is the number of inputs to the layer).

Since we have only two input features (and two hidden neurons), we divide the weights by sqrt(2). We then run the `model` function on the training data for 10000 epochs with the learning rate set to 0.2.
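The effect of the 1/sqrt(n) scaling is easier to see on a larger layer than the 2x2 one used here. The sketch below uses hypothetical sizes `n_in` and `n_out` and checks that the resulting standard deviation is close to 1/sqrt(n_in). For comparison, it also calls PyTorch's own `torch.nn.init.xavier_normal_`, which uses a slightly different variance of 2 / (fan_in + fan_out), so the scaling in this post is best read as a simplified variant.

```python
import math
import torch

torch.manual_seed(0)
n_in, n_out = 256, 128   # hypothetical layer sizes, chosen only for illustration

# Scaling used in this post: standard normal samples divided by sqrt(n_in)
w_manual = torch.randn(n_in, n_out) / math.sqrt(n_in)
print(w_manual.std().item(), 1 / math.sqrt(n_in))

# PyTorch's built-in Xavier (Glorot) normal initialisation
w_builtin = torch.empty(n_in, n_out)
torch.nn.init.xavier_normal_(w_builtin)
print(w_builtin.std().item(), math.sqrt(2 / (n_in + n_out)))
```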

```python
# set the seed
torch.manual_seed(0)

# initialize the weights and biases using Xavier initialization
weights1 = torch.randn(2, 2) / math.sqrt(2)
weights1.requires_grad_()
bias1 = torch.zeros(2, requires_grad=True)
weights2 = torch.randn(2, 4) / math.sqrt(2)
weights2.requires_grad_()
bias2 = torch.zeros(4, requires_grad=True)

# set the parameters for training the model
learning_rate = 0.2
epochs = 10000

X_train = X_train.float()
Y_train = Y_train.long()

loss_arr = []
acc_arr = []

# training the network
for epoch in range(epochs):
    y_hat = model(X_train)  # compute the predicted distribution
    loss = loss_fn(y_hat, Y_train)  # compute the loss of the network
    loss.backward()  # backpropagate the gradients
    loss_arr.append(loss.item())
    acc_arr.append(accuracy(y_hat, Y_train))

    with torch.no_grad():  # update the weights and biases
        weights1 -= weights1.grad * learning_rate
        bias1 -= bias1.grad * learning_rate
        weights2 -= weights2.grad * learning_rate
        bias2 -= bias2.grad * learning_rate
        # reset the gradients so they do not accumulate across epochs
        weights1.grad.zero_()
        bias1.grad.zero_()
        weights2.grad.zero_()
        bias2.grad.zero_()
```
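For readers who want to try the loop without running the earlier cells, here is a compact, self-contained sketch of the same idea. It is not the article's exact setup: the four blobs are generated directly with PyTorch at hand-picked centers, the parameters are kept in a list called `params`, the built-in `torch.softmax` replaces the hand-written one, and only 500 epochs are run. All of these are assumptions made to keep the example short.

```python
import math
import torch

torch.manual_seed(0)

# Assumed toy data: four Gaussian blobs around hand-picked centers
centers = torch.tensor([[-2.0, -2.0], [-2.0, 2.0], [2.0, -2.0], [2.0, 2.0]])
y = torch.arange(4).repeat_interleave(100)        # 100 labels per class
X = centers[y] + 0.5 * torch.randn(400, 2)

# Same 2 -> 2 -> 4 architecture and 1/sqrt(2) weight scaling as in the article
w1 = (torch.randn(2, 2) / math.sqrt(2)).requires_grad_()
b1 = torch.zeros(2, requires_grad=True)
w2 = (torch.randn(2, 4) / math.sqrt(2)).requires_grad_()
b2 = torch.zeros(4, requires_grad=True)
params = [w1, b1, w2, b2]

def forward(x):
    h1 = (x @ w1 + b1).sigmoid()
    return torch.softmax(h1 @ w2 + b2, dim=1)

def nll(y_hat, target):
    return -(y_hat[range(target.shape[0]), target].log()).mean()

losses = []
for epoch in range(500):                          # fewer epochs than the article
    loss = nll(forward(X), y)
    loss.backward()
    losses.append(loss.item())
    with torch.no_grad():
        for p in params:
            p -= p.grad * 0.2                     # plain gradient descent step
            p.grad.zero_()                        # clear the gradient for the next epoch

with torch.no_grad():
    train_acc = (forward(X).argmax(dim=1) == y).float().mean().item()

print(losses[0], losses[-1], train_acc)
```

Looping over a list of parameters is the same pattern that the optimizer abstractions in PyTorch automate, which is one reason the hand-written update is worth seeing once.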

**Continue reading this article at the source: Marktechpost (no paywall).**


The entire code discussed in the article is present in this GitHub repository. Feel free to fork it or download it. In my next article, we will discuss how to use matplotlib and seaborn to create awesome visualizations for Exploratory Data Analysis. It's going to be a beginner-friendly post. **So make sure you follow me on Medium to get notified as soon as it drops.**

Until then Peace :)

NK.

**Author Bio**

Niranjan Kumar is a Retail Risk Analyst Intern at the HSBC Analytics division. He is passionate about deep learning and artificial intelligence. He was one of the top writers on Medium in Artificial Intelligence for 2.5 months. You can find all of Niranjan's blogs here. You can connect with Niranjan on LinkedIn, Twitter and GitHub to stay up to date with his latest blog posts.


*Originally published at **https://www.marktechpost.com** on June 30, 2019.*