Learn how to Build Neural Networks from Scratch in Python for Digit Recognition

Srikar
Oct 12, 2018 · 7 min read

What we’ll cover in this post

In this blog post, we will learn how a neural network can be used for the same digit recognition task we previously tackled with logistic regression. Since neural networks can fit more complex non-linear boundaries, we should see an increase in classifier accuracy too.

1. Feedforward Propagation

We first implement feedforward propagation for the neural network using a set of weights that are already given. Then we implement the backpropagation algorithm to learn those parameters ourselves. Here we use the terms weights and parameters interchangeably.

1.1 Visualizing the data:

Each training example is a 20 pixel by 20 pixel grayscale image of the digit. Each pixel is represented by a floating point number indicating the grayscale intensity at that location. The 20 by 20 grid of pixels is “unrolled” into a 400-dimensional vector. Each of these training examples becomes a single row in our data matrix X. This gives us a 5000 by 400 matrix X where every row is a training example for a handwritten digit image. The second part of the training set is a 5000-dimensional vector y that contains labels for the training set.

from scipy.io import loadmat
import numpy as np
import scipy.optimize as opt
import pandas as pd
import matplotlib.pyplot as plt
# reading the data
data = loadmat('ex4data1.mat')
X = data['X']
y = data['y']
# visualizing the data
_, axarr = plt.subplots(10,10,figsize=(10,10))
for i in range(10):
    for j in range(10):
        axarr[i, j].imshow(X[np.random.randint(X.shape[0])].reshape((20, 20), order='F'))
        axarr[i, j].axis('off')

1.2 Model Representation

# our network has 3 layers: 400 input units (the 20 x 20 pixels), 25 hidden units
# and 10 output units (one per digit class)
weights = loadmat('ex4weights.mat')
theta1 = weights['Theta1']  # Theta1 has size 25 x 401
theta2 = weights['Theta2']  # Theta2 has size 10 x 26
# unroll the parameters into a single vector
nn_params = np.hstack((theta1.ravel(order='F'), theta2.ravel(order='F')))
# neural network hyperparameters
input_layer_size = 400
hidden_layer_size = 25
num_labels = 10
lmbda = 1  # regularization parameter lambda

1.3 Feedforward and cost function

First we will implement the cost function, followed by the gradient for the neural network (for which we use the backpropagation algorithm). Recall that the cost function for the neural network with regularization is

J(θ) = (1/m) Σ_{i=1..m} Σ_{k=1..K} [ −y_k^(i) log((h_θ(x^(i)))_k) − (1 − y_k^(i)) log(1 − (h_θ(x^(i)))_k) ] + (λ/2m) [ Σ (Θ1_{j,k})² + Σ (Θ2_{j,k})² ]

where K = 10 is the number of classes and the regularization sums run over all weights except those multiplying the bias units.
Since the labels in y run from 1 to 10 (with 10 standing in for the digit 0), we first one-hot encode them into a 5000 x 10 matrix of 0s and 1s.
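A minimal sketch of that encoding (the helper name one_hot is an assumption; pandas' get_dummies would do the same job):

def one_hot(y, num_labels):
    # map label k (1..10) to a row vector with a 1 in column k-1
    m = y.shape[0]
    y_matrix = np.zeros((m, num_labels))
    y_matrix[np.arange(m), y.flatten() - 1] = 1
    return y_matrix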
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

We now put the feedforward pass and the regularized cost together into a single cost function.
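A sketch of the cost function, assuming the sigmoid and one_hot helpers above (the reshaping mirrors the Fortran-order unrolling we used earlier; the exact implementation may differ):

def nnCostFunc(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lmbda):
    # reshape the unrolled parameters back into the two weight matrices
    theta1 = nn_params[:hidden_layer_size * (input_layer_size + 1)].reshape(
        (hidden_layer_size, input_layer_size + 1), order='F')
    theta2 = nn_params[hidden_layer_size * (input_layer_size + 1):].reshape(
        (num_labels, hidden_layer_size + 1), order='F')

    m = X.shape[0]
    y_matrix = one_hot(y, num_labels)            # 5000 x 10

    # feedforward propagation
    a1 = np.hstack((np.ones((m, 1)), X))         # 5000 x 401
    a2 = sigmoid(a1 @ theta1.T)                  # 5000 x 25
    a2 = np.hstack((np.ones((m, 1)), a2))        # 5000 x 26
    a3 = sigmoid(a2 @ theta2.T)                  # 5000 x 10, the hypothesis

    # regularized cost (bias columns are not regularized)
    cost = (-1 / m) * np.sum(y_matrix * np.log(a3) + (1 - y_matrix) * np.log(1 - a3))
    reg = (lmbda / (2 * m)) * (np.sum(theta1[:, 1:] ** 2) + np.sum(theta2[:, 1:] ** 2))
    return cost + reg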

nnCostFunc(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lmbda)

2. Backpropagation

In this part of the exercise, we will implement the backpropagation algorithm to compute the gradients for the neural network. Once we have computed the gradient, we will be able to train the neural network by minimizing the cost function using an advanced optimizer such as SciPy's fmin_cg.

2.1 Sigmoid gradient

We will first implement the sigmoid gradient function. The gradient of the sigmoid can be computed as g′(z) = sigmoid(z) · (1 − sigmoid(z)).

def sigmoidGrad(z):
    return np.multiply(sigmoid(z), 1 - sigmoid(z))

2.2 Random initialization

When training neural networks, it is important to randomly initialize the parameters for symmetry breaking. Here we randomly initialize parameters named initial_theta1 and initial_theta2, corresponding to the hidden layer and the output layer, and unroll them into a single vector as we did earlier.
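A minimal sketch of this step (the helper name randInitializeWeights and the epsilon value of 0.12, the value suggested in the course exercise, are assumptions):

def randInitializeWeights(L_in, L_out, epsilon_init=0.12):
    # weights for a layer with L_in inputs and L_out outputs, including the bias
    # column, drawn uniformly from [-epsilon_init, epsilon_init]
    return np.random.rand(L_out, L_in + 1) * 2 * epsilon_init - epsilon_init

initial_theta1 = randInitializeWeights(input_layer_size, hidden_layer_size)  # 25 x 401
initial_theta2 = randInitializeWeights(hidden_layer_size, num_labels)        # 10 x 26
nn_initial_params = np.hstack((initial_theta1.ravel(order='F'),
                               initial_theta2.ravel(order='F')))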

2.3 Backpropagation

Backpropagation is not such a complicated algorithm once you get the hang of it: for each training example we run a forward pass, compute the error of the output layer, and then propagate that error term backwards to work out how much each weight contributed to it. I strongly urge you to watch Andrew Ng's videos on backprop multiple times.
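Here is one way the gradient computation can be written as a vectorized pass over all examples (a sketch that reuses the helpers above; the exact implementation may differ):

def nnGrad(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lmbda):
    # reshape the unrolled parameters back into the two weight matrices
    theta1 = nn_params[:hidden_layer_size * (input_layer_size + 1)].reshape(
        (hidden_layer_size, input_layer_size + 1), order='F')
    theta2 = nn_params[hidden_layer_size * (input_layer_size + 1):].reshape(
        (num_labels, hidden_layer_size + 1), order='F')

    m = X.shape[0]
    y_matrix = one_hot(y, num_labels)

    # forward pass
    a1 = np.hstack((np.ones((m, 1)), X))
    z2 = a1 @ theta1.T
    a2 = np.hstack((np.ones((m, 1)), sigmoid(z2)))
    a3 = sigmoid(a2 @ theta2.T)

    # backpropagate the errors
    d3 = a3 - y_matrix                            # 5000 x 10
    d2 = (d3 @ theta2[:, 1:]) * sigmoidGrad(z2)   # 5000 x 25
    delta1 = d2.T @ a1                            # 25 x 401
    delta2 = d3.T @ a2                            # 10 x 26

    # regularized gradients (bias columns are not regularized)
    theta1_grad = delta1 / m
    theta1_grad[:, 1:] += (lmbda / m) * theta1[:, 1:]
    theta2_grad = delta2 / m
    theta2_grad[:, 1:] += (lmbda / m) * theta2[:, 1:]

    return np.hstack((theta1_grad.ravel(order='F'), theta2_grad.ravel(order='F')))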

nn_backprop_Params = nnGrad(nn_initial_params, input_layer_size, hidden_layer_size, num_labels, X, y, lmbda)

2.4 Gradient checking

Why do we need gradient checking? To make sure that our backprop implementation has no bugs and works as intended. We can approximate the derivative of the cost function with respect to the i-th parameter as (J(θ + ε·e_i) − J(θ − ε·e_i)) / (2ε) for a small ε (e.g. 10⁻⁴), and compare this numerical estimate with the gradient computed by backpropagation.
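A sketch of a simple gradient checker that compares the backprop gradient against the numerical estimate at a handful of randomly chosen parameters (checking 5 indices here is an assumption; checking every parameter would be very slow):

def checkGradient(nn_initial_params, nn_backprop_Params,
                  input_layer_size, hidden_layer_size, num_labels, X, y, lmbda):
    epsilon = 1e-4
    n = len(nn_initial_params)
    for _ in range(5):
        i = np.random.randint(n)
        perturb = np.zeros(n)
        perturb[i] = epsilon
        cost_plus = nnCostFunc(nn_initial_params + perturb, input_layer_size,
                               hidden_layer_size, num_labels, X, y, lmbda)
        cost_minus = nnCostFunc(nn_initial_params - perturb, input_layer_size,
                                hidden_layer_size, num_labels, X, y, lmbda)
        numerical_grad = (cost_plus - cost_minus) / (2 * epsilon)
        # print the numerical and backprop gradients side by side
        print(i, numerical_grad, nn_backprop_Params[i])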

checkGradient(nn_initial_params,nn_backprop_Params,input_layer_size, hidden_layer_size, num_labels,X,y,lmbda)
If the implementation is correct, the numerical and backpropagation gradients printed above should agree to several decimal places.

2.5 Learning parameters using fmincg

After we have successfully implemented the neural network cost function and gradient computation, the next step is to use SciPy's fmin_cg to learn a good set of parameters for the network. theta_opt contains the unrolled parameters we just learnt, which we roll back up to get theta1_opt and theta2_opt.
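A sketch of this step using scipy.optimize, which we imported as opt earlier (the maxiter value of 300 is an assumption):

# minimize the cost with the conjugate gradient optimizer from SciPy
theta_opt = opt.fmin_cg(f=nnCostFunc, x0=nn_initial_params, fprime=nnGrad,
                        args=(input_layer_size, hidden_layer_size, num_labels, X, y, lmbda),
                        maxiter=300)

# roll the learned vector back into the two weight matrices
theta1_opt = theta_opt[:hidden_layer_size * (input_layer_size + 1)].reshape(
    (hidden_layer_size, input_layer_size + 1), order='F')
theta2_opt = theta_opt[hidden_layer_size * (input_layer_size + 1):].reshape(
    (num_labels, hidden_layer_size + 1), order='F')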

2.6 Prediction using learned parameters

It's time to see how well our newly learned parameters perform by calculating the accuracy of the model. Recall that with a linear classifier like logistic regression we got an accuracy of 95.08%. The neural network should give us better accuracy.
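A sketch of a possible predict helper (the unused y argument is kept only so the call below matches):

def predict(theta1, theta2, X, y):
    m = X.shape[0]
    # forward propagate with the learned weights and pick the most probable class
    a1 = np.hstack((np.ones((m, 1)), X))
    a2 = np.hstack((np.ones((m, 1)), sigmoid(a1 @ theta1.T)))
    a3 = sigmoid(a2 @ theta2.T)
    return np.argmax(a3, axis=1) + 1   # labels are 1..10, so shift the 0-based argmax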

pred = predict(theta1_opt, theta2_opt, X, y)
np.mean(pred == y.flatten()) * 100

End Notes

We just saw how neural networks can be used to perform complex tasks like digit recognition, and in the process we also got to know the backpropagation algorithm.
