# Learn How to Build Neural Networks from Scratch in Python for Digit Recognition

Oct 12, 2018 · 7 min read

# What we’ll cover in this post

In this blog post, we will learn how a neural network can be used for the same digit-recognition task. Since neural networks can fit more complex non-linear decision boundaries, we should see an increase in our classifier's accuracy too.

# 1. Feedforward Propagation

We first implement feedforward propagation for the neural network with the already given weights. Then we will implement the backpropagation algorithm to learn the parameters ourselves. Here we use the terms weights and parameters interchangeably.

# 1.1 Visualizing the data

Each training example is a 20 pixel by 20 pixel grayscale image of the digit. Each pixel is represented by a floating point number indicating the grayscale intensity at that location. The 20 by 20 grid of pixels is “unrolled” into a 400-dimensional vector. Each of these training examples becomes a single row in our data matrix `X`. This gives us a 5000 by 400 matrix `X` where every row is a training example for a handwritten digit image. The second part of the training set is a 5000-dimensional vector `y` that contains labels for the training set.

```python
from scipy.io import loadmat
import numpy as np
import scipy.optimize as opt
import pandas as pd
import matplotlib.pyplot as plt

# reading the data
data = loadmat('ex4data1.mat')
X = data['X']
y = data['y']

# visualizing the data
_, axarr = plt.subplots(10, 10, figsize=(10, 10))
for i in range(10):
    for j in range(10):
        axarr[i, j].imshow(X[np.random.randint(X.shape[0])].reshape((20, 20), order='F'))
        axarr[i, j].axis('off')
```

# 1.2 Model Representation

```python
weights = loadmat('ex4weights.mat')
theta1 = weights['Theta1']    # Theta1 has size 25 x 401
theta2 = weights['Theta2']    # Theta2 has size 10 x 26
nn_params = np.hstack((theta1.ravel(order='F'), theta2.ravel(order='F')))    # unroll parameters

# neural network hyperparameters
input_layer_size = 400
hidden_layer_size = 25
num_labels = 10
lmbda = 1
```

# 1.3 Feedforward and cost function

First we will implement the cost function, followed by the gradient for the neural network (for which we use the backpropagation algorithm). Recall the form of the cost function for the neural network with regularization.
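Written out (the equation appears to have been an image in the original post), this is the standard regularized cross-entropy cost from the exercise, with $K = 10$ output units and $m = 5000$ examples:

```latex
J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}
\left[-y_k^{(i)}\log\big((h_\Theta(x^{(i)}))_k\big)
      -(1-y_k^{(i)})\log\big(1-(h_\Theta(x^{(i)}))_k\big)\right]
+ \frac{\lambda}{2m}\left[\sum_{j=1}^{25}\sum_{k=1}^{400}\big(\Theta^{(1)}_{j,k}\big)^2
+ \sum_{j=1}^{10}\sum_{k=1}^{25}\big(\Theta^{(2)}_{j,k}\big)^2\right]
```

Note that the bias weights (the first column of each $\Theta$) are excluded from the regularization term.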

```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
```

## Cost function

```python
nnCostFunc(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lmbda)
```
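The body of `nnCostFunc` isn't shown in the post; one way to write it, consistent with the regularized cost described above, is the sketch below (the implementation details are an assumption, not the author's exact code):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def nnCostFunc(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lmbda):
    # roll the unrolled parameter vector back into the two weight matrices
    split = hidden_layer_size * (input_layer_size + 1)
    theta1 = nn_params[:split].reshape((hidden_layer_size, input_layer_size + 1), order='F')
    theta2 = nn_params[split:].reshape((num_labels, hidden_layer_size + 1), order='F')
    m = X.shape[0]

    # feedforward pass with bias units prepended
    a1 = np.hstack((np.ones((m, 1)), X))
    a2 = np.hstack((np.ones((m, 1)), sigmoid(a1 @ theta1.T)))
    h = sigmoid(a2 @ theta2.T)                      # m x num_labels

    # one-hot encode the labels (they are 1..10 in the .mat file)
    y_onehot = np.zeros((m, num_labels))
    y_onehot[np.arange(m), y.flatten() - 1] = 1

    # cross-entropy cost plus regularization (bias columns excluded)
    cost = -np.sum(y_onehot * np.log(h) + (1 - y_onehot) * np.log(1 - h)) / m
    reg = lmbda / (2 * m) * (np.sum(theta1[:, 1:] ** 2) + np.sum(theta2[:, 1:] ** 2))
    return cost + reg
```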

# 2 Backpropagation

In this part of the exercise, we will implement the backpropagation algorithm to compute the gradients for the neural network. Once we have computed the gradient, we will be able to train the neural network by minimizing the cost function using an advanced optimizer such as `fmincg`.

## 2.1 Sigmoid gradient

We will first implement the sigmoid gradient function. The gradient of the sigmoid can be computed as g′(z) = g(z)(1 − g(z)), where g(z) = 1 / (1 + e^(−z)).

```python
def sigmoidGrad(z):
    return np.multiply(sigmoid(z), 1 - sigmoid(z))
```

## 2.2 Random initialization

When training neural networks, it is important to randomly initialize the parameters for symmetry breaking. Here we randomly initialize parameters named `initial_theta1` and `initial_theta2`, corresponding to the hidden layer and the output layer, and unroll them into a single vector as we did earlier.
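The initialization code isn't shown in the post; a minimal sketch, assuming the ε = 0.12 range used in the exercise handout (the helper name `randInitializeWeights` is an assumption):

```python
import numpy as np

def randInitializeWeights(L_in, L_out, epsilon=0.12):
    # uniform draws in [-epsilon, epsilon]; the +1 accounts for the bias column
    return np.random.rand(L_out, L_in + 1) * 2 * epsilon - epsilon

initial_theta1 = randInitializeWeights(400, 25)   # hidden layer: 25 x 401
initial_theta2 = randInitializeWeights(25, 10)    # output layer: 10 x 26
nn_initial_params = np.hstack((initial_theta1.ravel(order='F'),
                               initial_theta2.ravel(order='F')))
```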

## 2.3 Backpropagation

Backpropagation is not such a complicated algorithm once you get the hang of it. I strongly urge you to watch Andrew Ng's videos on backprop multiple times.

```python
nn_backprop_Params = nnGrad(nn_initial_params, input_layer_size, hidden_layer_size, num_labels, X, y, lmbda)
```
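The `nnGrad` called above isn't shown in the post either; a sketch of the vectorized backward pass, mirroring the parameter unrolling used by the cost function (again an assumed implementation, not the author's exact code):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def nnGrad(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lmbda):
    split = hidden_layer_size * (input_layer_size + 1)
    theta1 = nn_params[:split].reshape((hidden_layer_size, input_layer_size + 1), order='F')
    theta2 = nn_params[split:].reshape((num_labels, hidden_layer_size + 1), order='F')
    m = X.shape[0]

    # forward pass
    a1 = np.hstack((np.ones((m, 1)), X))
    z2 = a1 @ theta1.T
    a2 = np.hstack((np.ones((m, 1)), sigmoid(z2)))
    a3 = sigmoid(a2 @ theta2.T)

    y_onehot = np.zeros((m, num_labels))
    y_onehot[np.arange(m), y.flatten() - 1] = 1

    # backward pass: output error, then hidden error (bias column dropped)
    d3 = a3 - y_onehot
    d2 = (d3 @ theta2)[:, 1:] * sigmoid(z2) * (1 - sigmoid(z2))

    delta1 = d2.T @ a1 / m
    delta2 = d3.T @ a2 / m

    # regularize everything except the bias column
    delta1[:, 1:] += lmbda / m * theta1[:, 1:]
    delta2[:, 1:] += lmbda / m * theta2[:, 1:]
    return np.hstack((delta1.ravel(order='F'), delta2.ravel(order='F')))
```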

## 2.4 Gradient checking

Why do we need gradient checking? To make sure that our backprop algorithm has no bugs and works as intended. We can approximate the derivative of our cost function with the central difference (J(θ + ε) − J(θ − ε)) / 2ε for a small ε, and compare it against the gradient backprop computed.

```python
checkGradient(nn_initial_params, nn_backprop_Params, input_layer_size, hidden_layer_size, num_labels, X, y, lmbda)
```
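The `checkGradient` helper isn't shown in the post; its core is the central-difference approximation, which can be sketched standalone like this (`numerical_grad` is an illustrative name, not from the post):

```python
import numpy as np

def numerical_grad(f, theta, eps=1e-4):
    # approximate each partial derivative with a central difference
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        bump = np.zeros_like(theta)
        bump[i] = eps
        grad[i] = (f(theta + bump) - f(theta - bump)) / (2 * eps)
    return grad

# sanity check on f(theta) = sum(theta**2), whose exact gradient is 2*theta
theta = np.array([1.0, -2.0, 3.0])
approx = numerical_grad(lambda t: np.sum(t ** 2), theta)
```

Applied to the network's cost function at `nn_initial_params`, the result should agree with `nn_backprop_Params` to several decimal places if backprop is correct.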

## 2.5 Learning parameters using `fmincg`

After successfully implementing the neural network cost function and gradient computation, the next step is to use `fmincg` to learn a good set of parameters for the neural network. `theta_opt` contains the unrolled parameters we just learned, which we roll back to get `theta1_opt` and `theta2_opt`.
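`fmincg` itself is an Octave/MATLAB function; in Python, `scipy.optimize.minimize` with the conjugate-gradient method plays the same role. Here is the pattern, demonstrated on a stand-in quadratic so the snippet runs on its own (in the actual run you would pass `nnCostFunc`, `nnGrad`, `nn_initial_params`, and the extra arguments via `args=`):

```python
import numpy as np
import scipy.optimize as opt

# stand-in cost and gradient; substitute nnCostFunc / nnGrad in the real run
def cost(theta):
    return np.sum((theta - 3.0) ** 2)

def grad(theta):
    return 2 * (theta - 3.0)

result = opt.minimize(fun=cost, x0=np.zeros(4), jac=grad,
                      method='CG', options={'maxiter': 50})
theta_opt = result.x
```

`theta1_opt` and `theta2_opt` are then recovered by reshaping the two slices of `theta_opt` with `order='F'`, mirroring the earlier unrolling.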

## 2.6 Prediction using learned parameters

It’s time to see how well our newly learned parameters perform by calculating the accuracy of the model. Recall that with a linear classifier like logistic regression we got an accuracy of 95.08%. The neural network should give us better accuracy.
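The `predict` helper is also left out of the post; a sketch is below. The signature matches the call that follows, though `y` is accepted but unused in this version:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(theta1, theta2, X, y=None):
    # one feedforward pass; the most activated output unit is the predicted digit
    m = X.shape[0]
    a1 = np.hstack((np.ones((m, 1)), X))
    a2 = np.hstack((np.ones((m, 1)), sigmoid(a1 @ theta1.T)))
    h = sigmoid(a2 @ theta2.T)
    return np.argmax(h, axis=1) + 1   # labels are 1-based in the .mat file
```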

```python
pred = predict(theta1_opt, theta2_opt, X, y)
np.mean(pred == y.flatten()) * 100
```

# End Notes

We just saw how neural networks can be used to perform complex tasks like digit recognition, and in the process got to know the backpropagation algorithm.
