# Digit Classifier using Neural Networks

Hey all! In this post, I’ll show you how to build a beginner-friendly framework for neural networks in Python. The primary objective of this code is to help novices learn the fundamentals of neural networks, and we will use it to recognize handwritten digits. Neural networks can represent complex models that form non-linear hypotheses. If this doesn’t make sense to you yet, don’t worry; this post will help you understand. If you’re unfamiliar with neural networks, read my earlier post to learn the fundamental ideas first. (**Click here** to navigate to my previous post.)

## Model Representation

Our neural network is shown above. It has 3 layers: an input layer, a hidden layer, and an output layer. Since we are working with pictures, the network cannot accept an image directly; instead, we feed it the image's pixel values (**Note:** images are made up of pixels). To ensure that all of the pictures are the same size, we scale them to 20x20 pixels. Unrolling each image into a 1D array gives a 400-dimensional vector, whose entries act as the input layer units of our neural network (excluding the extra bias unit, which always outputs +1).
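As a rough illustration of the unrolling step, here is a minimal sketch on a made-up 20x20 array (not an image from the dataset). It assumes column-by-column unrolling (`order='F'`), since the visualization code in this post reshapes rows back into images with that order.

```python
import numpy as np

# A stand-in 20x20 grayscale image (hypothetical data, not from the dataset).
image = np.arange(400, dtype=float).reshape(20, 20)

# Unroll column by column (order='F') into a 400-element input vector.
x = image.reshape(-1, order='F')

# Prepend the bias unit, which is always +1, giving 401 values.
a1 = np.hstack((1.0, x))

# Reshaping with the same order recovers the original image.
restored = x.reshape(20, 20, order='F')

print(x.shape, a1.shape, np.array_equal(restored, image))
```

The bias unit is not part of the image; it is added just before the vector is multiplied by the first weight matrix.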

Let’s import the required modules and load our dataset:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import loadmat
import matplotlib.image as img

# Load the digit images (X) and their labels (y).
mat = loadmat('ex4data1.mat')
X = mat['X']
y = mat['y']
X.shape, y.shape
```

Let’s visualize our dataset with the following code:

```python
fig, axis = plt.subplots(10, 10, figsize=(8, 8))
for i in range(10):
    for j in range(10):
        # randint's upper bound is exclusive, so this always picks a valid row index.
        axis[i, j].imshow(
            X[np.random.randint(0, X.shape[0]), :].reshape(20, 20, order='F'), cmap='gray')
        axis[i, j].axis('off')
```

## Sigmoid

We discussed this in more detail in our earlier post, so I will skip the explanation here. Basically, sigmoid is an activation function that takes a real-valued input and squashes it into the range between 0 and 1.

```python
def sigmoid(z):
    return 1/(1+np.exp(-z))
```

## Forward Propagation and Cost Function

The picture above shows forward propagation through one layer of a neural network.

We set the input x as a¹, multiply a¹ by θ¹ (the weights w¹ in the picture), and add the bias (b, or θ₀¹). We then pass the result through an activation function, in our case the sigmoid function. This is repeated for all 400 values in the input layer and again for every unit in the hidden layer. To find good parameters, we minimize a cost function.

The cost function looks similar to logistic regression’s cost function, but it sums over all output units and adds a regularization term, which discourages large weights and helps reduce overfitting. Minimizing this cost function is how we learn good parameters.
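Written out in the notation used by the code in this post, the forward pass and the regularized cost look roughly as follows. This is a sketch that assumes $m$ training examples, $K = 10$ output units, $g$ as the sigmoid function, and $\lambda$ as the regularization strength (`Lambda` in the code). The regularization sums start at column 1, so the bias column (index 0) of each weight matrix is left out.

$$
a^{(1)} = \begin{bmatrix} 1 \\ x \end{bmatrix}, \quad
z^{(2)} = \Theta^{(1)} a^{(1)}, \quad
a^{(2)} = \begin{bmatrix} 1 \\ g(z^{(2)}) \end{bmatrix}, \quad
z^{(3)} = \Theta^{(2)} a^{(2)}, \quad
h_\Theta(x) = a^{(3)} = g(z^{(3)})
$$

$$
J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ -y_k^{(i)} \log\left(h_\Theta(x^{(i)})_k\right) - \left(1 - y_k^{(i)}\right) \log\left(1 - h_\Theta(x^{(i)})_k\right) \right] + \frac{\lambda}{2m} \left[ \sum_{j=1}^{25} \sum_{k=1}^{400} \left(\Theta^{(1)}_{j,k}\right)^2 + \sum_{j=1}^{10} \sum_{k=1}^{25} \left(\Theta^{(2)}_{j,k}\right)^2 \right]
$$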

## Backpropagation

Backpropagation is the technique used to adjust the *weights* and *biases* so that the neural network’s output becomes more accurate. In forward propagation we moved from left to right; in backward propagation we move from right to left. Let us consider a simple neural network:

Backward propagation is just taking derivatives of the forward function, starting from the output and working backwards with the chain rule. If the derivation doesn’t make sense to you, don’t worry, that’s definitely OK; it is meant for readers who are familiar with calculus.
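In the notation used by the `costFunction` code in this post, the backward pass for one training example can be sketched as follows. This is a summary that assumes a sigmoid output layer with the cross-entropy cost. Here $a^{(l)}$ are the layer activations (including the bias unit), $z^{(2)}$ is the hidden layer's input before the sigmoid $g$, $\odot$ is element-wise multiplication, and the bias entry of $\delta^{(2)}$ is dropped before accumulating.

$$
\begin{aligned}
\delta^{(3)} &= a^{(3)} - y \\
\delta^{(2)} &= \left( (\Theta^{(2)})^{T} \delta^{(3)} \right) \odot g'(z^{(2)}), \qquad g'(z) = g(z)\,(1 - g(z)) \\
\Delta^{(2)} &:= \Delta^{(2)} + \delta^{(3)} (a^{(2)})^{T} \\
\Delta^{(1)} &:= \Delta^{(1)} + \delta^{(2)} (a^{(1)})^{T} \\
\frac{\partial J}{\partial \Theta^{(l)}_{jk}} &= \frac{1}{m} \Delta^{(l)}_{jk} + \frac{\lambda}{m} \Theta^{(l)}_{jk} \quad (k \ge 1), \qquad
\frac{\partial J}{\partial \Theta^{(l)}_{j0}} = \frac{1}{m} \Delta^{(l)}_{j0}
\end{aligned}
$$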

**Sigmoid gradient** is a helper function that computes the derivative of the sigmoid, which is a(1-a) where a = sigmoid(z). The `costFunction` in this section uses it during backpropagation and returns both the cost and the gradients for our network.
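The `costFunction` code in this section calls `sigmoidGradient`, whose definition is not shown in this post. Here is a minimal sketch, assuming it takes the pre-activation `z` and returns the element-wise derivative a(1-a) described above:

```python
import numpy as np

def sigmoid(z):
    # Squash real-valued input into the range (0, 1).
    return 1 / (1 + np.exp(-z))

def sigmoidGradient(z):
    # Assumed definition: derivative of the sigmoid at z, i.e. a * (1 - a).
    a = sigmoid(z)
    return a * (1 - a)

# The gradient peaks at z = 0 and shrinks towards 0 for large |z|.
print(sigmoidGradient(np.array([-10.0, 0.0, 10.0])))
```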

```python
def costFunction(nn_params, X, y, input_layer_size, hidden_layer_size, num_labels, Lambda):
    # Reshape the flat parameter vector back into the two weight matrices.
    Theta1 = nn_params[:((input_layer_size+1) * hidden_layer_size)].reshape(hidden_layer_size, input_layer_size+1)
    Theta2 = nn_params[((input_layer_size+1) * hidden_layer_size):].reshape(num_labels, hidden_layer_size+1)

    # Feedforward and cost function
    m = X.shape[0]
    X = np.column_stack((np.ones((m, 1)), X))  # 5000 x 401
    a2 = sigmoid(X @ Theta1.T)  # 5000 x 25
    a2 = np.hstack((np.ones((m, 1)), a2))  # 5000 x 26
    a3 = sigmoid(a2 @ Theta2.T)  # 5000 x 10

    # One-hot encode the labels: column i-1 is 1 where y == i.
    y_matrix = np.zeros((m, num_labels))  # 5000 x 10
    for i in range(1, num_labels+1):
        y_matrix[:, i-1][:, np.newaxis] = np.where(y==i, 1, 0)

    J = np.sum(np.sum(-y_matrix * np.log(a3) - (1 - y_matrix) * np.log(1 - a3)))
    reg = Lambda/(2*m) * (np.sum(Theta1[:, 1:]**2) + np.sum(Theta2[:, 1:]**2))
    J = (1/m) * J
    reg_J = J + reg

    # Backpropagation, one training example at a time
    grad1 = np.zeros((Theta1.shape))
    grad2 = np.zeros((Theta2.shape))
    for i in range(m):
        xi = X[i, :]  # 1 x 401
        a2i = a2[i, :]  # 1 x 26
        a3i = a3[i, :]  # 1 x 10
        d3 = a3i - y_matrix[i, :]
        d2 = (Theta2.T @ d3.T) * sigmoidGradient(np.hstack((1, xi @ Theta1.T)))
        grad1 = grad1 + d2[1:][:, np.newaxis] @ xi[:, np.newaxis].T
        grad2 = grad2 + d3.T[:, np.newaxis] @ a2i[:, np.newaxis].T

    grad1 = 1/m * grad1
    grad2 = 1/m * grad2

    # Regularize every column except the bias column.
    grad1_reg = grad1 + Lambda/m * np.hstack((np.zeros((Theta1.shape[0], 1)), Theta1[:, 1:]))
    grad2_reg = grad2 + Lambda/m * np.hstack((np.zeros((Theta2.shape[0], 1)), Theta2[:, 1:]))

    return J, grad1, grad2, reg_J, grad1_reg, grad2_reg
```

```python
input_layer_size = 400
hidden_layer_size = 25
num_labels = 10

# Theta1 and Theta2 must already be defined at this point; this post does not show where they come from.
nn_params = np.append(Theta1.flatten(), Theta2.flatten())
# The slice [0:4:3] picks elements 0 and 3 of the returned tuple: J and reg_J.
J, reg_J = costFunction(nn_params, X, y, input_layer_size, hidden_layer_size, num_labels, 1)[0:4:3]
print(f"Cost at parameters(non-regularized): {J}\nCost at parameters(Regularized): {reg_J}")
```
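The loop over all `m` examples in `costFunction` can be slow in Python. As a sketch of one alternative (not the code used in this post), the same regularized gradients can be computed for all examples at once with matrix products. The helper name `vectorized_gradients` and the tiny random problem are hypothetical, and `y_matrix` is assumed to be the one-hot label matrix built as in `costFunction`.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def vectorized_gradients(Theta1, Theta2, X, y_matrix, Lambda):
    # Hypothetical helper: regularized gradients for all examples at once.
    m = X.shape[0]
    a1 = np.hstack((np.ones((m, 1)), X))                        # m x (n+1)
    a2 = np.hstack((np.ones((m, 1)), sigmoid(a1 @ Theta1.T)))   # m x (h+1)
    a3 = sigmoid(a2 @ Theta2.T)                                 # m x k

    d3 = a3 - y_matrix                                          # m x k
    # Drop the bias column, then multiply by the sigmoid gradient a2 * (1 - a2).
    d2 = (d3 @ Theta2)[:, 1:] * a2[:, 1:] * (1 - a2[:, 1:])     # m x h

    grad1 = d2.T @ a1 / m                                       # h x (n+1)
    grad2 = d3.T @ a2 / m                                       # k x (h+1)
    # Regularize every column except the bias column.
    grad1[:, 1:] += Lambda / m * Theta1[:, 1:]
    grad2[:, 1:] += Lambda / m * Theta2[:, 1:]
    return grad1, grad2

# Tiny made-up problem: 6 examples, 4 inputs, 3 hidden units, 2 classes.
rng = np.random.default_rng(0)
X_demo = rng.random((6, 4))
y_demo = np.eye(2)[rng.integers(0, 2, size=6)]                  # one-hot labels
T1 = rng.uniform(-0.5, 0.5, size=(3, 5))
T2 = rng.uniform(-0.5, 0.5, size=(2, 4))
g1, g2 = vectorized_gradients(T1, T2, X_demo, y_demo, Lambda=1)
print(g1.shape, g2.shape)
```

The returned matrices correspond to `grad1_reg` and `grad2_reg` in `costFunction`; the per-example loop there is easier to follow, which is a reasonable trade-off in a tutorial.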

## Random Initialization

In neural networks we should not initialize the θ’s to zeros, because that makes the network symmetric: every hidden unit computes the same output and receives the same gradient update, so every unit ends up detecting the same feature. To break this symmetry (so that different units learn to detect different features such as edges, horizontal lines, etc.), we initialize the θ’s randomly. One effective strategy is to select values for θ uniformly in the range [-ϵᵢₙᵢₜ, ϵᵢₙᵢₜ]. The code below computes ϵᵢₙᵢₜ as √6/√(L_in + L_out), which is about 0.12 for our first layer (400 inputs, 25 outputs).

```python
def randomInitailization(L_in, L_out):
    # Half-width of the uniform range, based on the layer's input and output sizes.
    epi = np.sqrt(6)/np.sqrt(L_in+L_out)
    # Uniform values in [-epi, epi); the extra column is for the bias unit.
    W = np.random.rand(L_out, L_in+1) * 2*epi - epi
    return W

initial_Theta1 = randomInitailization(input_layer_size, hidden_layer_size)
initial_Theta2 = randomInitailization(hidden_layer_size, num_labels)
initial_nn_params = np.append(initial_Theta1.flatten(), initial_Theta2.flatten())
```
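To see the symmetry problem concretely, here is a small sketch on made-up data (the array sizes and the fixed 0.12 range are arbitrary assumptions for illustration). With all-zero weights, every hidden unit outputs exactly the same value for every example, whereas random weights give each unit a different response.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
X_demo = rng.random((5, 4))                        # 5 made-up examples, 4 features
a1 = np.hstack((np.ones((5, 1)), X_demo))          # add the bias column

# Zero initialization: all 3 hidden units compute sigmoid(0) = 0.5.
hidden_zero = sigmoid(a1 @ np.zeros((3, 5)).T)

# Random initialization in [-0.12, 0.12): the units now respond differently.
Theta_rand = rng.random((3, 5)) * 2 * 0.12 - 0.12
hidden_rand = sigmoid(a1 @ Theta_rand.T)

print(np.all(hidden_zero == 0.5))                          # are all units identical?
print(np.allclose(hidden_rand[:, 0], hidden_rand[:, 1]))   # are two random units identical?
```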

## Gradient Descent

Since we have both θ₁ and θ₂ to learn, the gradient descent algorithm differs slightly from the ones in previous posts: on every iteration it updates both weight matrices using their respective gradients.
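As a sketch in the usual notation, where $\alpha$ is the learning rate (`alpha` in the code) and $J$ is the regularized cost, each iteration applies both updates using gradients computed from the same current parameters:

$$
\Theta^{(1)} := \Theta^{(1)} - \alpha \frac{\partial J}{\partial \Theta^{(1)}}, \qquad
\Theta^{(2)} := \Theta^{(2)} - \alpha \frac{\partial J}{\partial \Theta^{(2)}}
$$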

```python
def gradientDescent(initial_nn_params, X, y, input_layer_size, hidden_layer_size, num_labels, alpha, num_iters, Lambda):
    Theta1 = initial_nn_params[:((input_layer_size+1) * hidden_layer_size)].reshape(hidden_layer_size, input_layer_size+1)
    Theta2 = initial_nn_params[((input_layer_size+1) * hidden_layer_size):].reshape(num_labels, hidden_layer_size+1)

    m = len(y)
    J_history = []

    for i in range(num_iters):
        nn_params = np.append(Theta1.flatten(), Theta2.flatten())
        # Elements 3 onwards are the regularized cost and the regularized gradients.
        cost, grad1, grad2 = costFunction(nn_params, X, y, input_layer_size, hidden_layer_size, num_labels, Lambda)[3:]
        Theta1 = Theta1 - (alpha * grad1)
        Theta2 = Theta2 - (alpha * grad2)
        J_history.append(cost)

    nn_params_final = np.append(Theta1.flatten(), Theta2.flatten())
    return nn_params_final, J_history
```

```python
# Train with learning rate 0.8 for 800 iterations and Lambda = 1.
nn_params, J_history = gradientDescent(initial_nn_params, X, y, input_layer_size, hidden_layer_size, num_labels, 0.8, 800, 1)
Theta1 = nn_params[:((input_layer_size+1) * hidden_layer_size)].reshape(hidden_layer_size, input_layer_size+1)
Theta2 = nn_params[((input_layer_size+1) * hidden_layer_size):].reshape(num_labels, hidden_layer_size+1)
```

## Predictions

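We can get predictions by running forward propagation once with the learned parameters and picking, for each example, the output unit with the highest activation.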

```python
def predict(Theta1, Theta2, X):
    m = X.shape[0]
    X = np.hstack((np.ones((m, 1)), X))
    a2 = sigmoid(X @ Theta1.T)
    a2 = np.hstack((np.ones((m, 1)), a2))
    a3 = sigmoid(a2 @ Theta2.T)
    # argmax returns 0-based column indices; the labels start at 1, so add 1.
    return np.argmax(a3, axis=1)+1
```

```python
pred = predict(Theta1, Theta2, X)
print(f"Accuracy = {np.mean(pred[:, np.newaxis]==y) * 100}%")
```
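The `pred[:, np.newaxis]` in the accuracy line matters because `y` is a column of shape (m, 1) while `predict` returns a flat array of shape (m,). A small sketch with made-up labels (not from the dataset) shows what goes wrong without it:

```python
import numpy as np

y_demo = np.array([[1], [2], [3], [3]])     # column vector, shape (4, 1)
pred_demo = np.array([1, 2, 3, 1])          # flat predictions, shape (4,)

# Shapes (4,) and (4, 1) broadcast to a 4 x 4 comparison table.
wrong = (pred_demo == y_demo)
# Matching shapes compare element by element.
right = (pred_demo[:, np.newaxis] == y_demo)

print(wrong.shape, right.shape)
print(np.mean(right) * 100)
```

Taking the mean of the 4 x 4 table would silently report a much lower and meaningless accuracy, so it is worth checking the shapes before comparing.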

This reports an accuracy of around 95%, which is good for classifying handwritten digits. Keep in mind that this is measured on the same data the network was trained on.

# Conclusion

Today, we looked under the hood of neural networks and saw how they actually work. We then built one from scratch using Python’s NumPy, pandas, and Matplotlib. The dataset and final code are uploaded to GitHub.

Check it out here: Neural Networks.

If you like this post, then check out my other posts in this series:

1. What is Machine Learning?
2. What are the Types of Machine Learning?
3. Uni-Variate Linear Regression
4. Multi-Variate Linear Regression
5. Logistic Regression
6. What are Neural Networks?
7. Image Compressing with K-means Clustering
8. Dimensionality Reduction on Face using PCA
9. Detect Failing Servers on a Network using Anomaly Detection

# Last Thing

*If you enjoyed my article, a clap 👏 and a follow would be ⚡neuralistic⚡, and it is helpful for Medium to promote this article so that others may read it. I am Jagajith, and I will catch you in the next one.*