Building a Neural Network from Scratch using Numpy and Math Libraries: A Step-by-Step Tutorial in Python

Waleed Mousa
7 min read · Mar 15, 2023


Neural networks are powerful machine learning models that can be used for a variety of tasks such as image classification, speech recognition, and natural language processing.

While there are many high-level libraries available that make it easy to build and train neural networks, it’s important to have a fundamental understanding of the underlying concepts and mathematics behind them.

In this tutorial, we will build a neural network from scratch using only the numpy and math libraries in Python. We will cover the key concepts of neural networks, including forward and backward propagation, activation functions, and loss functions. We will implement each step of the neural network construction process, including initialization, training, and prediction.

By the end of this tutorial, you will have a solid understanding of how neural networks work and how to implement them from scratch in Python. You will also gain valuable experience working with numpy and math libraries, which are essential tools for data science and machine learning.

1. Agenda

Here is the agenda for building our neural network model:

  • Load and prepare the dataset
  • Define the neural network architecture
  • Initialize the weights and biases
  • Define the activation function(s)
  • Implement the forward propagation algorithm
  • Define the loss function
  • Implement the backward propagation algorithm
  • Implement the parameter update step
  • Train the neural network
  • Make predictions and evaluate the model

Let’s dive into each step in more detail.

Step 1: Load and prepare the dataset

The first step is to load the dataset and prepare it for training. For this tutorial, we will use a simple toy dataset with two features and two classes. You can create this dataset using the following code:

import numpy as np

np.random.seed(0)

# create a toy dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

Here, X is the input data with four samples, each having two features, and y holds the four corresponding labels. This is the classic XOR problem: the output is 1 exactly when the two inputs differ, which makes the classes non-linearly separable and a good test case for a network with a hidden layer.

Step 2: Define the neural network architecture

Next, we need to define the architecture of our neural network. For this tutorial, we will create a simple feedforward neural network with one hidden layer containing three neurons.

The input layer will have two neurons, and the output layer will have one neuron. The architecture can be represented as follows:

Input Layer (2 neurons)
          |
          v
Hidden Layer (3 neurons)
          |
          v
Output Layer (1 neuron)
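
Concretely, this architecture pins down the shapes of every parameter we will create in Step 3. Here is a small summary in code (the variable name layer_sizes is our own, used only for illustration):

# layer sizes for the 2 -> 3 -> 1 network
layer_sizes = {"input": 2, "hidden": 3, "output": 1}

# shapes of the parameters created in Step 3:
# W1: (hidden, input)  = (3, 2)   b1: (hidden, 1) = (3, 1)
# W2: (output, hidden) = (1, 3)   b2: (output, 1) = (1, 1)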

Step 3: Initialize the weights and biases

To train the neural network, we need to initialize the weights and biases of each layer. We will randomly initialize the weights and biases using Numpy’s random module.

# initialize weights and biases
def initialize_parameters(input_size, hidden_size, output_size):
    np.random.seed(0)
    W1 = np.random.randn(hidden_size, input_size) * 0.01
    b1 = np.zeros((hidden_size, 1))
    W2 = np.random.randn(output_size, hidden_size) * 0.01
    b2 = np.zeros((output_size, 1))
    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
    return parameters

parameters = initialize_parameters(2, 3, 1)

Here, W1 and W2 are the weight matrices of the hidden and output layers, respectively. b1 and b2 are the bias vectors of the hidden and output layers, respectively. The input_size, hidden_size, and output_size parameters are the number of neurons in the input, hidden, and output layers, respectively.

Step 4: Define the activation function(s)

The activation function is used to introduce non-linearity into the neural network. For this tutorial, we will use the sigmoid activation function.

# sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
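
Step 7 will need the derivative of the sigmoid, which has the convenient closed form sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)). The backward-propagation code later applies this identity directly to the cached activations, but written as a standalone helper (the name sigmoid_derivative is our own, and the code below never calls it) it would look like this:

# derivative of the sigmoid, expressed in terms of its output a = sigmoid(x)
def sigmoid_derivative(a):
    return a * (1 - a)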

Step 5: Implement the forward propagation algorithm

The forward propagation algorithm is used to compute the output of the neural network given an input.

It involves computing the weighted sum of the inputs and the biases, passing the result through the activation function, and repeating this process for each layer until the output is obtained.

# forward propagation
def forward_propagation(X, parameters):
    # retrieve the parameters
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # compute the activation of the hidden layer
    Z1 = np.dot(W1, X.T) + b1  # shape: (hidden_size, m)
    A1 = sigmoid(Z1)

    # compute the activation of the output layer
    Z2 = np.dot(W2, A1) + b2  # shape: (output_size, m)
    A2 = sigmoid(Z2)

    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}

    return A2, cache

A2, cache = forward_propagation(X, parameters)

Here, X is the input data, and parameters are the weights and biases of the neural network. The forward_propagation function returns the output of the neural network (A2) and a cache containing the intermediate values (Z1, A1, Z2, A2) that will be needed for the backward propagation algorithm.
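
A quick way to confirm the wiring is to inspect the output shape; with four input samples we expect one prediction per sample (an optional check, not part of the original recipe):

# with 4 samples, A2 should hold one activation column per sample
assert A2.shape == (1, X.shape[0])
print(A2.shape)  # (1, 4)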

Step 6: Define the loss function

The loss function is used to measure the error of the neural network’s predictions. For this tutorial, we will use the binary cross-entropy loss, defined as L = -(1/m) * Σ [y * log(ŷ) + (1 - y) * log(1 - ŷ)], where ŷ is the network’s output, y is the true label, and m is the number of samples.

# binary cross-entropy loss function
def binary_cross_entropy_loss(A2, y):
    m = y.shape[0]
    # y has shape (m, 1) while A2 has shape (1, m), so transpose y to align them
    loss = -(1/m) * np.sum(y.T * np.log(A2) + (1 - y.T) * np.log(1 - A2))
    return loss

Here, A2 is the output of the neural network, and y is the true label.
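
One practical caveat: if the network ever outputs exactly 0 or 1, np.log returns -inf and the loss becomes undefined. A common safeguard, shown here as an optional variant rather than part of the original recipe, is to clip predictions away from the boundaries:

# numerically safer variant: keep predictions strictly inside (0, 1) before the logs
def binary_cross_entropy_loss_stable(A2, y, eps=1e-12):
    m = y.shape[0]
    A2 = np.clip(A2, eps, 1 - eps)  # avoid log(0)
    return -(1/m) * np.sum(y.T * np.log(A2) + (1 - y.T) * np.log(1 - A2))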

Step 7: Implement the backward propagation algorithm

The backward propagation algorithm is used to compute the gradients of the loss function with respect to the weights and biases of the neural network.

It involves computing the error of the output layer, propagating the error backward to the hidden layer, and computing the gradients using the chain rule.
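
Incidentally, pairing a sigmoid output with binary cross-entropy makes the chain rule collapse neatly: multiplying dA2 = -(y/A2) + (1-y)/(1-A2) by the sigmoid derivative A2 * (1 - A2) simplifies to dZ2 = A2 - y (with y transposed to match shapes in our code). The implementation below keeps the two steps separate so each application of the chain rule stays visible, but it computes exactly this quantity.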

# backward propagation
def backward_propagation(parameters, cache, X, y):
    m = y.shape[0]

    # retrieve the cached activations (Z1 and Z2 are not needed here)
    A1 = cache["A1"]
    A2 = cache["A2"]

    # transpose y to (1, m) so it aligns with A2
    yT = y.T

    # compute the derivative of the loss with respect to A2
    dA2 = -(yT / A2) + ((1 - yT) / (1 - A2))

    # compute the derivative of the activation function of the output layer
    dZ2 = dA2 * (A2 * (1 - A2))

    # compute the derivative of the weights and biases of the output layer
    dW2 = (1/m) * np.dot(dZ2, A1.T)
    db2 = (1/m) * np.sum(dZ2, axis=1, keepdims=True)

    # propagate the error back and compute the derivative of the hidden activation
    dA1 = np.dot(parameters["W2"].T, dZ2)
    dZ1 = dA1 * (A1 * (1 - A1))

    # compute the derivative of the weights and biases of the hidden layer
    dW1 = (1/m) * np.dot(dZ1, X)
    db1 = (1/m) * np.sum(dZ1, axis=1, keepdims=True)

    gradients = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

    return gradients

Here, `parameters` are the weights and biases of the neural network, `cache` contains the intermediate values computed during the forward propagation, `X` is the input data, and `y` is the true label.

The `backward_propagation` function returns the gradients of the loss function with respect to the weights and biases of the neural network.
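
If you want to convince yourself the analytic gradients are right, a finite-difference check is a standard trick. The sketch below is not part of the original recipe, and the helper name check_gradient_entry is our own: it nudges a single weight entry up and down by a small epsilon and compares the resulting loss slope to the analytic gradient.

# finite-difference check for one entry of one parameter matrix
def check_gradient_entry(parameters, X, y, name, grad, i, j, eps=1e-6):
    # numerical estimate: (loss(w + eps) - loss(w - eps)) / (2 * eps)
    original = parameters[name][i, j]
    parameters[name][i, j] = original + eps
    loss_plus = binary_cross_entropy_loss(forward_propagation(X, parameters)[0], y)
    parameters[name][i, j] = original - eps
    loss_minus = binary_cross_entropy_loss(forward_propagation(X, parameters)[0], y)
    parameters[name][i, j] = original  # restore the weight
    numerical = (loss_plus - loss_minus) / (2 * eps)
    print(f"{name}[{i},{j}]: analytic={grad[i, j]:.6f}, numerical={numerical:.6f}")

gradients = backward_propagation(parameters, cache, X, y)
check_gradient_entry(parameters, X, y, "W2", gradients["dW2"], 0, 0)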

Step 8: Implement the update parameters function

The update parameters function is used to update the weights and biases of the neural network using the gradients computed during the backward propagation.

# update parameters
def update_parameters(parameters, gradients, learning_rate):
    # retrieve the gradients
    dW1 = gradients["dW1"]
    db1 = gradients["db1"]
    dW2 = gradients["dW2"]
    db2 = gradients["db2"]

    # retrieve the weights and biases
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # update the weights and biases
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2

    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

    return parameters

Here, parameters are the weights and biases of the neural network, gradients are the gradients of the loss function with respect to the weights and biases of the neural network, and learning_rate is the learning rate used to update the weights and biases. The update_parameters function returns the updated weights and biases.

Step 9: Train the neural network

Now that we have implemented all the necessary functions, we can train the neural network using the training data.

The training process involves repeatedly performing forward propagation, backward propagation, and updating the parameters until the loss function is minimized.

# train the neural network
def train(X, y, hidden_layer_size, num_iterations, learning_rate):
    # initialize the weights and biases
    parameters = initialize_parameters(X.shape[1], hidden_layer_size, 1)

    for i in range(num_iterations):
        # forward propagation
        A2, cache = forward_propagation(X, parameters)

        # compute the loss
        loss = binary_cross_entropy_loss(A2, y)

        # backward propagation
        gradients = backward_propagation(parameters, cache, X, y)

        # update the parameters
        parameters = update_parameters(parameters, gradients, learning_rate)

        if i % 1000 == 0:
            print(f"iteration {i}: loss = {loss}")

    return parameters

# hidden_layer_size=3 matches the architecture defined in Step 2
parameters = train(X, y, hidden_layer_size=3, num_iterations=10000, learning_rate=0.1)

Here, X is the input data, y is the true label, hidden_layer_size is the number of neurons in the hidden layer, num_iterations is the number of iterations to perform, and learning_rate is the learning rate used to update the weights and biases. The train function returns the trained weights and biases.

Step 10: Make predictions

Finally, we can use the trained neural network to make predictions on new data.

The predict function performs forward propagation on the input data using the trained weights and biases and returns the predicted labels.

# predict the labels for new data
def predict(X, parameters):
    A2, _ = forward_propagation(X, parameters)
    predictions = (A2 > 0.5).astype(int)
    return predictions

# here we predict on the training inputs; replace X with any new data of the same shape
predictions = predict(X, parameters)

Here, X is the input data and parameters are the trained weights and biases. We reuse the toy inputs X in the call above, but any array with the same two-feature layout (for example, a held-out X_test) would work. The predict function returns the predicted labels.
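
To cover the agenda’s final item, we can also score the predictions against the true labels. On this four-sample toy problem the training set doubles as the test set (a simple sketch):

# evaluate accuracy: fraction of predictions that match the labels
accuracy = np.mean(predictions.flatten() == y.flatten())
print(f"accuracy: {accuracy:.2f}")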

Summary

In this tutorial, we have built a neural network from scratch using only numpy and math libraries. We have implemented the following functions:

  • initialize_parameters function to initialize the weights and biases
  • sigmoid function to apply the sigmoid activation function
  • forward_propagation function to perform forward propagation
  • binary_cross_entropy_loss function to compute the loss function
  • backward_propagation function to perform backward propagation
  • update_parameters function to update the weights and biases
  • train function to train the neural network
  • predict function to make predictions on new data

We have also discussed the mathematical concepts behind neural networks, such as forward and backward propagation, activation functions, and the loss function.

The neural network is trained on the training data by minimizing the loss function and adjusting the weights and biases using the gradients computed during the backward propagation. Finally, the trained neural network is used to make predictions on new data.
