How to Create a Simple Neural Network in Python
Learn how to create a neural network and teach it to classify vectors
Machine learning has had a huge impact on the world over the last few decades, and its popularity seems to be ever-growing. Recently, more and more people have familiarised themselves with subfields of machine learning such as neural networks, which are networks inspired by the human brain. This article presents Python code for a simple neural network that classifies 1x3 vectors according to whether their first element is 10.
Step 1: Import NumPy, Scikit-learn and Matplotlib
import numpy as np
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
We will be using three packages for this project. NumPy will be used for creating vectors and matrices, as well as mathematical operations. Scikit-learn will be used for scaling the data, and Matplotlib will be used for plotting the error development during the training of the neural network.
Step 2: Create a Training and Test Data Set
Neural networks can learn trends in both large and small data sets. However, data scientists have to be aware of the danger of overfitting, which is more pronounced in projects that use small data sets. Overfitting occurs when an algorithm is trained to fit a set of data points so closely that it does not generalize well to new data points.
Overfitted machine learning models often have very high accuracy on the data they were trained on, but a data scientist's goal is usually to predict new data points as precisely as possible. To make sure that a model is evaluated on how well it predicts new data points, and not on how closely it fits the current ones, it is common to split the data into a training set and a test set (and sometimes a validation set).
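As a hedged illustration of such a split, the sketch below generates a hypothetical set of 20 labelled vectors and holds out 30% of them with Scikit-learn's train_test_split (from sklearn.model_selection). The generated data, the seeds, and the 30% ratio are assumptions for illustration only; the article itself defines its training and test sets by hand.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data (not from the article): 20 vectors whose first element
# is either 0 or 10 and whose other two elements are 0 or 1
rng = np.random.default_rng(seed=0)
first = rng.choice([0, 10], size=20)
rest = rng.integers(0, 2, size=(20, 2))
X = np.column_stack([first, rest])
y = (X[:, 0] == 10).astype(int).reshape(-1, 1)  # label 1 if first element is 10

# Hold out 30% of the samples; the model never sees them during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

print(X_train.shape, X_test.shape)  # (14, 3) (6, 3)
```

Passing stratify=y to train_test_split would additionally keep the ratio of the two classes the same in both sets, which matters more the smaller the data set is.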
input_train = np.array([[0, 1, 0], [0, 1, 1], [0, 0, 0],
                        [10, 0, 0], [10, 1, 1], [10, 0, 1]])
output_train = np.array([[0], [0], [0], [1], [1], [1]])
input_pred = np.array([1, 1, 0])
input_test = np.array([[1, 1, 1], [10, 0, 1], [0, 1, 10],
                       [10, 1, 10], [0, 0, 0], [0, 1, 1]])
output_test = np.array([[0], [1], [0], [1], [0], [0]])
In this simple neural network, we will classify 1x3 vectors according to whether their first element is 10: such vectors have the label 1, and all other vectors have the label 0. The input and output training and test sets are created with NumPy's array function, and input_pred is created to test a prediction function that will be defined later. Both the training and the test data consist of six samples with three features each, and since the correct outputs are given, this is an example of supervised learning.
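Because the labelling rule is this simple, it can also be computed directly, which is a quick way to double-check hand-written labels. The sketch below recomputes the labels from the first column of the article's inputs; storing them as a 6x1 column (a 2-D array) is an assumption that matches what MinMaxScaler expects, since Scikit-learn transformers reject 1-D input.

```python
import numpy as np

input_train = np.array([[0, 1, 0], [0, 1, 1], [0, 0, 0],
                        [10, 0, 0], [10, 1, 1], [10, 0, 1]])
input_test = np.array([[1, 1, 1], [10, 0, 1], [0, 1, 10],
                       [10, 1, 10], [0, 0, 0], [0, 1, 1]])

def label(vectors):
    """Return a column vector: 1 where the first element is 10, else 0."""
    return (vectors[:, 0] == 10).astype(int).reshape(-1, 1)

output_train = label(input_train)
output_test = label(input_test)
print(output_train.ravel())  # [0 0 0 1 1 1]
print(output_test.ravel())   # [0 1 0 1 0 0]
```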
Step 3: Scale the Data
Many machine learning models cannot tell the difference between, for example, units, and will naturally give more weight to features with large magnitudes. This can seriously hurt an algorithm's ability to predict new data points. Furthermore, training a model on features with large magnitudes is slower than necessary, at least when gradient descent is used, because gradient descent converges faster when the input values lie in approximately the same range.
input_scaler = MinMaxScaler()   # learns the min/max of each input feature
output_scaler = MinMaxScaler()  # separate scaler for the labels
input_train_scaled = input_scaler.fit_transform(input_train)
output_train_scaled = output_scaler.fit_transform(output_train)
# Scale test data with the training statistics instead of refitting
input_test_scaled = input_scaler.transform(input_test)
output_test_scaled = output_scaler.transform(output_test)
input_pred = input_scaler.transform(input_pred.reshape(1, -1))[0]
In our training and test data sets, the values lie in a relatively small range, so feature scaling might not be strictly necessary. It is, however, included so that you can use your own numbers without changing much of the code. Feature scaling is easy in Python thanks to Scikit-learn's MinMaxScaler class: create a MinMaxScaler object and call its fit_transform function on the training data, which learns each feature's minimum and maximum and returns the scaled data. The test data and the vector used for prediction should then be scaled with the transform function, so that they are mapped with the same training statistics rather than with a new fit. There are also other scaling classes in Scikit-learn that I encourage you to try.
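MinMaxScaler maps each feature x to (x - min) / (max - min), using the minimum and maximum it saw during fitting. The sketch below, using the article's training and test inputs, checks this formula by hand and shows one consequence of transforming test data with the training statistics: the test vector [0, 1, 10] has a third element far above the training maximum of 1, so it is mapped outside the [0, 1] range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

input_train = np.array([[0, 1, 0], [0, 1, 1], [0, 0, 0],
                        [10, 0, 0], [10, 1, 1], [10, 0, 1]])
input_test = np.array([[1, 1, 1], [10, 0, 1], [0, 1, 10],
                       [10, 1, 10], [0, 0, 0], [0, 1, 1]])

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(input_train)  # learns per-feature min/max

# The same result computed by hand with the min-max formula
col_min = input_train.min(axis=0)
col_max = input_train.max(axis=0)
manual = (input_train - col_min) / (col_max - col_min)
print(np.allclose(train_scaled, manual))  # True

# Test data reuse the training min/max, so values can leave [0, 1]
test_scaled = scaler.transform(input_test)
print(test_scaled[2])  # third element becomes 10.0
```

If values outside the feature range are undesirable, MinMaxScaler(clip=True) clips transformed values to that range instead.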
Step 4: Create a Neural Network Class
One of the easiest ways to get familiar with all the elements of a neural network is to create a neural network class. Such a class should include all the variables and functions that will be necessary for the neural network to work properly.
Step 4.1: Create an Initialize Function
The __init__ function is called when an object of a class is created, so that its variables are initialized properly.
In the example, I have chosen a neural network with three input nodes, three nodes in the hidden layer, and one output node. The __init__ function initializes variables describing the size of the neural network: inputSize is the number of input nodes, which should equal the number of features in our input data; outputSize is the number of output nodes; and hiddenSize is the number of nodes in the hidden layer. Further, there are weights between the nodes of adjacent layers, which will be adjusted during training.
In addition to the variables describing the size of the neural network and its weights, I have created several variables, initialized when a NeuralNetwork object is created, that will be used for evaluation. error_list will contain the mean absolute error (MAE) for each epoch, and limit is the decision boundary: a prediction above it classifies a vector as having 10 as its first element, and a prediction below it classifies the vector as not having it. Finally, there are variables that store the number of true positives, false positives, true negatives, and false negatives.
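A minimal sketch of a constructor matching this description might look as follows. The random weight initialization with np.random.default_rng, the optional seed parameter, and the threshold value limit = 0.5 are assumptions made for illustration, not details stated in the article.

```python
import numpy as np

class NeuralNetwork:
    def __init__(self, seed=None):
        # Network size: 3 input nodes, 3 hidden nodes, 1 output node
        self.inputSize = 3
        self.hiddenSize = 3
        self.outputSize = 1

        # Weights between layers (assumption: uniform random values in [0, 1))
        rng = np.random.default_rng(seed)
        self.W1 = rng.random((self.inputSize, self.hiddenSize))   # input -> hidden
        self.W2 = rng.random((self.hiddenSize, self.outputSize))  # hidden -> output

        # Evaluation bookkeeping
        self.error_list = []  # MAE for each epoch
        self.limit = 0.5      # assumed decision threshold
        self.true_positives = 0
        self.false_positives = 0
        self.true_negatives = 0
        self.false_negatives = 0

NN = NeuralNetwork(seed=0)
print(NN.W1.shape, NN.W2.shape)  # (3, 3) (3, 1)
```

The shapes are what matter: W1 needs one row per input feature and one column per hidden node, and W2 one row per hidden node and one column per output node, so that the matrix products in the forward pass line up. Passing a seed makes runs reproducible.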
Step 4.2: Create a Forward Propagation Function
The purpose of the forward pass function is to iterate forward through the layers of the neural network to predict the output for a particular epoch. Then, based on the difference between the predicted output and the actual output, the weights are updated during backward propagation.
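def forward(self, X):
    self.z = np.matmul(X, self.W1)         # input -> hidden (pre-activation)
    self.z2 = self.sigmoid(self.z)         # hidden activations
    self.z3 = np.matmul(self.z2, self.W2)  # hidden -> output (pre-activation)
    o = self.sigmoid(self.z3)              # predicted output
    return o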
To calculate the values at the nodes of each layer, the values at the nodes of the previous layer are matrix-multiplied by the corresponding weights, and a non-linear activation function is then applied. The non-linearity is what allows the network to represent more than a linear function of its inputs: without it, the stacked matrix multiplications would collapse into a single linear transformation. In this example, the Sigmoid is used as the activation function, but there are many alternatives.
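To make the shapes concrete, here is a standalone sketch of the same forward pass written as plain functions rather than methods. The weights are random placeholders (an assumption), and the input is the article's six training vectors with the first column divided by 10, which is exactly what min-max scaling does to this particular data.

```python
import numpy as np

def sigmoid(s):
    """Logistic function, squashing any real number into (0, 1)."""
    return 1 / (1 + np.exp(-s))

def forward(X, W1, W2):
    z = X @ W1          # (n, 3) @ (3, 3) -> (n, 3) hidden pre-activations
    z2 = sigmoid(z)     # hidden activations
    z3 = z2 @ W2        # (n, 3) @ (3, 1) -> (n, 1) output pre-activation
    return sigmoid(z3)  # predicted outputs, one per row of X

# Training inputs after min-max scaling (first column divided by 10)
X = np.array([[0, 1, 0], [0, 1, 1], [0, 0, 0],
              [1, 0, 0], [1, 1, 1], [1, 0, 1]], dtype=float)
rng = np.random.default_rng(0)
W1, W2 = rng.random((3, 3)), rng.random((3, 1))  # placeholder weights

o = forward(X, W1, W2)
print(o.shape)  # (6, 1)
```

Note that the all-zero vector [0, 0, 0] produces hidden activations of exactly 0.5, because this network has no bias terms; its output therefore depends only on W2.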
Step 4.3: Create a Backward Propagation Function
Backpropagation is the process that propagates the error at the output backward through the network and uses it to update the weights between the nodes, which determines how much each connection matters.
def backward(self, X, y, o):
    self.o_error = y - o                                # error at the output
    self.o_delta = self.o_error * self.sigmoidPrime(o)  # output-layer gradient term
    self.z2_error = np.matmul(self.o_delta, self.W2.T)  # error pushed back to hidden layer
    self.z2_delta = self.z2_error * self.sigmoidPrime(self.z2)
    self.W1 += np.matmul(X.T, self.z2_delta)            # update input -> hidden weights
    self.W2 += np.matmul(self.z2.T, self.o_delta)       # update hidden -> output weights
In the code above, the output error is calculated as the difference between the actual output y and the output o predicted by forward propagation. This error is multiplied by the derivative of the Sigmoid (sigmoidPrime) to obtain the gradient term for the output layer, and the same procedure is repeated backward, layer by layer, until the input layer is reached. Finally, the weights between the layers are updated. No explicit learning rate is used, which corresponds to gradient descent on half the sum of squared errors with a learning rate of 1.
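sigmoidPrime is applied to o and self.z2, which are already Sigmoid outputs. That works because the derivative of the Sigmoid can be expressed through its own output: if s = sigmoid(z), then the derivative at z equals s * (1 - s). The sketch below assumes sigmoidPrime is defined in this way and checks the identity against a central finite-difference approximation.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoidPrime(s):
    """Derivative of the Sigmoid, expressed via its output s = sigmoid(z)."""
    return s * (1 - s)

z = np.linspace(-5, 5, 11)
s = sigmoid(z)

# Central finite-difference approximation of d sigmoid / dz
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

print(np.allclose(sigmoidPrime(s), numeric))  # True
```

Calling this sigmoidPrime on the pre-activations (self.z3 or self.z) instead of on the activations would give wrong gradients, which is why the backward pass passes o and self.z2.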
Step 4.4: Create a Training Function
During training, the algorithm runs a forward and a backward pass, thereby updating the weights, once per epoch. Repeating this over many epochs gradually moves the weights toward values that fit the training data well.
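def train(self, X, y, epochs):
    for epoch in range(epochs):
        o = self.forward(X)
        self.backward(X, y, o)
        # Record the mean absolute error (MAE) of this epoch
        self.error_list.append(np.abs(self.o_error).mean())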
In addition to running the forward and backward passes, we append the mean absolute error (MAE) of each epoch to error_list, so that we can later observe how the error develops over the course of training.
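The recorded value is MAE = (1/n) * sum(|y_i - o_i|) over the n training samples. The sketch below condenses the forward pass, backward pass, and MAE bookkeeping into one standalone function so the training loop can be run in isolation. It follows the same update rule as the backward function (no explicit learning rate); the function name train, the random initialization, and the seed are assumptions for illustration.

```python
import numpy as np

def sigmoid(s):
    return 1 / (1 + np.exp(-s))

def train(X, y, epochs, seed=0):
    """Full-batch training of a 3-3-1 Sigmoid network; returns weights and MAE history."""
    rng = np.random.default_rng(seed)
    W1 = rng.random((X.shape[1], 3))  # input -> hidden
    W2 = rng.random((3, 1))           # hidden -> output
    error_list = []
    for _ in range(epochs):
        # Forward pass
        z2 = sigmoid(X @ W1)
        o = sigmoid(z2 @ W2)
        # Backward pass: same update rule as the backward function
        o_error = y - o
        o_delta = o_error * o * (1 - o)
        z2_delta = (o_delta @ W2.T) * z2 * (1 - z2)
        W1 += X.T @ z2_delta
        W2 += z2.T @ o_delta
        # Mean absolute error of this epoch's predictions
        error_list.append(np.abs(o_error).mean())
    return W1, W2, error_list

# Scaled training data from the article (first column divided by 10)
X = np.array([[0, 1, 0], [0, 1, 1], [0, 0, 0],
              [1, 0, 0], [1, 1, 1], [1, 0, 1]], dtype=float)
y = np.array([[0], [0], [0], [1], [1], [1]], dtype=float)

W1, W2, errors = train(X, y, epochs=200)
print(f"first MAE: {errors[0]:.3f}, last MAE: {errors[-1]:.3f}")
```

Printing the first and last MAE is a quick sanity check before plotting the whole list; the exact numbers depend on the random initialization.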
Step 4.5: Create a Prediction Function
After the weights have been fine-tuned during training, the algorithm is ready to predict the output for new data points. This is done with a single forward pass. The prediction is a number between 0 and 1 (the output of the Sigmoid) that, ideally, is close to the actual output.
def predict(self, x_predicted):
    return self.forward(x_predicted).item()  # single forward pass, returned as a float
Step 4.6: Plot the Mean Absolute Error Development
There are many ways to evaluate the quality of a machine learning algorithm. One commonly used measure is the mean absolute error, which should decrease as the number of epochs increases.
plt.plot(range(1, len(self.error_list) + 1), self.error_list)
plt.title('Mean Absolute Error')
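A slightly fuller sketch of the plotting step as a standalone function with labelled axes is shown below. The function name plot_error_development and the example error values are hypothetical; in the article's class, the values would come from self.error_list after training.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, e.g. for scripts and servers
import matplotlib.pyplot as plt

def plot_error_development(error_list):
    """Plot the MAE of each epoch and return the figure and axes."""
    fig, ax = plt.subplots()
    epochs = range(1, len(error_list) + 1)
    ax.plot(epochs, error_list)
    ax.set_title("Mean Absolute Error")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("MAE")
    return fig, ax

# Hypothetical values, only to exercise the function
fig, ax = plot_error_development([0.50, 0.42, 0.31, 0.22, 0.18])
```

Returning the figure lets the caller decide whether to display it with plt.show() (with an interactive backend) or save it with fig.savefig().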
Step 4.7: Calculate the Accuracy and its Components
The numbers of true positives, false positives, true negatives, and false negatives describe the quality of a machine learning classification algorithm. After training, the weights should be such that the algorithm accurately predicts new data points. In binary classification tasks, the class of a new data point can only be 1 or 0: depending on whether the predicted value is above or below the defined limit, the algorithm classifies the new entry as 1 or 0.
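The counting can be sketched as a small standalone function. This is not the article's test_evaluation method: the function names, the default limit of 0.5, and the example scores are assumptions. Here a prediction exactly equal to the limit is treated as negative, so that every sample lands in exactly one of the four counts.

```python
import numpy as np

def confusion_counts(predictions, labels, limit=0.5):
    """Count TP, TN, FP, FN for binary labels (0/1) given predicted scores."""
    predicted = np.asarray(predictions).ravel() > limit  # True -> class 1
    actual = np.asarray(labels).ravel() == 1
    return {
        "true_positives": int(np.sum(predicted & actual)),
        "true_negatives": int(np.sum(~predicted & ~actual)),
        "false_positives": int(np.sum(predicted & ~actual)),
        "false_negatives": int(np.sum(~predicted & actual)),
    }

def accuracy(counts):
    """Share of correctly classified samples."""
    correct = counts["true_positives"] + counts["true_negatives"]
    return correct / sum(counts.values())

# Hypothetical network outputs for six test vectors and their true labels
scores = [0.08, 0.93, 0.12, 0.88, 0.05, 0.61]
labels = [0, 1, 0, 1, 0, 0]
counts = confusion_counts(scores, labels)
print(counts)
print(accuracy(counts))  # 0.8333333333333334
```

In this made-up example the last vector scores 0.61 although its label is 0, so it counts as one false positive.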
When running the test_evaluation function, we get the following results:
True positives: 2
True negatives: 4
False positives: 0
False negatives: 0
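Accuracy is given by the following formula:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives. In our case, the accuracy is therefore (2 + 4) / (2 + 4 + 0 + 0) = 1.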
Step 5: Run a Script That Trains and Evaluates the Neural Network Model
NN = NeuralNetwork()
NN.train(input_train_scaled, output_train_scaled, 200)
print(NN.predict(input_pred))
NN.test_evaluation(input_test_scaled, output_test_scaled)
To try out the neural network class that we have just built, we start by initializing an object of the type NeuralNetwork. The neural network is then trained on the training data for 200 epochs to fine-tune its weights, before the newly trained model is tested on a single test vector. Then the error development is plotted, before the model is evaluated using the test data sets.
See the entire project and code on GitHub.
Step 6: Improve the Script and Play With It
The presented code can easily be modified to handle other similar situations. The reader is encouraged to play around with it, change variables, and use their own data among other things. Potential ideas for improvement or changes include:
- Generalize the code to work for data of any input and output size
- Use another metric than mean absolute error to monitor error development
- Use another scaling function
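As one example of the last idea, Scikit-learn's StandardScaler could replace MinMaxScaler: instead of mapping each feature to [0, 1], it subtracts the feature's mean and divides by its standard deviation. A minimal sketch on the article's training inputs:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

input_train = np.array([[0, 1, 0], [0, 1, 1], [0, 0, 0],
                        [10, 0, 0], [10, 1, 1], [10, 0, 1]])

scaler = StandardScaler()
input_train_scaled = scaler.fit_transform(input_train)

print(scaler.mean_)               # per-feature means learned from the training data
print(input_train_scaled[:, 0])   # first feature: 0 maps to -1, 10 maps to +1
```

Test data would again be scaled with scaler.transform, reusing the training mean and standard deviation. The labels, however, should stay 0 and 1: a Sigmoid output lies in (0, 1), so it could never reach standardized targets such as -1.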