Understanding BackPropagation by solving X-NOR Gate Problem
Implementing the X-NOR logical problem in Python from Scratch to help understand Neural Networks and the BackPropagation Algorithm.
<Hello 🌍>
This article has been written as a part of the MSP Developer Stories initiative by the Microsoft Student Partners(India) program. ❤
Check out the amazing ✨Microsoft Student Partners community here, https://studentambassadors.microsoft.com/. 😄
So, What is a Neural Network?
Neural networks are a set of algorithms, modelled loosely after the human brain, that are designed to recognize patterns. That’s it: nodes, or neurons, connected to each other, passing information much like the ones in the brain do. Each neuron processes the information it receives and passes the result on to the next one.
Each neuron applies a function to its inputs and sends the output onward, building up increasingly complex features, until the last node produces the network’s prediction. In this way the network learns to find patterns and features in the data.
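As a toy sketch (my own, not part of the article’s network), a single neuron is just a weighted sum of its inputs plus a bias, passed through an activation function:

```python
import numpy as np

def neuron(x, w, b):
    # weighted sum of inputs plus a bias, squashed by a sigmoid activation
    return 1 / (1 + np.exp(-(np.dot(w, x) + b)))

# two inputs, two (made-up) weights, one bias
out = neuron(np.array([0.5, 1.0]), np.array([0.4, -0.2]), 0.1)
# out is a value in (0, 1): the neuron's "firing strength"
```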
How do we train a Neural Net ?
When training a network, we repeat the following steps (for n epochs*):
- Perform forward pass.
- Calculate ∆W (the delta to be added to each weight) via backpropagation.
- Update the weights.
*an epoch is one iteration of the algorithm on the entire dataset.
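The three steps above can be sketched with a minimal example of my own (not the article’s network): fitting y = 2x with a single weight, one forward pass, one gradient, one update per epoch.

```python
import numpy as np

# Toy illustration: learn w so that w * X approximates y = 2 * X.
X = np.array([1.0, 2.0, 3.0])
y = 2.0 * X
w = 0.0
learning_rate = 0.1
for epoch in range(100):                # one epoch = one pass over the dataset
    y_hat = w * X                       # 1. forward pass
    delta_w = np.mean((y_hat - y) * X)  # 2. gradient of 0.5 * mean squared error
    w = w - learning_rate * delta_w     # 3. update the weight
# w converges to 2.0
```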
What is BackPropagation?
In machine learning, backpropagation (backprop) is a widely used algorithm for training feedforward neural networks in supervised learning. (Backprop as defined by Wikipedia [1].)
Well, in other words, the backprop algorithm works by calculating the gradient of the loss function with respect to the weights and biases. This gradient is computed using the ‘chain rule’ of calculus, one layer at a time, iterating backwards from the last layer to the first; each layer’s gradient reuses quantities already computed for the layer after it, along with that layer’s own weights and inputs.
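To make the chain rule concrete, here is a small hypothetical composition of my own, L(w) = (σ(wx) − y)², whose chain-rule gradient can be checked against a finite difference:

```python
import numpy as np

# Hypothetical example: L(w) = (sigmoid(w * x) - y)^2 for a single input x.
# Chain rule: dL/dw = 2*(s - y) * s*(1 - s) * x, where s = sigmoid(w * x).
x, y, w = 1.5, 1.0, 0.3

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

def loss(w):
    return (sigmoid(w * x) - y) ** 2

s = sigmoid(w * x)
analytic = 2 * (s - y) * s * (1 - s) * x               # chain-rule gradient
eps = 1e-6
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)  # finite-difference check
# analytic and numeric agree to many decimal places
```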
The X-NOR Problem in Neural Networks
It is the problem of using a neural network (an ANN) to predict the output of an X-NOR gate. It is similar to the very famous XOR problem, which perceptrons were shown to be unable to solve in the book by Minsky and Papert [2].
Well, the X-NOR problem says that, given two binary inputs (0/1), an X-NOR function should return true when the two inputs are the same and false when they differ. The truth table for XNOR, shown below, is what has to be modelled:
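The XNOR truth table (output 1 exactly when the inputs match) can be generated in a couple of lines:

```python
from itertools import product

# XNOR is true exactly when both inputs are equal
truth_table = [(a, b, int(a == b)) for a, b in product([0, 1], repeat=2)]
for a, b, out in truth_table:
    print(a, b, "->", out)
# prints: 0 0 -> 1, 0 1 -> 0, 1 0 -> 0, 1 1 -> 1
```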
And here’s the graphical representation for the same :
Solving X-NOR using NN
Our neural network needs to classify the input patterns according to the given truth table. The network has to be modelled to separate these input patterns, but they are not linearly separable, so the separation is done using ‘decision planes’, modelled with hidden-layer nodes (two are sufficient in principle; the code below uses five).
CODE:
Now open an Azure Notebook from https://notebooks.azure.com/ (Azure Notebooks provides free online access to Jupyter notebooks running in the cloud on Microsoft Azure, so you can code along in your browser without installing an IDE), or use Azure ML Studio, or code along in any offline IDE.
Here’s the link to my code on Azure Notebook : https://notebooks.azure.com/harsh-aryan/projects/x-nor-gate-backprop . Feel free to clone the project and try making changes and tuning hyperparameters.
So first, let us import the necessary libraries in Python:
import numpy as np
import matplotlib.pyplot as plt
Next, we need to define the sigmoid activation function:
def sigmoid_func(x):
    y = 1 + np.exp(-x)
    y = 1 / y
    return y
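The backprop code below relies on the handy identity σ′(x) = σ(x)(1 − σ(x)). Here is a quick numeric sanity check of it (sigmoid redefined so the snippet stands alone):

```python
import numpy as np

def sigmoid_func(x):
    # same sigmoid as in the article
    return 1 / (1 + np.exp(-x))

x = 0.7
s = sigmoid_func(x)
identity = s * (1 - s)    # sigma'(x) via the identity
eps = 1e-6
numeric = (sigmoid_func(x + eps) - sigmoid_func(x - eps)) / (2 * eps)
# identity and numeric agree to high precision
```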
Now let’s define the function for forward propagation:
In the forward pass, we compute the output of the input layer, pass it on as the input to the hidden layer, and do the same again to get the output of the output layer.
def fwd_pass(X_training, wt1, wt2):
    a1 = np.matmul(X_training, wt1)
    z1 = sigmoid_func(a1)
    len_z1 = len(z1)
    b = np.ones((len_z1, 1))
    z1 = np.concatenate((b, z1), axis=1)
    a2 = np.matmul(z1, wt2)
    z2 = sigmoid_func(a2)
    return a1, z1, a2, z2
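To see the shapes flowing through fwd_pass, here is a quick trace on the four XNOR input rows (definitions repeated so the snippet stands alone; the weights are random, as in the article):

```python
import numpy as np

def sigmoid_func(x):
    return 1 / (1 + np.exp(-x))

def fwd_pass(X_training, wt1, wt2):       # same as in the article
    a1 = np.matmul(X_training, wt1)
    z1 = sigmoid_func(a1)
    b = np.ones((len(z1), 1))             # prepend a bias column of ones
    z1 = np.concatenate((b, z1), axis=1)
    a2 = np.matmul(z1, wt2)
    return a1, z1, a2, sigmoid_func(a2)

X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])  # bias + 2 inputs
a1, z1, a2, z2 = fwd_pass(X, np.random.randn(3, 5), np.random.randn(6, 1))
print(z1.shape, z2.shape)   # (4, 6) (4, 1)
```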
Code for Backpropagation :
For backprop, we need to find the derivatives.
We start by finding the loss (the difference between the predicted output and the actual output) and then compute the derivatives, using the chain rule, going backwards.
Then we update the weights and biases by subtracting the scaled derivatives from them.
def back_propagation(a2, z0, z1, z2, y):
    diff2 = z2 - y
    Derivative2 = np.matmul(z1.T, diff2)
    # sigmoid'(a1) = sigmoid(a1) * (1 - sigmoid(a1)), and z1[:, 1:] is
    # sigmoid(a1); w2 (the output-layer weights) is read from the outer scope
    diff1 = (diff2.dot(w2[1:, :].T)) * z1[:, 1:] * (1 - z1[:, 1:])
    Derivative1 = np.matmul(z0.T, diff1)
    return diff2, Derivative1, Derivative2

def updateWeights(Derivative1, Derivative2, learning_rate, m, w1, w2):
    change_in_w1 = learning_rate * (1/m) * Derivative1
    w1 = w1 - change_in_w1
    change_in_w2 = learning_rate * (1/m) * Derivative2
    w2 = w2 - change_in_w2
    return w1, w2
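The article never states its loss function explicitly, but the output delta z2 − y is exactly the gradient of the summed binary cross-entropy through a sigmoid output. Under that assumption (mine, not the article’s), Derivative2 = z1ᵀ(z2 − y) can be verified numerically:

```python
import numpy as np

def sigmoid_func(x):
    return 1 / (1 + np.exp(-x))

# Random-but-fixed stand-ins for the quantities seen by back_propagation
rng = np.random.default_rng(0)
z1 = np.concatenate((np.ones((4, 1)),
                     sigmoid_func(rng.normal(size=(4, 5)))), axis=1)
y = np.array([[1.], [0.], [0.], [1.]])
w2 = rng.normal(size=(6, 1))

def cross_entropy(w2):
    # summed binary cross-entropy of the output layer (assumed loss)
    z2 = sigmoid_func(z1 @ w2)
    return -np.sum(y * np.log(z2) + (1 - y) * np.log(1 - z2))

z2 = sigmoid_func(z1 @ w2)
Derivative2 = z1.T @ (z2 - y)    # as computed in back_propagation

# perturb one weight and compare with the matching analytic gradient entry
eps = 1e-6
w2_plus, w2_minus = w2.copy(), w2.copy()
w2_plus[2, 0] += eps
w2_minus[2, 0] -= eps
numeric = (cross_entropy(w2_plus) - cross_entropy(w2_minus)) / (2 * eps)
```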
Notice how many matrix multiplications are happening above; this is why neural networks are computationally expensive.
Back to code :>
So now that we have defined the functions to train the network, let’s move on to see how to predict values and test our algorithm.
def predict(X_test, weight1, weight2):
    a1, z1, a2, z2 = fwd_pass(X_test, weight1, weight2)
    return z2

def test(X_test, y_test):
    y_predicted = predict(X_test, w1, w2)
    print("Test set is :")
    print(X_test[:, 1:])
    print("\nPredicted values for Test set are")
    print(np.round(y_predicted))
    print("\n And actual y values for test set are")
    print(y_test)
Now that our neural network is equipped with the functions for training and testing, let's move on to define our X and y, that is, our data. We need to initialize the training data for XNOR, i.e. its truth table (note that the first column of X is a fixed bias input of 1):
X = np.array([[1,0,0],
              [1,0,1],
              [1,1,0],
              [1,1,1]])
y = np.array([[1],
              [0],
              [0],
              [1]])
Next, we need to initialize our parameters (weights and biases) and hyperparameters (learning rate, number of epochs). Initialize the weights randomly and set an appropriate number of epochs and learning rate; experiment with a few values for each.
w1 = np.random.randn(3,5)
w2 = np.random.randn(6,1)
learning_rate = 0.05
costs = []
num_epoch = 10000
Now, we’re all set to Train the Neural Network.
Running the Algorithm to train the neural network :
This includes three main steps:
1. Forward pass
2. Back prop (finding the gradients of parameters to be increased/decreased)
3. Increment/Decrement weights calculated above
m = len(X)
for i in range(num_epoch):
    a1, z1, a2, z2 = fwd_pass(X, w1, w2)
    diff2, Derivative1, Derivative2 = back_propagation(a2, X, z1, z2, y)
    w1, w2 = updateWeights(Derivative1, Derivative2, learning_rate, m, w1, w2)
    cost_i = np.mean(np.abs(diff2))
    costs.append(cost_i)
    if i == 0 or i == num_epoch - 1:
        print("In Iteration: " + str(i+1))
        print("the error is :" + str(cost_i) + "\n")
Running the algo with the above hyperparameters leads to this output being displayed on the screen:
In Iteration: 1
the error is :0.5010907676790991
In Iteration: 10000
the error is :0.035972126303652666
Now that the training is done, we just need to test it by predicting outputs and comparing them with the actual expected outputs derived from the truth table.
Display the predicted results and plot the cost graph:
print("After the completion of Training :\n")
z3 = predict(X, w1, w2)
print("Y value predicted: ")
print(np.round(z3))
print("\n")
plt.plot(costs)
plt.ylabel('Error')
plt.xlabel('Epochs')
plt.show()
Now wait for your NN to train.
Running this code displays something like:
After the completion of Training :
Y value predicted:
[[1.]
[0.]
[0.]
[1.]]
and a graph like:
The graph shows that the error started near 0.5 and reduced to almost zero. The curve is smooth rather than jagged, which suggests the chosen learning rate is appropriate, and it converges towards zero, showing that the number of epochs is sufficient.
So now, the last step, we just need to test our model :
Testing the network
This is the last step where we verify our model is working properly.
This can be done via a test set, predicting results for that set and matching them with actual known values.
Now there are two options here. We could reuse the truth-table values, i.e. [0 1], [0 0], [1 1], [1 0], but the model has been trained on exactly this data, so that would tell us little about generalization.
Hence we instead take slightly perturbed values close to 0 or 1 for the test set and run the model on them:
X_test = np.array([[1,0,0.03],
                   [1,0,0.99],
                   [1,1,0],
                   [1,1,0.2]])
y_test = np.array([[1.],
                   [0.],
                   [0.],
                   [0.]])
test(X_test, y_test)
which gives us the output:
Test set is :
[[0. 0.03]
[0. 0.99]
[1. 0. ]
[1. 0.2 ]]
Predicted values for Test set are
[[1.]
[0.]
[0.]
[0.]]
And actual y values for test set are
[[1.]
[0.]
[0.]
[0.]]
Since our model is an XNOR gate, it should ideally output 0 (low) for points near [1,0] or [0,1], and 1 (high) for points near [0,0] and [1,1].
Above we have shown one such test case. The model can be tried on other such cases to further verify that it behaves correctly.
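As a hypothetical sketch (not from the article), more such near-binary test points can be generated by jittering the truth-table corners:

```python
import numpy as np

rng = np.random.default_rng(1)
base = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
# jitter each corner of the unit square slightly, staying inside [0, 1]
noisy = np.clip(base + rng.normal(scale=0.05, size=base.shape), 0, 1)
X_test = np.concatenate((np.ones((4, 1)), noisy), axis=1)  # prepend bias column
y_test = np.array([[1.], [0.], [0.], [1.]])  # XNOR of the underlying corners
```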
So, cheers !
We’ve implemented the X-NOR gate by writing the code for a neural network, forward prop, and backprop from scratch.
To see the entire code at once, check out my Azure Notebook: https://notebooks.azure.com/harsh-aryan/projects/x-nor-gate-backprop .
Feel free to clone the project and try making changes and tuning hyperparameters. Also, if you are a student, you can sign up for Azure to get free cloud credits and services, and you can learn more about such technologies for free from Microsoft Learn.
References:
[1] Wikipedia: Backpropagation, [online] available at: https://en.wikipedia.org/wiki/Backpropagation
[2] Minsky, M. and Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. The MIT Press, Cambridge (expanded edition, 1988).
Signing off…
This blog was a part of Learning ML series by the author. Click here to get to the previous part of the series.