Creating a Single Neuron Model (Perceptron) from Scratch in Python

Understanding Deep Learning from Scratch

Akil Ahmed
ITNEXT

--

You might have used or heard about neural networks before. In this blog we will create a Single Neuron Model, or Perceptron, from scratch; it can also be called Logistic Regression, a classification algorithm used for categorical data. All the code can be found in this GitHub link inside the perceptron.ipynb notebook, and the dataset can be found here.

So let’s create our cute little perceptron

First, we need to get our data ready in the required format. The perceptron can only consume data in vector (single-dimensional) format, but the shape of each of our images is (32, 32, 3), where the first two numbers are the height and width of the image and 3 is the number of RGB channels.
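
The original notebook cell is not reproduced here, so below is a minimal sketch of this data-preparation step. The file names, the use of NumPy .npy files, and scikit-learn's train_test_split are assumptions; the variable names match the ones used later in this article.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Load the image data and labels from the local drive (file names are assumed)
X = np.load("images.npy")   # shape: (m, 32, 32, 3)
Y = np.load("labels.npy")   # shape: (m,)

# Split the data with a 4:1 (80/20) train/test ratio
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)

# Reshape each image from (32, 32, 3) into a single column vector,
# so each data matrix has shape (32*32*3, number_of_examples)
train_set_x = x_train.reshape(x_train.shape[0], -1).T
test_set_x = x_test.reshape(x_test.shape[0], -1).T
y_train = y_train.reshape(1, -1)
y_test = y_test.reshape(1, -1)

# Standardize so the pixel values stay between 0 and 1
train_set_x = train_set_x / 255.0
test_set_x = test_set_x / 255.0
```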

  • In the code block above, we first import the required modules.
  • Then we load our data from the local drive and split it with a 4:1 train/test ratio.
  • Then we reshape the image data from a multi-dimensional matrix into vector format.
  • At last, we standardize the dataset so the pixel values stay between 0 and 1.

Now that our data is ready, let's understand the architecture of our model, which you can see in the image below.

At first, we have our data (X) in vector format; it undergoes a dot product with our weight vector (W), and the bias (b) is added. In the equation below, (i) denotes the i-th element of X or Z, and Z is the vector of results obtained from W, X and b.
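
The equation itself appeared as an image in the original post; restated in LaTeX from the description above, it is:

```latex
z^{(i)} = w^{T} x^{(i)} + b
```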

So, does that mean Z is the output of our perceptron? Well, not so fast!

The values in the Z vector do not have any upper or lower limit, but we need a value between 0 and 1. To do that we will use the sigmoid function, which takes the values of Z and brings them between 0 and 1; the result is denoted as the A vector.
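
Again restating the missing equation image, the sigmoid and the resulting activation are:

```latex
a^{(i)} = \sigma\!\left(z^{(i)}\right) = \frac{1}{1 + e^{-z^{(i)}}}
```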

So now, you may ask, how are we deciding the values of W and b? Well, we initialize all the elements of the W vector to 0.0 and b to the scalar value 0.0.

The whole process till now is called Forward Propagation. Here is the code for all the steps.
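
The original gist is not embedded here; the following is a minimal sketch of the three functions described below, consistent with the shapes used in the data-preparation step (each column of X is one example):

```python
import numpy as np

def initialize_with_zeros(dim):
    # W is a column vector of zeros with one entry per feature; b is a scalar
    w = np.zeros((dim, 1))
    b = 0.0
    return w, b

def sigmoid(z):
    # Squash any real value into the (0, 1) range
    return 1 / (1 + np.exp(-z))

def forward_propagation(w, b, X):
    # Z = w^T X + b, then A = sigmoid(Z); A holds one value per example
    Z = np.dot(w.T, X) + b
    A = sigmoid(Z)
    return A
```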

  • At first, we define a function initialize_with_zeros which takes the dimension as a parameter and returns the W vector and b; remember the bias term (b) is a scalar value.
  • Then, we define the sigmoid function.
  • At last, the forward_propagation function gives us the A vector with values bounded between 0 and 1.

Now that we are done with the Forward Propagation step, we have to go back and update the values of W and b so that the model can distinguish between the two classes properly. For that, there is something called the Cost Function (J), which gives the cost of prediction, meaning how far off the predicted values are from the actual values. To calculate it, we first find the loss of every individual prediction; here y with superscript (i) is the actual value and a with superscript (i) is the predicted value for the i-th data point. Please refer to this link to understand the Cost Function in detail and how this expression came to be.
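
The per-example loss referred to above (shown as an image in the original post) is the standard cross-entropy loss:

```latex
\mathcal{L}\!\left(a^{(i)}, y^{(i)}\right) = -\left[\, y^{(i)} \log a^{(i)} + \left(1 - y^{(i)}\right) \log\!\left(1 - a^{(i)}\right) \right]
```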

Sum it over all the values in the dataset.

Take the mean, where m is the number of training examples. This is the cost function, and we have to minimize this cost function as much as possible.
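
Putting the sum and the mean together, the full cost function (reconstructed from the description above) is:

```latex
J = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\!\left(a^{(i)}, y^{(i)}\right)
  = -\frac{1}{m} \sum_{i=1}^{m} \left[\, y^{(i)} \log a^{(i)} + \left(1 - y^{(i)}\right) \log\!\left(1 - a^{(i)}\right) \right]
```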

To minimize the Cost Function J, we have to find the derivatives of J with respect to W and b, which tell us the amount of change required in W and b to differentiate between the two classes.

So the derivatives of J with respect to W (dw) and b (db) are given below; dw and db are called the gradients of J. If you want to understand how they come into the picture, refer to this video. All the steps after Forward Propagation till now are called Backward Propagation.
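
The gradient equations shown as images in the original post are the standard logistic-regression gradients; with X holding one example per column, they are:

```latex
dw = \frac{\partial J}{\partial w} = \frac{1}{m}\, X \left(A - Y\right)^{T}, \qquad
db = \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left(a^{(i)} - y^{(i)}\right)
```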

Now, let's see the code till now. We have removed the separate forward_propagation function and merged it with the Backward Propagation code; let's call the result the propagate function.
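
The original gist for this step is not embedded here; a minimal sketch consistent with the functions above would look like this:

```python
def propagate(w, b, X, Y):
    m = X.shape[1]  # number of training examples

    # Forward propagation: same steps as before
    A = sigmoid(np.dot(w.T, X) + b)

    # Cost: mean of the individual cross-entropy losses
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

    # Backward propagation: gradients of the cost with respect to w and b
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m

    grads = {"dw": dw, "db": db}
    return grads, cost
```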

  • The forward propagation part at the top of the function was explained before; after that, we first calculate the cost according to the previously mentioned expression.
  • Then, we calculate the gradients of the cost, which are dw and db.
  • At last, we return the gradients and the cost.

Now, in the next step, we have to iteratively decrease the Cost Function. To do that we have to subtract the derivatives of J with respect to W (dw) and b (db) very slowly in every iteration, but the values of dw and db can be quite large, and subtracting them directly will not work because of overshooting. As a solution, we multiply dw and db by a very small number called the Learning Rate. Here l is the learning rate and ":=" is the assignment operator.
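
Written out, the update rule described above is:

```latex
w := w - l \cdot dw, \qquad b := b - l \cdot db
```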

This gradual change in the values of W and b is called Gradient Descent. In the picture below, the football denotes the values of W and b.

image source

Now, let's understand the Gradient Descent code.
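
The original gist is not reproduced here; below is a sketch of what the gradient descent loop looks like, matching the printed cost messages in the output further down (the function name gradient_descent is an assumption):

```python
def gradient_descent(w, b, X, Y, num_iterations, learning_rate):
    for i in range(num_iterations):
        # Get the gradients and the current cost from the propagate function
        grads, cost = propagate(w, b, X, Y)
        dw, db = grads["dw"], grads["db"]

        # Move w and b a small step against the gradient (the learning rate keeps the step small)
        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Print the cost every 100 iterations to track progress
        if i % 100 == 0:
            print(f"Cost after iteration {i}: {cost:f}")

    return w, b
```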

  • Into the function we pass w (weights), b (bias), X (training set), Y (actual values for the training set), num_iterations (number of iterations) and learning_rate. The training and test sets were created in our very first code block.
  • From the propagate function created above, it gets dw, db and the cost.
  • Then we update w and b according to our previous explanation; in the next iteration, w and b will take the new values.
  • Then we print the cost every 100 iterations and return w and b. Please understand that although w is written in lowercase, it is a vector, while b is a scalar.

So, we are in the endgame now. Let's first create a predict function that will predict (obviously) the class of a data point. You can see here that if the resulting value from the sigmoid function is less than 0.5, the prediction is 0, otherwise 1.
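
A minimal sketch of such a predict function (again assuming one example per column of X):

```python
def predict(w, b, X):
    # Probability that each example belongs to class 1
    A = sigmoid(np.dot(w.T, X) + b)
    # Threshold at 0.5: below 0.5 the class is 0, otherwise 1
    return (A >= 0.5).astype(float)
```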

Let’s assemble all the functions till now to create the model function.
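
The assembled model function is not embedded here either; a sketch that matches the printed output below (with accuracy computed as 100 minus the mean absolute prediction error, an assumption on my part) might look like this:

```python
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.001):
    # Initialize w and b with zeros, one weight per feature
    w, b = initialize_with_zeros(X_train.shape[0])

    # Learn w and b with Gradient Descent
    w, b = gradient_descent(w, b, X_train, Y_train, num_iterations, learning_rate)

    # Predict on both sets and print the accuracy
    Y_prediction_train = predict(w, b, X_train)
    Y_prediction_test = predict(w, b, X_test)
    print(f"train accuracy: {100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100} %")
    print(f"test accuracy: {100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100} %")

    return {"w": w, "b": b}
```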

This code is pretty self-explanatory: we initialize the values of w and b with zeros, run Gradient Descent, and then print the accuracy on the train and test sets.

Let’s see how our sweet little model performs. Run the code given below.

```python
logistic_regression_model = model(train_set_x, y_train, test_set_x, y_test, num_iterations=2000, learning_rate=0.001)
```

This is the result you might have gotten after running it.

Cost after iteration 0: 0.693147 
Cost after iteration 100: 0.679755
Cost after iteration 200: 0.673041
Cost after iteration 300: 0.668556
Cost after iteration 400: 0.665001
Cost after iteration 500: 0.661952
Cost after iteration 600: 0.659240
Cost after iteration 700: 0.656784
Cost after iteration 800: 0.654535
Cost after iteration 900: 0.652458
Cost after iteration 1000: 0.650527
Cost after iteration 1100: 0.648720
Cost after iteration 1200: 0.647020
Cost after iteration 1300: 0.645414
Cost after iteration 1400: 0.643889
Cost after iteration 1500: 0.642436
Cost after iteration 1600: 0.641046
Cost after iteration 1700: 0.639713
Cost after iteration 1800: 0.638430
Cost after iteration 1900: 0.637192
train accuracy: 63.25 %
test accuracy: 58.5 %

So, our accuracy on the test set is 58.5%, which is better than random guessing (50%) but obviously not up to the mark. Now let's think about what we could change to make it better: we could use a different activation function, try a different learning rate, or train for more or fewer iterations. Please try out the different ideas you have and see if you can change the result. We have created a Neural Network model from scratch in the second article of this series.

Credit: https://www.coursera.org/specializations/deep-learning
