Perceptrons

Foundations of Neural Networks

Ashwin Prasad
Analytics Vidhya
5 min read · Oct 11, 2021


Both perceptrons and sigmoid neurons are units that take in some inputs, perform some calculation, and produce an output. They are typically used for supervised learning problems.

Perceptron

Let’s say we have some inputs (x₁, x₂, …, xₙ). On feeding these inputs to a perceptron, each input is multiplied by the random initial weight assigned to it, and the results are summed. If that weighted sum is greater than a particular threshold value, the perceptron outputs 1; otherwise it outputs 0.

fig 1.1

Initially, we do not know the weights or the threshold value (also known as the bias). We are going to use some form of learning algorithm to learn these weights and the threshold value from the data. We will discuss this algorithm later; for now, we initialise the weights and biases randomly.

fig 1.2
fig 1.3: A Simple Perceptron Architecture

To summarise what we have seen so far: a perceptron calculates the weighted sum of its inputs, passes it into an activation function (a step function, or threshold function, in this case), and then gives the output.
Put more simply, the two parts of the perceptron are the weighted sum and the activation function.

Step Function

fig 2.1

The step function is a function that returns 1 if the input is ≥ 0 and 0 if the input is < 0.
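The two parts described above, the weighted sum and the step activation, can be sketched in a few lines of Python. The weights, bias, and input values below are arbitrary illustrative choices, not taken from the article:

```python
def step(z):
    # Step (threshold) function: 1 if the input is >= 0, else 0.
    return 1 if z >= 0 else 0

def perceptron(inputs, weights, bias):
    # Part 1: the weighted sum of the inputs, plus the bias.
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    # Part 2: the step activation turns the sum into a 0/1 output.
    return step(z)

# Arbitrary illustrative weights, bias, and inputs.
print(perceptron([1, 0], [0.5, -0.5], -0.2))  # weighted sum 0.3  -> 1
print(perceptron([0, 1], [0.5, -0.5], -0.2))  # weighted sum -0.7 -> 0
```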

Perceptron example Use-Case

Let’s say we want to predict whether John plays football on a particular day. There are 2 factors that we take into account: the weather and the number of people available. So let x₁ be “is it raining?” and x₂ be the “number of people available”.
Based on the data available, we come to know that John won’t play football if it is raining, and that he won’t play if the number of people is very low, because football can’t be played with very few people. John will play football if the number of people is high. But even if the number of people is very high, John won’t play football if it is raining.

x₀ is always 1; x₁ is “is it raining?” (0 means ‘no’ and 1 means ‘yes’); x₂ is the “number of people available”. w₀, w₁ and w₂ are the weights associated with x₀, x₁ and x₂ respectively.

fig 3.1: perceptron model that predicts whether john will play football or not

So, let’s say these are the weights we get after training the perceptron on a dataset containing historical data of John playing football: w₀ = -7, w₁ = -30 and w₂ = 1.

  1. On a particular sunny day, let’s say 5 people are there to play football, i.e. x₂ = 5. So the input to our perceptron would be [“no rain”, 5]. Hence, the weighted sum will be (1 × -7) + (0 × -30) + (5 × 1) = -7 + 5 = -2. As -2 < 0, our perceptron outputs 0 (John won’t play football).
  2. On a particular sunny day, let’s say 11 people are there to play football, i.e. x₂ = 11. So the input to our perceptron would be [“no rain”, 11]. Hence, the weighted sum will be (1 × -7) + (0 × -30) + (11 × 1) = -7 + 11 = 4. As 4 ≥ 0, our perceptron outputs 1 (John will play football).
  3. On a rainy day, let’s say 9 people are there to play football, i.e. x₂ = 9. So the input to our perceptron would be [“rain”, 9]. Hence, the weighted sum will be (1 × -7) + (1 × -30) + (9 × 1) = -7 - 30 + 9 = -28. As -28 < 0, our perceptron outputs 0 (John won’t play football).
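The three scenarios above can be verified in code. The function below (the name `will_john_play` is my own) hard-codes the example’s weights: w₀ = -7 for the bias input x₀ = 1, w₁ = -30 for rain, and w₂ = 1 for the number of people:

```python
def will_john_play(raining, num_people, w0=-7, w1=-30, w2=1):
    # x0 is fixed at 1, x1 is rain (0 or 1), x2 is the number of people.
    weighted_sum = w0 * 1 + w1 * raining + w2 * num_people
    # Step activation: 1 if the weighted sum is >= 0, else 0.
    return 1 if weighted_sum >= 0 else 0

print(will_john_play(raining=0, num_people=5))   # -7 + 5 = -2       -> 0 (won't play)
print(will_john_play(raining=0, num_people=11))  # -7 + 11 = 4       -> 1 (will play)
print(will_john_play(raining=1, num_people=9))   # -7 - 30 + 9 = -28 -> 0 (won't play)
```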

Our perceptron gave more weight to the weather because weather carries more importance than the number of people: when it is raining, the number of people available doesn’t matter.

Here, I have given the weights manually. But with a dataset consisting of X and Y, our perceptron would be able to learn these weights automatically, in order to map X to Y in the best possible way, using a learning algorithm, which we will look at now.

Update Weights

As already said, we initialise the weights of the perceptron with random values. Given the dataset, we then ask the perceptron to learn the weights that allow it to best classify the examples and improve its accuracy.

Now, I am not going to go too deeply into the technical details of the algorithm. Rather, let me give the gist of it.

  1. We randomly initialise the weights.
  2. We feed our inputs to the perceptron.
  3. Each input is multiplied by its respective weight, and a bias is added to the overall weighted sum. That sum goes through an activation function to produce 0 or 1.
  4. We calculate the loss with the help of a cost function, which is essentially an estimate of how much our perceptron’s output deviates from the actual output.
  5. We find the partial derivative of the loss function with respect to each weight; together these are called the gradient. We update each weight by subtracting a small fraction of its gradient from that weight.
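The steps above can be sketched with the classic perceptron learning rule, where each weight is nudged by a small amount proportional to the error on each example. This is a simplified stand-in for the gradient-based update described above; the AND dataset, the learning rate, and the epoch count are my own illustrative choices.

```python
import random

def step(z):
    # Step activation: 1 if z >= 0, else 0.
    return 1 if z >= 0 else 0

def train_perceptron(data, n_inputs, lr=0.1, epochs=50):
    # 1. Randomly initialise the weights (index 0 plays the role of the bias, with x0 = 1).
    weights = [random.uniform(-1, 1) for _ in range(n_inputs + 1)]
    for _ in range(epochs):
        for x, y in data:
            # 2-3. Weighted sum plus bias, passed through the activation function.
            z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
            y_hat = step(z)
            # 4. The error measures how far the output deviates from the target.
            error = y - y_hat
            # 5. Nudge each weight a small step in the direction that reduces the error.
            weights[0] += lr * error
            for i, xi in enumerate(x):
                weights[i + 1] += lr * error * xi
    return weights

# Toy dataset: the logical AND of two binary inputs (my own example).
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
random.seed(0)
w = train_perceptron(data, n_inputs=2)
for x, y in data:
    prediction = step(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
    print(x, prediction, "target:", y)
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop eventually classifies every example correctly.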

To understand why moving the weights in the direction opposite to the gradient of the cost function reduces the error and leads to optimal weights, one needs to understand the gradient descent algorithm. Here is a simple explanation of the gradient descent algorithm.
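Since the step function itself isn’t differentiable, gradient descent is easiest to see on a smooth function. Here is a minimal sketch of the idea on f(w) = (w - 3)², whose minimum sits at w = 3; the starting point and learning rate are arbitrary illustrative choices.

```python
# Minimal gradient descent sketch: minimise f(w) = (w - 3)^2,
# whose derivative is f'(w) = 2 * (w - 3).
w = 0.0    # starting weight (arbitrary)
lr = 0.1   # learning rate: the "small fraction" of the gradient
for _ in range(100):
    gradient = 2 * (w - 3)
    w -= lr * gradient   # move opposite to the gradient
print(round(w, 4))  # converges towards the minimum at w = 3
```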


Conclusion

Hence, we have seen how inputs propagate through a perceptron and how perceptrons can be used to make simple decisions. However, the perceptron algorithm also has some issues, which the sigmoid neuron overcomes. This will be discussed further in the upcoming blogs.

Thank You
