McCulloch-Pitts Neuron, Perceptron and the Perceptron Learning Algorithm

Shikhar Goswami
Published in Analytics Vidhya
5 min read · Jun 19, 2020

Our brain has approximately 86 billion neurons. Neurons help us make decisions, and we make them constantly: “Should I watch Netflix or study?”, “Should I watch Dark or Friends?”, “Should I watch an IPL match or not?”. Our decisions are often of the boolean form: Yes or No, 1 or 0. We also take various factors (often boolean as well) into account before arriving at a decision. For example, my factors for watching an IPL match could be: “Is it a CSK match? If so, I’ll watch”, “Is the opponent Royal Challengers Bangalore? If not, I’ll watch”, and so on. This inspiration led to the first mathematical representation of a biological neuron: the MP neuron. Let’s see how it works:

As shown, both the inputs and outputs are boolean. There are two parts: g takes all the inputs and aggregates them, and f makes a decision based on that aggregation. Let’s continue with the example of IPL matches. So,

y : { 1: I’ll watch the match, 0: I won’t watch the match }

x1 : { Is the match happening in my city? }

x2 : { Is CSK playing? }

x3 : { Is MS Dhoni playing? }

It’s obvious that I can only consider going if x1 is 1, i.e. the match has to be in my city for me to go; I won’t be able to travel to a match in another city (for all practical purposes). If x1 = 0, the output is 0 no matter what the other inputs are. An input that can single-handedly force the output to 0 like this plays the role of an inhibitory input. In general, the inhibitory inputs capture all the reasons for not going to the match.

Now, if you’re a die-hard fan of MS Dhoni, you will go only if MS Dhoni is playing: all of x1, x2, x3 have to be 1. On the flip side, if you’re just a CSK fan, only x1 and x2 have to be 1, regardless of whether x3 is 0 or 1. Taking the first case, we can say that this neuron will “fire” only if g(x) = x1 + x2 + x3 ≥ 3.
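Here is a minimal sketch of this in Python (the function name mp_neuron and its interface are my own, just for illustration):

# MP neuron: aggregate the boolean inputs, then threshold.
def mp_neuron(inputs, theta):
    g = sum(inputs)                 # g(x) = x1 + x2 + ... + xn
    return 1 if g >= theta else 0   # f fires iff g(x) >= theta

# Die-hard MS Dhoni fan: all three factors must hold, so theta = 3.
x1, x2, x3 = 1, 1, 1   # match in my city, CSK playing, MS Dhoni playing
print(mp_neuron([x1, x2, x3], theta=3))   # 1 -> I'll watch the match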

Let’s take another boolean function, the AND function, and represent it with the MP neuron:

We can say the neuron will “fire” only if g(x) ≥ 2. Therefore, g(x) = 2 is called the decision boundary, and for all the points on or above this line, y = 1. In general, the condition is g(x) ≥ θ, where θ is the thresholding parameter.
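Reusing the mp_neuron sketch from above, AND of two inputs falls out with θ = 2:

# The full truth table of AND, via the MP neuron with theta = 2.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mp_neuron([x1, x2], theta=2))
# 0 0 -> 0, 0 1 -> 0, 1 0 -> 0, 1 1 -> 1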

In this way, some boolean functions can be represented by the MP neuron. But what about non-boolean inputs? And are all inputs equally important? In our case, I may want to give more “importance” to x3. “Here comes the Megatron!” No, the Perceptron.

Perceptron

In real life, some factors are more important than others. So, in addition to the inputs of the MP neuron, the Perceptron also attaches weights (importances) to the inputs. And besides the weights, the threshold that we were hand-coding before will now also be learned, by an algorithm known as the Perceptron learning algorithm.

Now, the firing condition becomes g(x) = w1x1 + w2x2 + w3x3 ≥ θ, or equivalently w0x0 + w1x1 + w2x2 + w3x3 ≥ 0, where x0 = 1 and w0 = -θ. We can also write this compactly with the dot product as w.x ≥ 0. The decision boundary is then w.x = 0. The task is to learn weights wi that minimize the error on the inputs xi.
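As a quick sketch (the weights below are hand-picked for illustration, not learned):

# Perceptron decision rule with the threshold folded in as a bias:
# x0 = 1 and w0 = -theta, so the test is simply w.x >= 0.
def perceptron(x, w):
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if dot >= 0 else 0

theta = 2
w = [-theta, 1.0, 1.0, 2.0]   # x3 gets a larger weight ("importance")
x = [1, 1, 0, 1]              # x0 = 1, then x1, x2, x3
print(perceptron(x, w))       # 1, since -2 + 1 + 0 + 2 = 1 >= 0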

Perceptron Learning Algorithm

Let’s initialize the weights randomly, and split the inputs into two sets:

Case 1: P = {p1, p2, p3, …} is the set of all inputs for which y = 1. Taking the decision boundary as w.x = 0, these are the points for which we want w.x ≥ 0.

Case 2: N = {n1, n2, n3, …} is the set of all inputs for which y = 0. These are the points for which we want w.x < 0.

Now, we want that at the end, both Case 1 and Case 2 hold (and not the other way around). Also, note that since w.x = |w||x| cos α, where α is the angle between w and x, w.x ≥ 0 means α ≤ 90 degrees, and w.x < 0 means α > 90 degrees.

With these definitions in hand, let’s look at the algorithm:

We iterate over all the points in the input space (P ∪ N). For a point x in P we want w.x ≥ 0; so whenever w.x < 0, we add the vector x to w (w ← w + x), which rotates w towards x, decreasing the angle between them and pushing w.x towards being positive. Symmetrically, for a point x in N we want w.x < 0; so whenever w.x ≥ 0, we subtract x from w (w ← w − x), increasing the angle and pushing w.x towards being negative.

The algorithm has converged when w.x ≥ 0 for all the points in P and w.x < 0 for all the points in N; then we stop.
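A minimal sketch of this loop in Python (the function name perceptron_learning is my own, and max_epochs is just a safety cap; on linearly separable data the loop is guaranteed to terminate):

import random

# Perceptron learning algorithm: P holds inputs with y = 1, N holds inputs
# with y = 0; every point already includes the bias component x0 = 1.
def perceptron_learning(P, N, max_epochs=1000):
    w = [random.random() for _ in range(len((P + N)[0]))]  # random init
    for _ in range(max_epochs):
        converged = True
        for x in P:                                        # want w.x >= 0
            if sum(wi * xi for wi, xi in zip(w, x)) < 0:
                w = [wi + xi for wi, xi in zip(w, x)]      # w <- w + x
                converged = False
        for x in N:                                        # want w.x < 0
            if sum(wi * xi for wi, xi in zip(w, x)) >= 0:
                w = [wi - xi for wi, xi in zip(w, x)]      # w <- w - x
                converged = False
        if converged:
            break
    return w

# Example: learn AND. Each point is (x0 = 1, x1, x2).
P = [[1, 1, 1]]                          # y = 1 only for x1 = x2 = 1
N = [[1, 0, 0], [1, 0, 1], [1, 1, 0]]    # y = 0 otherwise
print(perceptron_learning(P, N))         # a separating weight vector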

The algorithm is guaranteed to converge whenever the data is linearly separable. Proof of convergence can be found here.

This article covered only a single perceptron. A single perceptron can only represent linearly separable functions; a network of perceptrons is required to classify linearly inseparable functions. More on that in the next post!

Note: This article is based on the videos of the course CS7015: Deep Learning, taught on NPTEL Online.
