A Career in Data Science — Part 2 — Machine Learning — Perceptrons

The building blocks of every neural network are interconnected perceptrons!

If you have been following my blog, this is my second article on Machine Learning, and it covers Perceptrons. If you are viewing my blog for the first time, here is the link to my introductory post. I hope you enjoy the content, and I will be glad to have you on board throughout this journey.

Fun Fact :

DeepMind’s AlphaZero clobbered the top AI champions in Go, Shogi, and Chess. After successfully defeating the best human Go players in 2016, AlphaGo was upgraded a year later into a generalized and more powerful incarnation, AlphaZero. With no prior knowledge of the game and only the basic rules as input, AlphaZero learned how to play master-level chess by itself in just four hours. It then proceeded to trounce Stockfish (the top chess engine) in a 100-game match, without losing a single game.

In 3 days, AlphaGo Zero surpassed AlphaGo Lee, the version that beat Lee Sedol 4 games out of 5 in 2016. In 21 days, AlphaGo Zero reached the level of AlphaGo Master, the version that defeated 60 top professionals online and beat the World Champion Ke Jie 3 games out of 3 in 2017. In 40 days, AlphaGo Zero surpassed all other versions of AlphaGo and arguably became the best Go player in the world. It did this entirely through self-play, with no human intervention and no historical data. Quite a feat!

Before we dive into the actual concept of Perceptrons, let me start off with a very simple example :

Assume you are building a social networking platform, just like Facebook or Twitter. As a strategy, you decide to reward users with points for popular and authentic content on your website. To judge whether a post deserves a reward, you consider certain aspects, such as the number of Likes, Shares, and Comments, coalesced into a “Popularity” score ranging from 0 to 10, and a “Plagiarism” score, also ranging from 0 to 10, where 10 means completely original and 0 means completely copied.

So, let’s take a look at some sample posts. The 1st post had a Popularity score of 9 and a Plagiarism score of 9; it was quite popular and gained a lot of attention, so it was rewarded. The 2nd post scored 3/10 for Popularity and 2/10 for Plagiarism; it was neither prominent nor authentic, and therefore it wasn’t rewarded. Now, the 3rd post scored 6/10 for Popularity and 7/10 for Plagiarism: quite original content, also appreciated by a good number of people. Does this post get rewarded?

The answer can be narrowed down easily. We do what we do in most of our algorithms: look at our previous data. The graph above displays the posts that were rewarded and rejected based on our parameters. The red points correspond to posts that were denied a reward, and the blue points correspond to posts that were rewarded. From this graph we can infer that posts with good Popularity and Originality scores were more likely to be rewarded, and posts with poor scores in both were less likely to be rewarded.
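To make this concrete, here is a minimal sketch of what such historical data might look like in Python. The values below are illustrative stand-ins, not real data:

```python
# Each record: (plagiarism, popularity, rewarded)
# plagiarism and popularity are scores from 0 to 10;
# rewarded is the label: 1 = rewarded, 0 = not rewarded.
past_posts = [
    (9, 9, 1),  # very original and very popular -> rewarded
    (2, 3, 0),  # mostly copied and unpopular -> not rewarded
    (8, 6, 1),
    (3, 4, 0),
    (7, 8, 1),
    (1, 5, 0),
]
```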


Well, it seems that this data can be nicely separated by a line: most posts above the brown line get rewarded, and most posts below the line do not.

So, this line is going to be our model. The model makes a couple of mistakes, since there are a few blue points under the line and a few red points above it. Let’s ignore those misclassified points for the sake of simplicity.

How did we end up with this line that accurately separates the given data into the desired classes?

Let me involve some math…

Consider the horizontal axis as corresponding to the variable Plagiarism, x1, and the vertical axis as corresponding to the variable Popularity, x2. The boundary line that separates the blue points from the red points is going to have a linear equation:

2x1 + x2 - 18 = 0

What does this mean?

Consider this as our “Score” equation.

The score is 2 x (Plagiarism) + 1 x (Popularity) - 18. For any given post, if the Score is a positive number, the post gets rewarded; if the Score is a negative number, the post doesn’t get rewarded.

Score > 0 : Rewarded

Score < 0 : Not Rewarded

If the Score is exactly 0, by convention the post gets rewarded; this edge case does not affect the model much.
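As a quick sketch, here is this decision rule in Python. The weights 2 and 1 and the bias -18 come from the line above; the function names are my own:

```python
def score(plagiarism, popularity):
    # Score = 2 * x1 + 1 * x2 - 18, with x1 = Plagiarism, x2 = Popularity
    return 2 * plagiarism + 1 * popularity - 18

def is_rewarded(plagiarism, popularity):
    # Score >= 0 -> rewarded (a score of exactly 0 counts as rewarded)
    return score(plagiarism, popularity) >= 0

print(is_rewarded(9, 9))  # True:  2*9 + 9 - 18 = 9
print(is_rewarded(2, 3))  # False: 2*2 + 3 - 18 = -11
print(is_rewarded(7, 6))  # True:  2*7 + 6 - 18 = 2
```

Notice that the 3rd post from our example (Plagiarism 7, Popularity 6) scores 2, so under this line it would get rewarded.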

In a more general case, the equation of our line will be of the form:

(w1x1 + w2x2 + w3x3 + … + wnxn) + b = 0

Wx + b = 0, where b is the bias term,

W = (w1, w2, w3, …, wn); the weights

x = (x1, x2, x3, …, xn); the inputs

y = 0 or 1; the labels

Prediction: the algorithm predicts what our label will be, i.e.

y` = 1 if Wx + b >= 0

y` = 0 if Wx + b < 0

The goal of our problem is to find the boundary line that classifies most of the points above it as blue and most of the points below it as red, which is exactly equivalent to predicting y` as closely as possible to y.
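As a sketch of this goal in code (all names here are mine), we can predict y` for a point and count how often it disagrees with the true label y:

```python
def predict(W, x, b):
    # y` = 1 if Wx + b >= 0, else 0
    return 1 if sum(w * xi for w, xi in zip(W, x)) + b >= 0 else 0

def count_errors(W, b, data):
    # data is a list of (x, y) pairs; fewer disagreements means a better line
    return sum(predict(W, x, b) != y for x, y in data)
```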


Now let’s introduce the notion of a Perceptron. It’s just an encoding of our equation into a small graph. The way we build it: we fit the data and the boundary line inside a node, then we add smaller nodes to feed in the inputs, which in our case are Popularity and Plagiarism. Let’s consider a case where Popularity is 9 and Plagiarism is also 9. The perceptron plots the point (9, 9) and checks whether the point lies in the positive or the negative area.

If the point is in the positive area, it returns a Yes (1); if the point is in the negative area, it returns a No (0). The weights (w1, w2, w3, …, wn) and the bias (b) define the linear equation.

The Score from the linear equation Wx + b for the given input (9, 9) is

2(9) + 9 - 18 = 9; Score >= 0

The score here is greater than 0, so we know that the given point lies above the line and the post will get rewarded. The Score lies in the range (-∞, +∞); this needs to be converted to either 0 or 1, i.e. either the post gets rewarded or it does not.

© Udacity

Here, we can use an activation function to get the desired output. One such function is the Step function: it returns 1 if the input is 0 or a positive number, and 0 if the input is a negative number.

© Udacity
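The step function itself is tiny; a sketch in Python:

```python
def step(t):
    # Returns 1 for zero or positive input, 0 for negative input
    return 1 if t >= 0 else 0
```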

So in reality a perceptron can be seen as a combination of several nodes, where the first node calculates the linear equation on the given inputs and weights, and the second node applies the Step function to the result. In future posts, we shall come across other activation functions.
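Putting the two nodes together (building on the step function above; the names are my own), a perceptron’s forward pass might look like this:

```python
def perceptron(W, x, b):
    # Node 1: the linear equation, Wx + b
    linear = sum(w * xi for w, xi in zip(W, x)) + b
    # Node 2: the step function turns the raw score into 0 or 1
    return step(linear)

# Our reward model: W = (2, 1), b = -18, x = (plagiarism, popularity)
print(perceptron([2, 1], (9, 9), -18))  # 1 -> rewarded
```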

Now, we finally have all the tools for describing the Perceptron Algorithm.

  • We start with a random equation, which determines a line (considering 2 features) and 2 regions: the positive and the negative region. We then move this line to get a better fit.
  • For every misclassified point (x1, x2, x3, …, xn), we do the following:

a) If the prediction was 0, meaning the point is a positive point in the negative region, then we update the weights as follows:

for i = 1, 2, 3, …, n

  • Wi` = Wi + α*xi
  • b` = b + α

Here α (alpha) is the learning rate parameter.

b) If the prediction was 1, meaning the point is a negative point in the positive region, we update the weights in a similar way, except we subtract instead of add:

for i = 1, 2, 3, …, n

  • Wi` = Wi - α*xi
  • b` = b - α

By carrying out these steps, we move the line closer to each misclassified point. We repeat the steps until there are no errors, or until the error is acceptably small.
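Here is a minimal sketch of the whole algorithm in Python, under the conventions above. I assume random initial weights, a fixed learning rate, and a fixed number of passes over the data; all names are my own:

```python
import random

def predict(W, x, b):
    # Apply the step function to the score Wx + b
    return 1 if sum(w * xi for w, xi in zip(W, x)) + b >= 0 else 0

def train_perceptron(data, alpha=0.1, epochs=100):
    # data: list of (x, y) pairs; x is a tuple of n features, y is 0 or 1
    n = len(data[0][0])
    # Start with a random equation (a random line, for 2 features)
    W = [random.uniform(-1, 1) for _ in range(n)]
    b = random.uniform(-1, 1)
    for _ in range(epochs):
        for x, y in data:
            y_hat = predict(W, x, b)
            if y_hat == 0 and y == 1:
                # Positive point in the negative region: add α*xi
                W = [w + alpha * xi for w, xi in zip(W, x)]
                b = b + alpha
            elif y_hat == 1 and y == 0:
                # Negative point in the positive region: subtract α*xi
                W = [w - alpha * xi for w, xi in zip(W, x)]
                b = b - alpha
    return W, b

# Toy run on reward-style data: x = (plagiarism, popularity), y = rewarded
data = [((9, 9), 1), ((2, 3), 0), ((8, 6), 1), ((3, 4), 0)]
W, b = train_perceptron(data)
print(predict(W, (7, 6), b))  # prints 1 if the learned line rewards the 3rd post
```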

So, this is the Perceptron Algorithm.


Here is the link to the next post, on Decision Trees.

Just as always, I’d like to thank my readers for their time, interest, and attention. I conclude this post on Perceptrons with one of my favorite quotes:

“I wish it need not have happened in my time,” said Frodo. “So do I,” said Gandalf, “and so do all who live to see such times. But that is not for them to decide. All we have to decide is what to do with the time that is given us.” — J.R.R. Tolkien, The Fellowship of the Ring


Written by Harris Mohammed
Data Science Engineer | Natural Language Processing

Data Driven Investor: from confusion to clarity, not insanity