Why the line moves closer to the misclassified point in the perceptron learning algorithm

Sanmitra Dharmavarapu
4 min read · Nov 13, 2018

Cite: the content of this article is inspired by Prof. Mitesh M. Khapra's lectures on Deep Learning from NPTEL, and by the "Intro to Deep Learning" course on Udacity.

People who are kick-starting a course on Deep Learning often feel uncomfortable in the first phase, while learning the Perceptron Learning Algorithm.

The major question is “why does the line move closer to the misclassified point on applying the perceptron learning algorithm?”

A video on the Perceptron Learning Algorithm from the "Intro to Deep Learning" course on Udacity.

First, let us state the perceptron learning algorithm, with a slight change to the usual notation: we fold the bias into the weight vector.

Let W = (w1, w2, ..., wn) and let a point be X = (x1, x2, ..., xn). Set w0 = b and x0 = 1, so that

W = (w0, w1, ..., wn)
X = (1, x1, x2, ..., xn)

1. Start with random weights W
2. For every misclassified point X:
       if prediction = 0:
           W = W + X
       else if prediction = 1:
           W = W - X
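The steps above can be sketched as a short, runnable program (a minimal sketch; the two training points and the zero-initialized weights are made-up examples chosen for illustration):

```python
import numpy as np

def perceptron_step(W, X, y):
    """One perceptron update for a single point X with true label y.

    W and X both include the bias term: W = (w0, w1, ..., wn) with w0 = b,
    and X = (1, x1, ..., xn).
    """
    prediction = 1 if np.dot(W, X) >= 0 else 0
    if prediction == y:
        return W                     # correctly classified: no change
    if prediction == 0:              # should have been 1: move W toward X
        return W + X
    return W - X                     # should have been 0: move W away from X

# A tiny training loop on two linearly separable (hypothetical) points
points = [(np.array([1.0,  2.0,  1.0]), 1),   # X = (1, x1, x2), y = 1
          (np.array([1.0, -2.0, -1.0]), 0)]   # X = (1, x1, x2), y = 0
W = np.zeros(3)                      # zeros instead of random, for reproducibility
for _ in range(10):
    for X, y in points:
        W = perceptron_step(W, X, y)
```

After a few passes, W ⋅ X is non-negative for the class-1 point and negative for the class-0 point, so both are classified correctly.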

Now consider the quantity that the prediction is based on:

w0x0 + w1x1 + w2x2 + ... + wnxn

It is nothing but the dot product (W ⋅ X)

that is

prediction =0 if (W ⋅ X)<0

prediction =1 if (W ⋅ X) ≥ 0

Working of the above algorithm

RESULT OF the Perceptron Learning Algorithm: a line separating the vector space into two halves. Image by Sanmitra

We can understand from the above algorithm that

prediction = 1 when (W ⋅ X) = 0, and every point X on the line satisfies the equation (W ⋅ X) = 0. (You can refer to the three blue points on the line in the image above.)

Now, let us do some math

Let's talk about the angle (say α) between W and any point X that lies on the line.

Since the dot product is 0, the equation for the angle between two vectors gives

cos(α) = (W ⋅ X) / (‖W‖ ‖X‖) = 0

As cos(α) = 0, it means α = 90° (∵ cos 90° = 0)

i.e. the angle between the vectors W and X is 90°

showing the perpendicularity of vector W with every point on the line. Image by Sanmitra

So, since vector W is perpendicular to every point on the line, it is actually perpendicular to the line itself.
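This perpendicularity can be checked numerically (a small sketch; the weight vector and the line x1 + x2 = 4 below are hypothetical values chosen for illustration):

```python
import numpy as np

W = np.array([-4.0, 1.0, 1.0])         # line: -4 + x1 + x2 = 0, i.e. x1 + x2 = 4
# two points on that line, written in augmented form X = (1, x1, x2)
X1 = np.array([1.0, 1.0, 3.0])
X2 = np.array([1.0, 4.0, 0.0])

for X in (X1, X2):
    cos_a = np.dot(W, X) / (np.linalg.norm(W) * np.linalg.norm(X))
    angle = np.degrees(np.arccos(cos_a))
    print(angle)                       # 90.0 for every point on the line
```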

Observing the output

Let us consider the model shown below, which correctly predicts each and every point.

Let us consider 4 input points (X), say A, B, C, D, in the vector space.

A and B are the points with prediction value 1.

C and D are the points with prediction value 0.

Predictions after the training of Perceptron. image by Sanmitra

Now let α be the angle between a point and W; for A, that is ∠AOW = α.

Considering the same scenario for every point:

in case of A: α < 90°

in case of B: α < 90°

in case of C: α > 90°

in case of D: α > 90°

(MOST IMPORTANT) It is clear from the figure that for every point:

if it is above the line, i.e. prediction = 1, then the angle between W and X is less than 90° (i.e. α < 90°).

if it is below the line, i.e. prediction = 0, then the angle between W and X is greater than 90° (i.e. α > 90°).
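This observation can be verified with a quick sketch (W, A, and C below are hypothetical values; A sits on the positive side of the line and C on the negative side):

```python
import numpy as np

W = np.array([-4.0, 1.0, 1.0])         # same hypothetical line: x1 + x2 = 4

def angle_deg(W, X):
    """Angle between vectors W and X, in degrees."""
    cos_a = np.dot(W, X) / (np.linalg.norm(W) * np.linalg.norm(X))
    return np.degrees(np.arccos(cos_a))

A = np.array([1.0, 5.0, 5.0])          # above the line: W . A = 6 > 0, prediction 1
C = np.array([1.0, 1.0, 1.0])          # below the line: W . C = -2 < 0, prediction 0

print(angle_deg(W, A))                 # less than 90
print(angle_deg(W, C))                 # greater than 90
```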

Operations of Perceptron Learning Algorithm

Making a note of the outcomes above, let's understand the action performed on the W vector when a point X is misclassified.

For every misclassified point X:
    if prediction = 0:
        W(new) = W + X

Consider X whose actual value is y = 1 but whose prediction is 0:

the angle between X and W is greater than 90° (but we require α to be less than 90°).

Let us see what happens to the angle α when we set W(new) = W + X.

Recall the formula for the angle between W and X:

cos(α) = (W ⋅ X) / (‖W‖ ‖X‖)

It is obvious from this formula that cos(α) is directly proportional to the dot product of W and X.

Now, carefully observe the following steps:

W(new) ⋅ X = (W + X) ⋅ X = W ⋅ X + X ⋅ X = W ⋅ X + ‖X‖²

Since ‖X‖² > 0, we get W(new) ⋅ X > W ⋅ X, and therefore cos(α_new) > cos(α): the value of cos(α) has increased.

We know that as α increases, the value of cos(α) decreases

and similarly, as α decreases, the value of cos(α) increases

So it is obvious from the above derivations that after the operation

W(new) = W+X

the angle between W and X decreases; this, in turn, makes α < 90° (though perhaps not in a single step).

Hence, the newly fitted line moves closer to the misclassified point as the angle between W and X is reduced.
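The whole argument can be demonstrated numerically (again with made-up values for W and X): one application of W(new) = W + X visibly shrinks the angle between W and the misclassified point.

```python
import numpy as np

def angle_deg(W, X):
    """Angle between vectors W and X, in degrees."""
    cos_a = np.dot(W, X) / (np.linalg.norm(W) * np.linalg.norm(X))
    return np.degrees(np.arccos(cos_a))

W = np.array([-4.0, 1.0, 1.0])         # hypothetical weights for the line x1 + x2 = 4
X = np.array([1.0, 1.0, 1.0])          # y = 1, but W . X = -2 < 0, so prediction = 0

before = angle_deg(W, X)               # greater than 90: X is misclassified
W_new = W + X                          # the perceptron update for this case
after = angle_deg(W_new, X)
print(before, after)                   # the angle shrinks after the update
```

In this example a single update already brings α below 90°; in general it may take several updates, as noted above.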

NOTE: you can derive the proof for the second step of the algorithm (W = W − X) in a similar way.

Conclusion:

I hope this explanation helps people who are puzzled by the Perceptron Learning Algorithm.

For more such stories, do follow me.

Thank you


Sanmitra Dharmavarapu

Facebook PyTorch Scholarship recipient | Google India Challenge Scholarship winner | Deep Learning Enthusiast