Why does the line move closer to the misclassified point in the Perceptron Learning Algorithm?
Citation: the content of this article is inspired by the lectures of Prof. Mitesh M. Khapra on Deep Learning from NPTEL, and by the Intro to Deep Learning course on Udacity.
People who are just kick-starting a course on Deep Learning often feel uncomfortable in the first phase while learning the Perceptron Learning Algorithm.
The major question is “why does the line move closer to the misclassified point on applying the perceptron learning algorithm?”
A video on the Perceptron Learning Algorithm from the “Intro to Deep Learning” course on Udacity.
First, let us consider the perceptron learning algorithm, with a slight change to the usual notation.

Let W = (w1, w2, ..., wn) and let a point be X = (x1, x2, ..., xn). Let us set w0 = b and x0 = 1, so that

W = (w0, w1, ..., wn)
X = (1, x1, x2, ..., xn)

The algorithm:

1. Start with random weights W.
2. For every misclassified point X:
   if prediction = 0:
       W = W + X
   else if prediction = 1:
       W = W - X
Now, consider the quantity

w0x0 + w1x1 + w2x2 + ... + wnxn

It is nothing but the dot product (W ⋅ X).
That is,

prediction = 0 if (W ⋅ X) < 0
prediction = 1 if (W ⋅ X) ≥ 0
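To make the two-step update above concrete, here is a minimal runnable sketch in Python (NumPy). The toy dataset, the epoch count, and the helper names predict and train_perceptron are my own illustrative assumptions, not part of the original lectures.

```python
import numpy as np

def predict(W, X):
    """Prediction rule: 1 if W . X >= 0, else 0."""
    return 1 if np.dot(W, X) >= 0 else 0

def train_perceptron(points, labels, n_epochs=100, seed=0):
    """Perceptron learning algorithm with the bias folded in as w0.

    points : array of shape (m, n) -- raw inputs, without the leading 1
    labels : array of shape (m,)   -- actual values y in {0, 1}
    """
    rng = np.random.default_rng(seed)
    # Prepend x0 = 1 to every point so that w0 plays the role of the bias b.
    X_aug = np.hstack([np.ones((points.shape[0], 1)), points])
    W = rng.normal(size=X_aug.shape[1])  # start with random weights

    for _ in range(n_epochs):
        for X, y in zip(X_aug, labels):
            p = predict(W, X)
            if p == y:
                continue            # correctly classified: no update
            if p == 0:              # predicted 0 but actual y = 1
                W = W + X
            else:                   # predicted 1 but actual y = 0
                W = W - X
    return W

# Toy linearly separable data (assumed purely for illustration).
points = np.array([[2.0, 3.0], [3.0, 4.0], [-2.0, -1.0], [-3.0, -2.0]])
labels = np.array([1, 1, 0, 0])
W = train_perceptron(points, labels)
print("learned weights (w0 = b, w1, w2):", W)
```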
Working of the above algorithm
We can understand from the above rule that prediction = 1 when (W ⋅ X) = 0, and every point X on the decision line satisfies the equation (W ⋅ X) = 0. (You can refer to the three blue points on the line in the image above.)
Now, let us do some math.

Let's talk about the angle (say α) between W and any point X which lies on the line. From the definition of the dot product,

cos(α) = (W ⋅ X) / (|W| |X|)

As the dot product is 0 for every point on the line, cos(α) = 0, which means α = 90° (∵ cos 90° = 0). So the angle between the vectors W and X is 90°: vector W is perpendicular to every point on the line, and hence perpendicular to the line itself.
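We can verify this perpendicularity numerically. In the short sketch below, the particular weight vector W and the points on the line are illustrative choices; any point satisfying W ⋅ X = 0 would do.

```python
import numpy as np

W = np.array([1.0, 2.0, -1.0])           # some weight vector (w0, w1, w2)

def angle_deg(u, v):
    """Angle between two vectors, in degrees."""
    cos_a = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

# Two points (with x0 = 1) chosen so that W . X = 0, i.e. they lie on the
# line w0 + 2*x1 - x2 = 0  ->  x2 = 1 + 2*x1
for x1 in (0.5, 3.0):
    X = np.array([1.0, x1, 1.0 + 2.0 * x1])
    print(f"W . X = {np.dot(W, X):.1f}, angle = {angle_deg(W, X):.1f} degrees")
# Both angles print as 90.0, confirming W is perpendicular to the line.
```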
Observing the output
Let us consider a model, shown below, that correctly predicts each and every point.
Let us consider four input points (X), say A, B, C, D, in the vector space.
A and B are points with prediction value 1.
C and D are points with prediction value 0.
Now let α be the angle between a point and W; for example, ∠AOW = α for point A. Considering the same scenario for every point:

in the case of A: angle α < 90°
in the case of B: angle α < 90°
in the case of C: angle α > 90°
in the case of D: angle α > 90°
(MOST IMPORTANT) It is obvious from the figure that for every point:

if it is above the line, i.e. prediction = 1, then the angle between W and X is less than 90° (i.e. α < 90°).
if it is below the line, i.e. prediction = 0, then the angle between W and X is greater than 90° (i.e. α > 90°).
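This correspondence is exactly the sign of the dot product: cos(α) has the same sign as W ⋅ X. A quick check (the weights and the four points A, B, C, D here are illustrative, not the ones in the figure):

```python
import numpy as np

W = np.array([0.0, 1.0, 1.0])      # illustrative weights: line x1 + x2 = 0

def angle_deg(u, v):
    cos_a = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

# A, B above the line (W . X > 0); C, D below it (W . X < 0).
for name, X in {"A": [1, 2, 3], "B": [1, 1, 1],
                "C": [1, -2, -3], "D": [1, -1, -2]}.items():
    X = np.array(X, dtype=float)
    pred = 1 if np.dot(W, X) >= 0 else 0
    print(f"{name}: prediction = {pred}, angle = {angle_deg(W, X):.1f} degrees")
# A and B print angles below 90 degrees; C and D print angles above 90.
```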
Operations of the Perceptron Learning Algorithm
Making a note of the outcomes of the story above, let us understand the action performed on the W vector when a point X is misclassified.
For every misclassified point X:
if prediction = 0:
W(new) = W+X
Consider a point X whose actual value is y = 1 but whose prediction is 0:
the angle between X and W is greater than 90° (but we require α to be less than 90°).
Let us consider what happens to the angle α when we make W(new) = W + X.

It is obvious from the above formula that cos(α) is directly proportional to the dot product of W and X. Now, carefully observe the following steps (also shown in the image below):

cos(α(new)) ∝ W(new) ⋅ X = (W + X) ⋅ X = (W ⋅ X) + (X ⋅ X)

Since X ⋅ X = |X|² is positive for any nonzero point, cos(α(new)) > cos(α), which means the value of cos(α(new)) got increased.
We know that as α increases, the value of cos(α) decreases
and similarly, as α decreases, the value of cos(α) increases
So it is obvious from the above derivation that after the operation

W(new) = W + X

the angle between W and X decreases; this, in turn, makes the angle α < 90° (though perhaps not in one step).

Hence, the newly fitted line moves closer to the misclassified point by reducing the angle between W and X.
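We can watch this happen numerically. In the sketch below, the starting W and the misclassified point X are illustrative assumptions; repeated W = W + X updates raise W ⋅ X by |X|² each step until the angle drops below 90°.

```python
import numpy as np

def angle_deg(u, v):
    cos_a = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

W = np.array([1.0, -4.0, 2.0])     # current weights
X = np.array([1.0, 2.0, 1.0])      # misclassified point: y = 1 but W . X < 0

step = 0
while np.dot(W, X) < 0:            # prediction is still 0
    print(f"step {step}: W . X = {np.dot(W, X):5.1f}, "
          f"angle = {angle_deg(W, X):5.1f} degrees")
    W = W + X                      # the perceptron update
    step += 1
print(f"step {step}: W . X = {np.dot(W, X):5.1f}, "
      f"angle = {angle_deg(W, X):5.1f} degrees  (now below 90)")
```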
NOTE: you can similarly derive the proof for the second step in the algorithm, W(new) = W − X: there the dot product, and hence cos(α), decreases, pushing the angle past 90°.
Conclusion:
I hope this explanation helps many people clear up the mystery of the Perceptron Learning Algorithm.
For more such stories, do follow me.
Thank you!