Perceptron Convergence Theorem
This article assumes that you have a basic understanding of the Perceptron Learning Algorithm (PLA), as it focuses solely on its convergence.
Theorem: “For any finite set of linearly separable labeled examples, the Perceptron Learning Algorithm (PLA) will terminate after a finite number of iterations.”
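As a concrete companion to the statement, here is a minimal NumPy sketch of the PLA. The function name `perceptron` and the safety cap `max_updates` are my own additions for illustration, not part of the article; the theorem itself is what guarantees termination on separable data.

```python
import numpy as np

def perceptron(X, y, max_updates=10_000):
    """Perceptron Learning Algorithm (PLA) sketch.

    X : (P, d) array of examples x^mu; y : (P,) array of labels in {-1, +1}.
    Returns the learned weights and the number of update steps performed.
    `max_updates` is only a safety cap for this illustration; on linearly
    separable data the convergence theorem guarantees termination.
    """
    P, d = X.shape
    w = np.zeros(d)  # w_0 is the null vector, as in the proof
    updates = 0
    while updates < max_updates:
        # find the misclassified examples: y^mu (w . x^mu) <= 0
        mis = [mu for mu in range(P) if y[mu] * (w @ X[mu]) <= 0]
        if not mis:
            return w, updates  # every example is classified correctly
        mu = mis[0]
        w = w + y[mu] * X[mu]  # the PLA update rule
        updates += 1
    return w, updates
```

Only iterations that actually change the weights are counted, which matches how the proof counts learning steps.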
Notations:
We denote by w* a weight vector that perfectly separates our dataset:

$ \forall \mu : \quad y^\mu \, (w^* \cdot x^\mu) > 0 $

where our dataset S is defined as follows (P is an integer greater than 1):

$ S = \{ (x^\mu, y^\mu) \}_{\mu = 1}^{P}, \qquad y^\mu \in \{-1, +1\} $
We will count as a learning step n only the iterations in which the weights are actually updated.
Before proceeding further with the proof, let’s define a few parameters:
- The angle θ_n between the vector w_n and the vector w*, through its cosine:

$ c(n) = \cos(\theta_n) = \dfrac{w_n \cdot w^*}{\|w_n\| \, \|w^*\|} $
- The Euclidean distance of the point x^µ to the hyperplane defined by w*, which we will refer to as the margin. We express the margins as follows:

$ \gamma^\mu = \dfrac{y^\mu \, (w^* \cdot x^\mu)}{\|w^*\|} $
We also define δ, the minimum distance between a point x^µ and the hyperplane defined by w*:

$ \delta = \min_{\mu} \gamma^\mu > 0 $
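These definitions translate directly into code. A short sketch, assuming NumPy; the helper names `margins` and `min_margin` are hypothetical, introduced here only to mirror γ^µ and δ:

```python
import numpy as np

def margins(X, y, w_star):
    """Margins gamma^mu = y^mu (w* . x^mu) / ||w*|| of each example
    with respect to the separating hyperplane defined by w*."""
    return y * (X @ w_star) / np.linalg.norm(w_star)

def min_margin(X, y, w_star):
    """delta = min over mu of gamma^mu; positive iff w* separates the data."""
    return margins(X, y, w_star).min()
```

A positive `min_margin` is exactly the linear-separability assumption of the theorem.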
Proof:
The intuition here is to lower-bound c(n) by a quantity that grows with n, and then use the fact that the cosine function is less than or equal to 1.
For a given step n, since an update has occurred on some misclassified example (x^µ, y^µ), the squared norm of the weight vector evolves as follows:

$ \|w_{n+1}\|^2 = \|w_n + y^\mu x^\mu\|^2 = \|w_n\|^2 + 2\, y^\mu (w_n \cdot x^\mu) + \|x^\mu\|^2 $
Let’s define R as the radius of a closed ball centered at the origin of our axes and containing all of the vectors x^µ:

$ R = \max_{\mu} \|x^\mu\| $
And since n is an update step, the example was misclassified, so:

$ y^\mu \, (w_n \cdot x^\mu) \le 0 $
So we have:

$ \|w_{n+1}\|^2 \le \|w_n\|^2 + R^2 $
We repeat the process n times (knowing that the initial vector w_0 is the null vector) to get:

$ \|w_n\|^2 \le n R^2 \qquad (1) $
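Inequality (1) can be sanity-checked numerically along a PLA run. A sketch on a hypothetical toy dataset (the data points and variable names are my own illustrative choices, not from the article):

```python
import numpy as np

# Hypothetical separable toy data; the vector (0, 1) separates the labels.
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])
y = np.array([1, -1, -1])
R2 = (np.linalg.norm(X, axis=1) ** 2).max()  # R^2, squared radius of the data

w = np.zeros(2)  # w_0 is the null vector
n = 0
while True:
    mis = np.flatnonzero(y * (X @ w) <= 0)  # misclassified examples
    if mis.size == 0:
        break
    w += y[mis[0]] * X[mis[0]]  # PLA update on the first misclassified point
    n += 1
    # inequality (1): after n updates the squared norm is at most n * R^2
    assert w @ w <= n * R2 + 1e-9
```

On this tiny dataset the bound is in fact tight after each update, which illustrates that (1) cannot be improved in general.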
This concludes the first part of our proof, dedicated to the denominator of c(n); we now investigate its numerator.
The step n is an update step, so:

$ w_{n+1} \cdot w^* = (w_n + y^\mu x^\mu) \cdot w^* = w_n \cdot w^* + y^\mu \, (x^\mu \cdot w^*) $
The expression of the margin δ gives:

$ y^\mu \, (x^\mu \cdot w^*) \ge \delta \, \|w^*\| $
So:

$ w_{n+1} \cdot w^* \ge w_n \cdot w^* + \delta \, \|w^*\| $
We repeat the process n times (knowing that the initial vector w_0 is the null vector) to get:

$ w_n \cdot w^* \ge n \, \delta \, \|w^*\| \qquad (2) $
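Like the denominator bound, inequality (2) can be checked empirically along a run. A sketch on the same kind of hypothetical toy data (all names and values here are illustrative assumptions):

```python
import numpy as np

# Hypothetical toy data, separated by w* = (0, 1).
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])
y = np.array([1, -1, -1])
w_star = np.array([0.0, 1.0])
delta = (y * (X @ w_star)).min() / np.linalg.norm(w_star)  # minimum margin

w = np.zeros(2)  # w_0 is the null vector
n = 0
while True:
    mis = np.flatnonzero(y * (X @ w) <= 0)  # misclassified examples
    if mis.size == 0:
        break
    w += y[mis[0]] * X[mis[0]]  # PLA update
    n += 1
    # inequality (2): after n updates, w_n . w* >= n * delta * ||w*||
    assert w @ w_star >= n * delta * np.linalg.norm(w_star) - 1e-9
```

The projection of w_n onto w* grows at least linearly in n, while by (1) its norm grows only like √n, which is exactly what makes c(n) grow.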
From (1) and (2) and the expression of c(n), we have:

$ c(n) = \dfrac{w_n \cdot w^*}{\|w_n\| \, \|w^*\|} \ge \dfrac{n \, \delta \, \|w^*\|}{\sqrt{n} \, R \, \|w^*\|} = \sqrt{n} \, \dfrac{\delta}{R} $
We could simply use the Cauchy–Schwarz inequality here, but the cosine gives more intuition: it shows how the alignment between w_n and w* increases with √n.
Finally, since c(n) ≤ 1:

$ \sqrt{n} \, \dfrac{\delta}{R} \le 1 \quad \Longrightarrow \quad n \le \dfrac{R^2}{\delta^2} $

The number of update steps is therefore bounded by a finite constant, so the PLA terminates after a finite number of iterations. ∎
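The final bound n ≤ R²/δ² can be sanity-checked empirically. A sketch on randomly generated separable data; the chosen w*, the margin threshold 0.05, and all variable names are illustrative assumptions, not part of the article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Label random points by a fixed w*, discarding points too close to the
# hyperplane so that the minimum margin delta is strictly positive.
w_star = np.array([1.0, -2.0])
pts = rng.uniform(-1, 1, size=(500, 2))
scores = pts @ w_star
keep = np.abs(scores) / np.linalg.norm(w_star) > 0.05  # enforce a margin
X, y = pts[keep], np.sign(scores[keep])

R = np.linalg.norm(X, axis=1).max()                        # data radius
delta = (y * (X @ w_star)).min() / np.linalg.norm(w_star)  # minimum margin

# PLA, counting only update steps, starting from w_0 = 0.
w = np.zeros(2)
updates = 0
while True:
    mis = np.flatnonzero(y * (X @ w) <= 0)
    if mis.size == 0:
        break
    w += y[mis[0]] * X[mis[0]]
    updates += 1

print(updates, (R / delta) ** 2)  # updates should not exceed R^2 / delta^2
```

Note that the bound uses the margin of the unknown separator w*, so it explains *why* the PLA stops but cannot be used in advance without knowing δ.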