Introduction to Support Vector Machines
Support Vector Machines are among the most commonly used supervised learning algorithms for classification problems. This post discusses the mathematics and intuition behind the linear SVM algorithm, with a light introduction to Lagrange multipliers, which are used for the optimization part.
Linear SVM:
In a 2-D space we have two sets of points, each described by its coordinates (x1, x2) and a label y that takes the value +1 or -1.
The objective of this problem is to find a hyperplane such that it perfectly separates the two sets of points, as shown below. A hyperplane in this case is a straight line that partitions the two sets of points.
This means that, for a hyperplane defined by w · x + b = 0, every point lies on the side corresponding to its label, i.e. each point satisfies the condition yi (w · xi + b) > 0.
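To make this condition concrete, here is a minimal sketch in Python; the toy points, the weight vector w and the bias b are made-up values for illustration, not taken from the figures in this post.

import numpy as np

# Toy 2-D points and their labels (illustrative values only)
X = np.array([[2.0, 3.0], [3.0, 4.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# A candidate separating hyperplane w · x + b = 0 (hypothetical parameters)
w = np.array([1.0, 1.0])
b = 0.0

# A point is on the correct side of the hyperplane when yi (w · xi + b) > 0
correctly_classified = y * (X @ w + b) > 0
print(correctly_classified)   # [ True  True  True  True ]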
Clearly, there can be several such hyperplanes that can separate the two classes. But how do we choose the right hyperplane?
First, let's understand the concept of the geometric margin. For a point (xi, yi), the quantity yi (w · xi + b) is positive when the point is correctly classified, but it scales with w and b, so by itself it does not measure distance.
Therefore, we divide by the length of w: the geometric margin of a point is yi (w · xi + b) / ||w||, which is the signed distance of the point from the hyperplane, and the geometric margin of the dataset is the smallest such distance over all points.
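As a quick illustration, the geometric margin can be computed directly; this continues the toy example above (the points and the hyperplane are still made-up values).

import numpy as np

# Same toy points and candidate hyperplane as in the previous snippet
X = np.array([[2.0, 3.0], [3.0, 4.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([1.0, 1.0]), 0.0

# Geometric margin of each point: signed distance from the hyperplane,
# positive when the point lies on its correct side
margins = y * (X @ w + b) / np.linalg.norm(w)
print(margins)        # per-point geometric margins
print(margins.min())  # geometric margin of the dataset (the closest point)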
In order to find the right hyperplane, we must maximize the geometric margin, as shown in the figure below.
The dashed lines represent the marginal hyperplanes, on which w · x + b equals +1 and -1 respectively, and the points closest to the decision boundary lie on them. These points are the support vectors. For example, in the figure below the support vectors (the points closest to the marginal hyperplanes) are colored in green and bright red.
If the position of these points were changed, the position of the optimized decision boundary would be affected. Hence, the points closest to the decision boundary determine where the decision boundary lies.
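As an illustrative check (assuming we already know the optimal w and b, scaled so that the marginal hyperplanes are w · x + b = +1 and -1), the support vectors are exactly the points whose functional margin equals 1. The data and hyperplane below are hypothetical values chosen so that this works out exactly.

import numpy as np

# Hypothetical optimal hyperplane, scaled so the marginal hyperplanes are w · x + b = +/-1
X = np.array([[1.0, 1.0], [2.0, 3.0], [-1.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([0.5, 0.5]), 0.0

# Support vectors satisfy yi (w · xi + b) = 1 (up to numerical tolerance)
functional_margin = y * (X @ w + b)
support_mask = np.isclose(functional_margin, 1.0)
print(X[support_mask])   # the points lying on the marginal hyperplanes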
Optimizing Linear SVM:
In finding the optimal hyperplane we previously established that the geometric margin must be maximized; that is, we want to solve max over (w, b) of min over i of yi (w · xi + b) / ||w||. Because w and b can be rescaled so that the closest points satisfy yi (w · xi + b) = 1, maximizing the margin 1 / ||w||
is equivalent to minimizing (1/2) ||w||^2 over w and b,
subject to: yi (w · xi + b) ≥ 1 for every training point i = 1, …, m.
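For concreteness, here is a minimal sketch of solving this primal problem directly with a general-purpose constrained optimizer (scipy's SLSQP); the toy data and the packing of the variables as [w1, w2, b] are my own choices for illustration.

import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (illustrative values only)
X = np.array([[1.0, 1.0], [2.0, 3.0], [-1.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])

# Pack the optimization variables as v = [w1, w2, b]
objective = lambda v: 0.5 * np.dot(v[:2], v[:2])   # (1/2) ||w||^2
constraints = [{'type': 'ineq',                    # yi (w · xi + b) - 1 >= 0
                'fun': lambda v, i=i: y[i] * (X[i] @ v[:2] + v[2]) - 1.0}
               for i in range(len(y))]

res = minimize(objective, x0=np.zeros(3), method='SLSQP', constraints=constraints)
w, b = res.x[:2], res.x[2]
print(w, b)   # should come out close to the maximum-margin hyperplane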
We can use the method of Lagrange multipliers to solve this constrained optimization problem. Introducing a multiplier ai ≥ 0 for each constraint gives the Lagrangian L(w, b, a) = (1/2) ||w||^2 - sum over i of ai [yi (w · xi + b) - 1], and setting its derivatives with respect to w and b to zero yields w = sum over i of ai yi xi, along with the condition that the ai yi sum to zero.
As we can see, the weight vector w is a linear combination of the training vectors x1…xm, where a vector xi contributes to this combination only if its multiplier ai is non-zero.
Furthermore, if ai is non-zero, the complementary slackness condition ai [yi (w · xi + b) - 1] = 0 forces yi (w · xi + b) = 1.
Therefore, all the support vectors lie on the marginal hyperplanes.
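To see these conditions numerically, the sketch below uses the same hypothetical toy data as before, for which the maximum-margin hyperplane is w = (0.5, 0.5), b = 0; the ai values were worked out by hand for this particular example and are not from the post.

import numpy as np

# Toy data with known optimal hyperplane w = (0.5, 0.5), b = 0 (illustrative values)
X = np.array([[1.0, 1.0], [2.0, 3.0], [-1.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])
alpha = np.array([0.25, 0.0, 0.25, 0.0])   # non-zero only for the support vectors

# Stationarity: w = sum of alpha_i * yi * xi, and the alpha_i * yi sum to zero
w = (alpha * y) @ X
b = 0.0
print(w)                  # [0.5 0.5]
print(np.dot(alpha, y))   # 0.0

# Complementary slackness: where alpha_i > 0, yi (w · xi + b) must equal 1
print(y * (X @ w + b))    # equals 1 exactly at the support vectors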
When solving this problem in Python, we start by trying hyperplanes close to the support vectors (the points x1…xm that satisfy the equation above with equality) and then optimize for the maximum margin by incrementally adjusting w until the optimal hyperplane is found. The result would look something like the figure below, where the dashed lines represent less optimal hyperplanes.
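In practice, rather than hand-rolling this incremental search, a library implementation such as scikit-learn's SVC with a linear kernel can perform the optimization; the sketch below is one way to do it, with made-up toy data, and uses a large C value to approximate the hard-margin problem described in this post.

import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (illustrative values only)
X = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 1.5],
              [-1.0, -1.0], [-3.0, -2.0], [-2.0, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

# A large C approximates the hard-margin SVM discussed above
clf = SVC(kernel='linear', C=1e6)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)                        # the optimal hyperplane w · x + b = 0
print("support vectors:\n", clf.support_vectors_)
print("margin width =", 2 / np.linalg.norm(w))   # distance between the marginal hyperplanes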