# Support Vector Machine: Intuition Validated with Maths

“Abundant data generally belittles the importance of algorithms.” But we are not always blessed with abundance, so we need a good knowledge of the available tools and an intuitive sense of when each applies. This post aims to explain one such tool, the Support Vector Machine.

Support vector machines (SVMs) are a set of supervised learning methods used for both regression and classification. Unlike other learning methods, an SVM fits the best decision boundary, or hyperplane, using only a subset of the training data called the support vectors. A hyperplane is a flat affine subspace of dimension p−1 in a p-dimensional space; in machine learning lingo, the p dimensions are the features.

If a dataset is linearly separable, there are infinitely many separating hyperplanes. The question that comes to mind, then, is: which is the best separating hyperplane? Other supervised learning methods answer this in their own ways; SVM says the best hyperplane is the one that maximizes the distance to the closest data points of both classes. We call this the maximum-margin hyperplane.

What is the margin?

The margin is the distance from the hyperplane to the closest data points of either class. In SVM we choose the hyperplane that maximizes this margin.

Why is the maximum margin best?

The intuitive reason is that a model with a wider margin generalizes better to unseen data. Points near the decision surface represent very uncertain classification decisions: there is almost a 50% chance of the classifier deciding either way. A classifier with a large margin makes no low-certainty classification decisions. This gives you a classification safety margin: a slight error in measurement will not cause a misclassification.

Another intuition: by construction, an SVM classifier insists on a large margin around the decision boundary. Compared to a thin decision hyperplane, if you have to place a fat separator between the classes, you have fewer choices of where to put it. This decreases the memory capacity of the model, and hence we expect its ability to correctly generalize to test data to increase.

The training points that help to find the optimal maximum-margin hyperplane are called support vectors. The support vectors alone determine the hyperplane; the remaining training points play no role in deciding the optimal decision boundary. That, in short, is the intuitive explanation of SVM.
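This property is easy to see in practice. Below is a minimal sketch using scikit-learn (an assumption on my part — the post itself names no library): we fit a linear SVM, inspect its support vectors, and then refit on the support vectors alone to see that they determine essentially the same hyperplane.

```python
# Illustrative sketch (assumes scikit-learn); a large C approximates a hard margin.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs, one per class.
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1e3).fit(X, y)
print("number of support vectors:", len(clf.support_vectors_))

# Refitting on the support vectors alone recovers (up to solver tolerance)
# the same separating hyperplane.
clf_sv = SVC(kernel="linear", C=1e3).fit(clf.support_vectors_, y[clf.support_])
print(np.allclose(clf.coef_, clf_sv.coef_, atol=1e-2))
```

Only a handful of the 40 points end up as support vectors; the rest could be deleted without moving the boundary.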

Let’s try to formalize it mathematically. A hyperplane is defined as the set of points $\vec{x}$ such that

$$\vec{w}\cdot\vec{x} + b = 0 \tag{1}$$

where $\vec{w}$ is the normal vector to the hyperplane and $b$ is a scalar offset.

To find the margin, we project each training point onto the hyperplane and look for the closest such point, so that we can maximize the margin. For example, take a point A (i.e. $\vec{a}$ is the position vector of A) and project it onto the hyperplane at a point B with position vector $\vec{b}$ (not to be confused with the scalar bias $b$). The displacement $\vec{\gamma}$ between A and B is then given by the law of vector addition as

$$\vec{a} = \vec{b} + \vec{\gamma}$$

Since $\vec{b}$ lies on the hyperplane, it satisfies the equation of the plane. So,

$$\vec{w}\cdot\vec{b} + b = 0 \tag{2}$$

As we can see in the above figure, $\vec{\gamma}$ is parallel to $\vec{w}$, so $\vec{\gamma}$ becomes

$$\vec{\gamma} = \alpha\,\vec{w}$$

for some scalar $\alpha$.

Now placing the value $\vec{b} = \vec{a} - \alpha\vec{w}$ into equation (2), it becomes

$$\vec{w}\cdot(\vec{a} - \alpha\vec{w}) + b = 0$$

Solving for $\alpha$, we get

$$\alpha = \frac{\vec{w}\cdot\vec{a} + b}{\vec{w}\cdot\vec{w}}$$

As we need the scalar distance $\gamma$, we apply the Euclidean distance formula to $\vec{\gamma} = \alpha\vec{w}$:

$$\gamma = \|\vec{\gamma}\|_2 = |\alpha|\,\|\vec{w}\|_2 = \left|\frac{\vec{w}\cdot\vec{a} + b}{\vec{w}\cdot\vec{w}}\right|\,\|\vec{w}\|_2$$

After simplifying the above equation (using $\vec{w}\cdot\vec{w} = \|\vec{w}\|_2^2$), we get

$$\gamma = \frac{|\vec{w}\cdot\vec{a} + b|}{\|\vec{w}\|_2}$$

We can write a more generalised equation for a labelled training point $(\vec{x}_i, y_i)$ with $y_i \in \{-1, +1\}$, dropping the absolute value since $y_i$ carries the sign:

$$\gamma_i = \frac{y_i\,(\vec{w}\cdot\vec{x}_i + b)}{\|\vec{w}\|_2}$$

which is positive whenever $\vec{x}_i$ is correctly classified.
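The point-to-hyperplane distance $\gamma = |\vec{w}\cdot\vec{a} + b| / \|\vec{w}\|_2$ can be sanity-checked numerically. The sketch below (an illustration with numpy, not code from the post) computes the projection B $= \vec{a} - \alpha\vec{w}$ directly and confirms that B lies on the plane and sits exactly $\gamma$ away from A.

```python
# Numerical check of the derivation: project A onto the plane and measure.
import numpy as np

w = np.array([3.0, 4.0])   # normal vector of the hyperplane w . x + b = 0
b = -5.0                   # scalar bias
a = np.array([4.0, 3.0])   # an arbitrary point A

gamma = abs(w @ a + b) / np.linalg.norm(w)

# alpha and the projected point B, exactly as in the derivation above.
alpha = (w @ a + b) / (w @ w)
B = a - alpha * w

print(np.isclose(w @ B + b, 0.0))               # True: B lies on the plane
print(np.isclose(np.linalg.norm(a - B), gamma)) # True: ||A - B|| equals gamma
```

Here $\gamma = |24 - 5| / 5 = 3.8$, and the direct projection agrees.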

In SVM, we try to find the values of $\vec{w}$ and $b$ that maximize this margin $\gamma$, where the margin of the dataset is the smallest $\gamma_i$ over all training points:

$$\max_{\vec{w},\,b}\ \min_{i}\ \frac{y_i\,(\vec{w}\cdot\vec{x}_i + b)}{\|\vec{w}\|_2}$$

Since the hyperplane is scale-invariant (the pair $(c\vec{w},\, cb)$ describes the same hyperplane for any $c > 0$), we can rescale $\vec{w}$ and $b$ for our convenience such that

$$\min_i\ y_i\,(\vec{w}\cdot\vec{x}_i + b) = 1$$
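The scale-invariance claim is easy to verify: scaling $(\vec{w}, b)$ by $c$ multiplies $y_i(\vec{w}\cdot\vec{x}_i + b)$ and $\|\vec{w}\|_2$ by the same factor, so every distance $\gamma_i$ is unchanged. A quick numpy check (illustrative values of my own choosing):

```python
# Scaling (w, b) by any c > 0 leaves all geometric margins gamma_i unchanged.
import numpy as np

w, b = np.array([2.0, -1.0]), 0.5
X = np.array([[1.0, 3.0], [-2.0, 0.0], [0.5, 0.5]])
y = np.array([1, -1, 1])

def margins(w, b):
    # gamma_i = y_i (w . x_i + b) / ||w||
    return y * (X @ w + b) / np.linalg.norm(w)

c = 7.3
print(np.allclose(margins(w, b), margins(c * w, c * b)))  # True
```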

Then, the objective function becomes

$$\max_{\vec{w},\,b}\ \frac{1}{\|\vec{w}\|_2} \quad \text{subject to} \quad y_i\,(\vec{w}\cdot\vec{x}_i + b) \ge 0\ \ \forall i, \qquad \min_i\ y_i\,(\vec{w}\cdot\vec{x}_i + b) = 1$$

After doing some manipulation of the second constraint (the equality $\min_i y_i(\vec{w}\cdot\vec{x}_i + b) = 1$ can be relaxed to an inequality without changing the optimum, and maximizing $1/\|\vec{w}\|_2$ is equivalent to minimizing $\tfrac{1}{2}\|\vec{w}\|_2^2$), we can finalize the objective function as

$$\min_{\vec{w},\,b}\ \frac{1}{2}\,\|\vec{w}\|_2^2 \quad \text{subject to} \quad y_i\,(\vec{w}\cdot\vec{x}_i + b) \ge 1\ \ \forall i$$
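Since this final form is a small convex quadratic program, we can solve it directly on a toy dataset. The sketch below (my own illustration, assuming SciPy; production SVM solvers use specialised methods instead) feeds the objective and constraints to `scipy.optimize.minimize`:

```python
# Solve min 1/2 ||w||^2  s.t.  y_i (w . x_i + b) >= 1  on four collinear points.
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])

def objective(p):                  # p = [w1, w2, b]
    w = p[:2]
    return 0.5 * (w @ w)

# One inequality constraint per training point: y_i (w . x_i + b) - 1 >= 0.
constraints = [{"type": "ineq",
                "fun": lambda p, i=i: y[i] * (p[:2] @ X[i] + p[2]) - 1}
               for i in range(len(X))]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b, "geometric margin =", 1 / np.linalg.norm(w))
```

For these points the closest pair is (1, 1) and (3, 3), so the optimum is $\vec{w} = (0.5, 0.5)$, $b = -2$: the boundary $x_1 + x_2 = 4$ lies midway between them, with geometric margin $\sqrt{2}$.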