Road to SVM: Maximal Margin Classifier and Support Vector Classifier
Support Vector Machine (SVM) is a popular Machine Learning algorithm used in classification tasks, appreciated especially for its ability to handle non-linearly separable data (thanks to the so-called Kernel trick). However, before arriving at the model we use today, several models sharing the same underlying structure were developed. In this article, I’m going to give you the intuition behind two of them, whose progressive refinement leads to the modern SVM.
Those two are the Maximal Margin Classifier and the Support Vector Classifier. Before diving into them, however, let’s first introduce the main object underlying both of these classifiers: the separating hyperplane.
Getting familiar with hyperplanes
A hyperplane defined in an h-dimensional space is an object of dimension h-1 which separates the space into two halves. I discussed the mathematical interpretation of hyperplanes in my former article. To recap, in a generic h-dimensional space, we define a generic hyperplane as:
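Writing the coefficients as $\beta_0, \beta_1, \dots, \beta_h$ (a notational choice assumed here), the hyperplane is the set of points $x = (x_1, \dots, x_h)$ satisfying:

$$\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_h x_h = 0$$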
Now, each vector of variables x* can lie either on the hyperplane, if:
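$$\beta_0 + \beta_1 x_1^* + \beta_2 x_2^* + \dots + \beta_h x_h^* = 0$$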
Or on one of the two halves if:
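$$\beta_0 + \beta_1 x_1^* + \dots + \beta_h x_h^* > 0 \quad\text{or}\quad \beta_0 + \beta_1 x_1^* + \dots + \beta_h x_h^* < 0$$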
For simplicity, let’s consider the following bivariate case, where the hyperplane is simply a line in the plane:
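$$\beta_0 + \beta_1 x_1 + \beta_2 x_2 = 0$$

Points on one side of this line make the left-hand side positive, and points on the other side make it negative.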
It then follows quite naturally to set up a decision rule that classifies an observation depending on the side of the hyperplane it lies on. Namely, imagine we have drawn the best hyperplane (we will see later on what ‘best’ means) using our training set, where the target is a binary label taking values -1 and 1. Now, a new observation x* arrives (in green) and we want to classify it. The only thing we need to do is substitute its values into the expression of our hyperplane: if the result is positive, we label it 1, otherwise -1.
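To make the decision rule concrete, here is a minimal sketch in Python, assuming NumPy is available; the coefficient values beta0 and beta are purely hypothetical, as if they had been estimated from the training set:

```python
import numpy as np

# Hypothetical hyperplane coefficients for the bivariate case:
# beta0 is the intercept, beta holds the slope coefficients.
beta0 = -1.0
beta = np.array([2.0, 0.5])

def classify(x_star):
    """Label a new observation by the side of the hyperplane it falls on."""
    value = beta0 + np.dot(beta, x_star)  # evaluate the hyperplane expression at x*
    return 1 if value > 0 else -1         # positive side -> label 1, otherwise -1

# A new observation to classify
x_star = np.array([1.0, 0.3])
print(classify(x_star))  # -1.0 + 2.0*1.0 + 0.5*0.3 = 1.15 > 0, so the label is 1
```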
So, in this case, our decision boundary tells us that x* has label 1.