Support Vector Machine — Explained
This blog will cover three questions:
- What is a support vector machine?
- How does it work in linearly separable scenarios?
- How do we implement it in Python with sklearn?
1. What is a Support Vector Machine (Classifier)?
Before we dive in, I would like to focus on the three lines shown in the chart below. All of them separate the red and green dots; in other words, they all correctly classify those 40 points. But if you were asked to pick one, which line would you choose?
Most probably you would pick the blue line, even if you don’t know what a Support Vector Machine is. What the blue line does better is that it stays as far away as possible from the closest points, in comparison to the other two lines.
That’s what a support vector machine does!
A Support Vector Machine finds a decision boundary to separate different classes by maximizing the margin.
The margin is the (perpendicular) distance between the line and the dots closest to it.
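For reference, writing the line as wᵀx + b = 0 (a form introduced in the next section), the perpendicular distance from a dot x₀ to that line is the standard point-to-line formula:
distance = |wᵀx₀ + b| / ‖w‖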
2. How does it work in linearly separable cases?
Let’s use the same example as above. Obviously, lines exist that separate these red and green dots. If that’s the case, we just need to find those lines and pick the one that maximizes the margin.
Instead of checking the infinitely many lines that separate red and green dots one by one, which is impossible🙄, we are going to describe those lines in a general mathematical way. There are two concepts we need to understand in order to do that: the Hyperplane and the Separating Hyperplane.
What is a Hyperplane? (You can think of this as a linear decision boundary.)
In an n-dimensional space, a Hyperplane is an (n − 1)-dimensional subspace. For a 2-dimensional space, its Hyperplane is (2 − 1) = 1-dimensional, which is just a line. For a 3-dimensional space, its Hyperplane is (3 − 1) = 2-dimensional, which is a plane that slices the cube. Okay, you got the idea.
Any Hyperplane can be written mathematically as follows:
wᵀx + b = 0, where w is the weight vector and b is the intercept
For a 2-dimensional space, the Hyperplane, which is the line, can be written as:
w₁x₁ + w₂x₂ + b = 0
The dots above this line satisfy:
w₁x₁ + w₂x₂ + b > 0
The dots below this line satisfy:
w₁x₁ + w₂x₂ + b < 0
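As a quick sanity check with a made-up line, take x₁ + x₂ − 3 = 0 (so w₁ = w₂ = 1 and b = −3): the point (2, 4) gives 2 + 4 − 3 = 3 > 0, so it sits above the line, while (0, 1) gives 0 + 1 − 3 = −2 < 0, so it sits below.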
What is a Separating Hyperplane?
For the red and green dot example shown above, assume the label y is either 1 (green) or −1 (red). All three lines below are separating hyperplanes, because they all share the same property: above the line, the dots are green; below the line, the dots are red.
This property can be written in math again as follows:
wᵀxᵢ + b > 0 when yᵢ = 1 (green)
wᵀxᵢ + b < 0 when yᵢ = −1 (red)
If we further generalize these two into one, it becomes:
yᵢ(wᵀxᵢ + b) > 0 for every point i
The formula above is a very important constraint for SVM: it says that all the data points are correctly classified. In the perfect, linearly separable scenario, SVM can meet this constraint. In the less perfect scenario, we will need to loosen this constraint, or we will not get any decision boundary; that will be covered in Part 2.
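As a minimal sketch of checking that constraint in code (the line and the points here are made up for illustration, not the 40 dots from the chart):
import numpy as np

# Hypothetical separating line w·x + b = 0, with w = (1, 1) and b = -3
w = np.array([1.0, 1.0])
b = -3.0

# A few made-up points and their labels: 1 = green (above), -1 = red (below)
X = np.array([[2.0, 4.0], [3.0, 2.0], [0.0, 1.0], [1.0, 0.5]])
y = np.array([1, 1, -1, -1])

# The SVM constraint: y_i * (w·x_i + b) > 0 for every point i
print((y * (X @ w + b) > 0).all())  # True: every point is correctly classified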
Now that we are able to describe the lines in a mathematical way, let’s understand what defines the most optimal line.
If we can find one separating Hyperplane, then we can find infinitely many lines that separate those points, since we can shift or rotate the line in a million ways. This is where the Margin comes into play.
So what is the margin?
- Let’s say we have a Hyperplane, call it line X
- Calculate the perpendicular distance from each of those 40 dots to line X; that gives 40 different distances
- The smallest of those 40 distances, that’s our margin! (See the short sketch after this list.)
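Put into code, those steps look like this minimal sketch (again with a hypothetical line and made-up points standing in for the 40 dots):
import numpy as np

# Hypothetical line X: w·x + b = 0, with w = (1, 1) and b = -3
w = np.array([1.0, 1.0])
b = -3.0

# Made-up stand-ins for the 40 dots
points = np.array([[2.0, 4.0], [3.0, 2.0], [0.0, 1.0], [1.0, 0.5]])

# Perpendicular distance from every dot to the line: |w·x + b| / ||w||
distances = np.abs(points @ w + b) / np.linalg.norm(w)

# The smallest of those distances is the margin
print('margin:', distances.min())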
The distance between either of the dashed lines and the solid line is the margin. We can think of this optimal line as the mid-line of the widest strip we can possibly fit between the red and green dots.
To sum up, SVM in the linearly separable case:
- Constrains/ensures that each observation is on the correct side of the Hyperplane
- Picks the optimal line so that the distance from the closest dots to the Hyperplane, the so-called margin, is maximized (see the optimization form after this list)
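In optimization terms, those two bullets become the standard hard-margin problem (the exact scaling convention below is an assumption, since this post never spells the problem out):
minimize ½‖w‖² over w and b
subject to yᵢ(wᵀxᵢ + b) ≥ 1 for every point i
Under that scaling, the margin works out to 1/‖w‖, so making ‖w‖ small is exactly what makes the margin large.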
3. Implementation in Python with sklearn
# Fit the model
# Here X holds the (x1, x2) coordinates of all 40 points shown above;
# Y is 1 or -1, where 1 stands for green and -1 stands for red.
from sklearn import svm

clf = svm.SVC(kernel='linear')
clf.fit(X, Y)

# Predict a new point
print('Prediction:', clf.predict([[-2, -5]]))
OUTPUT: Prediction: [-1]

# The coefficients of the line
print('the coefficient of the line', clf.coef_)
OUTPUT: the coefficient of the line [[0.90230696 0.64821811]]
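A few more attributes of the fitted model are worth a look. These are standard sklearn SVC attributes; with a linear kernel, the full margin width (dashed line to dashed line) can be recovered from the learned weights as 2 / ‖w‖:
import numpy as np

print('intercept:', clf.intercept_)              # the b in w·x + b = 0
print('support vectors:', clf.support_vectors_)  # the dots closest to the line

w = clf.coef_[0]
print('margin width:', 2 / np.linalg.norm(w))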
In support vector machine Part 2, we will go further into how the support vector machine works in scenarios where the classes are not linearly separable.