Support Vector Machines: SVM
Complete guide from scratch
It helps to read about Linear Regression and Logistic Regression first. If you have not, please explore them before continuing (you can refer to the following links for help).
A Support Vector Machine (SVM) is a powerful model that works on both linearly and non-linearly separable data. It can be used for regression as well as classification, though it is primarily used for classification.
SVMs have several advantages:
- Effective in high dimensional spaces.
- Still effective in cases where number of dimensions is greater than the number of samples.
- Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
- It is versatile as different Kernel Functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.
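To make the idea of a kernel function concrete, here is a minimal sketch of two commonly used kernels in plain Python. The function names and the `gamma` default are illustrative choices for this sketch, not taken from any particular library.

```python
import math

def linear_kernel(x, z):
    """Plain dot product; corresponds to a linear decision boundary."""
    return sum(xi * zi for xi, zi in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; similarity decays with squared distance.

    gamma is an assumed, illustrative default.
    """
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)
```

A kernel takes two points and returns a similarity score. Identical points give the largest RBF value (1.0), and the value shrinks as the points move apart.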
The main goal of an SVM is to find an optimal hyperplane that best separates the data, so that the distance from the hyperplane to the nearest points (called the margin) is maximized. These nearest points are called support vectors.
What is a Hyperplane?
A hyperplane is a flat subspace of n-1 dimensions in an n-dimensional feature space that separates the two classes. In a 2-D feature space it is a line, in a 3-D feature space it is a plane, and so on.
Support vectors are the data points closest to the hyperplane; they influence its position and orientation. Using these support vectors, we maximize the margin of the classifier.
Maximum Margin Hyperplane
An optimal hyperplane separates the data so that the margin, the distance from the hyperplane to the nearest points (the support vectors), is maximized.
A hyperplane separates the classes if, for all points:
w · x + b > 0
(for data points in class 1)
w · x + b < 0
(for data points in class 0)
Here the equation of the hyperplane is w · x + b = 0, where w is the weight vector (normal to the hyperplane) and b is the intercept.
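The sign rule above can be sketched in a few lines of plain Python. The helper names and the example values of `w` and `b` are hypothetical, chosen only to illustrate which side of the hyperplane a point falls on.

```python
def decision_value(w, x, b):
    """Compute w . x + b for one point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, x, b):
    """Class 1 on the positive side of the hyperplane, class 0 otherwise."""
    return 1 if decision_value(w, x, b) > 0 else 0

# Hypothetical hyperplane x1 + x2 - 3 = 0 in a 2-D feature space.
w = [1.0, 1.0]
b = -3.0

above = predict(w, [3.0, 2.0], b)   # 3 + 2 - 3 = 2 > 0
below = predict(w, [0.5, 1.0], b)   # 0.5 + 1 - 3 = -1.5 < 0
print(above, below)
```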
We also want our predictions to be confident: the farther a point is from the hyperplane, the more confident the prediction, because a slight change in its value will not be able to change the class of a confident point.
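One way to put a number on this confidence is the signed distance from a point to the hyperplane, (w · x + b) / ||w||. The sketch below uses a hypothetical `w` and `b`; the function name is illustrative.

```python
import math

def signed_distance(w, x, b):
    """Signed distance from x to the hyperplane w . x + b = 0.

    The sign tells the side; the magnitude tells how far the point is.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

# Hypothetical hyperplane with ||w|| = 5.
w, b = [3.0, 4.0], -5.0

near = signed_distance(w, [1.0, 1.0], b)   # (3 + 4 - 5) / 5
far = signed_distance(w, [5.0, 5.0], b)    # (15 + 20 - 5) / 5
print(near, far)
```

Both points fall on the positive side, but the second is much farther from the hyperplane, so a small perturbation is far less likely to flip its class.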
Handling Outliers in SVM
We allow the algorithm to make some errors on the training examples. Each error incurs a cost, which is added to the function we minimize.
Mathematics Behind SVM
The key idea is to maximize the margin, that is, to maximize the distance from the hyperplane to its closest point.
Equation Of Hyperplane
As we can see, the distance between the two margins is 2/||w|| (where ||w|| is the L2 norm of w), which has to be maximized. Hence ||w||/2 has to be minimized. For easier calculations we take the square of the norm and minimize ||w||²/2.
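For reference, the reasoning behind the 2/||w|| figure can be written out as follows. This is the standard hard-margin formulation. It assumes labels y_i in {+1, -1} and a hyperplane scaled so that the nearest points satisfy |w · x + b| = 1, a convention the text above does not state explicitly.

```latex
\begin{aligned}
&\text{Margin boundaries: } w \cdot x + b = +1 \quad \text{and} \quad w \cdot x + b = -1 \\
&\text{Distance between them: } \frac{|(+1) - (-1)|}{\lVert w \rVert} = \frac{2}{\lVert w \rVert} \\
&\max_{w,\,b} \frac{2}{\lVert w \rVert}
\;\Longleftrightarrow\;
\min_{w,\,b} \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to } y_i\,(w \cdot x_i + b) \ge 1 \ \text{for all } i
\end{aligned}
```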
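As discussed earlier, each error has a cost (loss), which is called the hinge loss. With labels encoded as yi = +1 or -1 and C weighting the cost of errors, the loss for a point xi is C * max(0, 1 - yi * (w.xi + b)), and the full objective adds this to ||w||²/2.
For the weights, the gradient is w - C * yi * xi when yi * (w.xi + b) < 1, and just w when this value is ≥ 1. Similarly, for the bias term the gradient is -C * yi when yi * (w.xi + b) < 1, while b does not get updated when this value is ≥ 1.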
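The hinge loss and its gradients can be sketched for a single training point as below. This assumes labels in {+1, -1} and a per-sample objective of ||w||²/2 + C * max(0, 1 - yi * (w.xi + b)), under which the bias gradient for a margin-violating point is -C * yi. The function name and return layout are illustrative.

```python
def hinge_loss_and_grads(w, b, x, y, C=1.0):
    """Loss and gradients for one sample with label y in {+1, -1}.

    Assumed per-sample objective: 0.5 * ||w||^2 + C * max(0, 1 - y * (w.x + b)).
    Returns (loss, grad_w, grad_b).
    """
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    reg = 0.5 * sum(wi * wi for wi in w)
    if margin >= 1:
        # Correct and outside the margin: only the regulariser contributes.
        return reg, list(w), 0.0
    # Inside the margin or misclassified: the hinge term is active.
    loss = reg + C * (1 - margin)
    grad_w = [wi - C * y * xi for wi, xi in zip(w, x)]
    grad_b = -C * y
    return loss, grad_w, grad_b
```

A point with margin at least 1 leaves the bias untouched, while a point inside the margin pulls both `w` and `b` toward classifying it with more room.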
Using the gradients of this loss with respect to w and the bias (grad_w and grad_b), we obtain the final weights and bias by gradient descent with learning rate lr:
w = w - lr * grad_w
bias = bias - lr * grad_b
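Putting the update rule together, a minimal training loop might look like the sketch below. It is not the code from the repository mentioned in this post. It assumes labels in {+1, -1}, per-sample (stochastic) updates, and hypothetical defaults for `C`, `lr` and `epochs`; the tiny dataset is made up for illustration.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Train a linear SVM with per-sample gradient descent on the hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin >= 1:
                # Only the regulariser ||w||^2 / 2 contributes a gradient.
                grad_w, grad_b = w, 0.0
            else:
                # Hinge term is active for this point.
                grad_w = [wj - C * yi * xj for wj, xj in zip(w, xi)]
                grad_b = -C * yi
            w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b

# Made-up, linearly separable toy data with labels in {+1, -1}.
X = [[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]]
y = [-1, -1, 1, 1]

w, b = train_linear_svm(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else -1 for xi in X]
print(w, b, preds)
```

In practice the same loop is usually written with NumPy arrays and averaged over mini-batches; plain lists are used here only to keep the sketch dependency-free.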
You can understand it by looking at code from the following GitHub repository.