Support Vector Machines Deep Intuition PART-I (Basic Intuition)
Before diving into Support Vector Machines, you need to know what classification and regression are. To learn about them, go through this.
Here, we are not discussing any math; we’ll cover it in PART-2 and PART-3 of this post. Don’t stop if you don’t get a concept; read it twice, as I’ve explained everything in a simple manner.
There are many similarities between logistic regression and SVMs. So, if you don’t know about logistic regression, go through that first.
Support Vector Machines are used to solve both classification and regression problems.
SVMs are also used in Outlier Detection.
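To make these three uses concrete, here is a minimal sketch assuming scikit-learn is the library in use (the post itself doesn’t name one). The random data and the variable names are made up purely for illustration.

```python
import numpy as np
from sklearn.svm import SVC, SVR, OneClassSVM

# Made-up 2-column data, only for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))

# Classification: predict one of two classes
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = SVC(kernel="linear").fit(X, y_class)

# Regression: predict a continuous value
y_reg = 2.0 * X[:, 0] - X[:, 1]
reg = SVR(kernel="linear").fit(X, y_reg)

# Outlier detection: no labels needed; predict() returns +1 (inlier) or -1 (outlier)
detector = OneClassSVM(nu=0.05).fit(X)
labels = detector.predict(X)
```

All three share the same fit/predict pattern; only the kind of target changes.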
Note that SVMs (Support Vector Machines) are good at handling complex but small or medium-sized training datasets.
SVMs are designed to work only with binary classification, i.e. two output classes. But we can extend them to multi-class classification with the One-vs-One and One-vs-Rest strategies.
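Here is a small sketch of the two multi-class strategies, assuming scikit-learn’s `OneVsOneClassifier` and `OneVsRestClassifier` wrappers. The 4-class toy dataset from `make_blobs` is an assumption for illustration, not data from this post.

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Toy dataset with K = 4 classes (illustrative only)
X, y = make_blobs(n_samples=200, centers=4, random_state=0)

# One-vs-One: one binary SVM per pair of classes, K*(K-1)/2 models
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)

# One-vs-Rest: one binary SVM per class (that class vs. all the others), K models
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

n_ovo = len(ovo.estimators_)  # 6 binary classifiers for 4 classes
n_ovr = len(ovr.estimators_)  # 4 binary classifiers for 4 classes
```

In both cases every underlying model is still a plain binary SVM; the wrapper only decides how their votes are combined.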
So now, before getting into details, you need to know something.
- Linearly Separable vs. Linearly Non-Separable Data: Linearly separable data is data whose classes can be divided by a straight line, as shown in the figure.
Let’s say your training data consists of 2 columns and the output variable has 2 classes. In the above image, the x-axis represents one column and the y-axis represents the other. When we plot the points in a 2-D plane, the + sign data points represent one class and the circled points represent the other. As we can see, the data is separable by a thick line (for now, don’t worry about the dotted lines that are parallel to the thick line). But real-world data will not look like this. Let me show you another type of data.
Normally, real-world data resembles the data above.
In the above picture, the green and blue balls represent the 2 classes. This data is not linearly separable: no straight line can cleanly separate the two classes the way it did for the linearly separable data.
Now, here are some terms to keep in mind; with them we can describe everything much more precisely.
Let me show you a picture.
In the above diagram, the thick straight line is called the hyperplane, and the lines parallel to it are called marginal lines. The data points that touch the marginal lines are called support vectors (the red and blue balls represent the 2 classes of output labels), and the distance between the 2 marginal lines is called the marginal distance.
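If you want to see these components on real numbers, here is a minimal sketch assuming scikit-learn’s `SVC` with a linear kernel. The six points are made up, and the very large `C` is my assumption to approximate a hard margin. The formula `2 / ||w||` for the marginal distance is a standard result that belongs to the math parts of this series, so just take it as given here.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable toy data: two classes of three points each
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard margin (assumption for this sketch)
model = SVC(kernel="linear", C=1e6).fit(X, y)

# Hyperplane: w . x + b = 0
w = model.coef_[0]
b = model.intercept_[0]

# Support vectors: the training points that touch the marginal lines
support_vectors = model.support_vectors_

# Marginal distance: the gap between the two marginal lines
margin_distance = 2.0 / np.linalg.norm(w)

print("support vectors:\n", support_vectors)
print("marginal distance:", margin_distance)
```

Notice that only a few of the training points end up as support vectors; the rest do not affect where the hyperplane sits.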
You may ask what the use of knowing all this is. These are the components of Support Vector Machines, which we’ll now discuss in depth. We’ll go through everything, so don’t worry about that.
In the above picture, if the hyperplane were formed without those marginal lines, it would essentially be logistic regression. Without marginal lines, the hyperplane would be selected the same way as in logistic regression, where it can end up close to the data points. But here we have marginal lines for a purpose, and that is what makes SVMs unique.
Now, let’s understand how the hyperplane gets selected from the many possible hyperplanes, how the marginal lines get drawn, and what their purpose is:
Assume that the data we have is linearly separable.
- Suppose you just created a hyperplane without considering the margins.
- Then the hyperplane may get oriented in such a way that it passes close to some of the points in our training dataset, as can happen in linear and logistic regression.
- Then there is a chance that test data or new data will lie on the wrong side of the hyperplane, even if the new points lie close to training examples of the correct class.
- But when we create the hyperplane so that the data points are far from it, then even if new data is a little closer to the wrong class than the training points were, it will most likely still lie on the correct side of the hyperplane.
- To select a hyperplane that is far from the data points, for each candidate hyperplane we take the training points nearest to it (these will become the support vectors) and measure their perpendicular distance to that hyperplane. The hyperplane for which this nearest-point distance is the largest is selected as the final hyperplane.
- The marginal lines are then drawn through the support vectors, parallel to the selected hyperplane.
- Note that many hyperplanes may give good accuracy, but we need to select the one for which the distance between the support vectors and the hyperplane is maximum.
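The selection idea in the steps above can be sketched in a few lines of NumPy. This is only an illustration under my own assumptions: the toy points and the three candidate hyperplanes `A`, `B`, `C` are hypothetical, and a real SVM solves an optimization problem over all hyperplanes instead of comparing a handful by hand.

```python
import numpy as np

# Made-up, linearly separable toy data with labels -1 and +1
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Hypothetical candidate hyperplanes, each written as w . x + b = 0
candidates = {
    "A": (np.array([1.0, 0.0]), -2.5),  # the vertical line x = 2.5
    "B": (np.array([1.0, 1.0]), -5.5),  # the line x + y = 5.5
    "C": (np.array([1.0, 1.0]), -4.0),  # the line x + y = 4
}

def nearest_point_distance(w, b):
    """Perpendicular distance from the hyperplane to its nearest training point.

    The value is negative if some point lies on the wrong side.
    """
    signed_distances = y * (X @ w + b) / np.linalg.norm(w)
    return signed_distances.min()

margins = {name: nearest_point_distance(w, b) for name, (w, b) in candidates.items()}

# Keep the hyperplane whose nearest points are farthest away
best = max(margins, key=margins.get)
print(margins, "-> selected:", best)
```

All three candidates separate the two classes perfectly, so accuracy alone cannot tell them apart; the distance to the nearest points is what breaks the tie.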
By now, you have some basic knowledge of why we have marginal lines, how the hyperplane gets selected, how the marginal lines are drawn, and what support vectors are.
Let’s talk about non-linear data. In order to deal with non-linear data, SVM kernels are used.
Put simply, SVM kernels let us look at the data in a higher dimension instead of the original lower dimension, in such a way that the data becomes linearly separable there and we are able to find a hyperplane.
There are different types of SVM kernels, like:
- Polynomial kernels.
- RBF (radial basis function) kernels.
- Sigmoid kernels.
and many others…
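To see the "lower dimension to higher dimension" idea in action, here is a small sketch assuming scikit-learn. The concentric-circles dataset from `make_circles` and the hand-made third column `x² + y²` are my own illustrative choices; a real kernel does this kind of lifting implicitly instead of building the extra column.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by any straight line in 2-D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# 1) A linear SVM on the original 2-D data struggles
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)

# 2) Manually lift the data to 3-D by adding x^2 + y^2 as a third column.
#    In 3-D the inner ring sits low and the outer ring sits high,
#    so a flat hyperplane can separate them.
X_lifted = np.column_stack([X, (X ** 2).sum(axis=1)])
lifted_acc = SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y)

# 3) An RBF kernel works on the original 2-D data without any manual lifting
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print("linear, 2-D:", linear_acc)
print("linear, lifted 3-D:", lifted_acc)
print("RBF kernel, 2-D:", rbf_acc)
```

The scores here are training accuracies, used only to compare the three setups on the same data.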
This is the end.
Now you can confidently say, “I have a complete basic intuition about Support Vector Machines.” Of course, this is not the end for SVMs; what we discussed above is the basic knowledge you need in order to understand the math behind every concept we covered.
Don’t worry if you didn’t get any concept clearly. In PART-2 and PART-3 I’ll discuss the complete math of SVMs, where your concepts will become crystal clear.
Thank you & Happy Learning✌.