Everything you need to know about Support Vector Machine
Machine learning is simple to learn & use, but most importantly we should know which method to apply to a particular type of problem. There are many regression & classification methods, and linear regression is not always going to help us. When the data is complex & non-linear, we take the help of the Support Vector Machine. Before we dive deep into the topic, here are the objectives of learning SVM:
- To understand the theory & intuition of SVM & the terms related to it
- To know the math behind SVM
- To implement SVM in Python
Theory & intuition : SVM is a supervised machine learning algorithm, which means it works on a labeled dataset. It is used for both regression & classification tasks. We draw a marginal line along the graph (a separating line for the classification task & a best fit line for the regression task). The main motivation for the support vector family is that our dataset is sometimes non-linear & more complex than a simple linear model can handle.
Maximal Margin Classifier (MMC):
Assumptions: 1. The data should be linearly separable. 2. In this technique we draw a hard margin (which means we do not allow any misclassification across the border line). 3. The data may lie in a 1D or 2D space.
Because the hard margin has zero tolerance for noisy points, the model performs very well on the training data but poorly on the test dataset, which leads to an overfitted model.
We can see from the graph above that the margin is maximized. The separating line is called the hyperplane. The MMC model chooses the hyperplane, & parallel to it the margin lines are created. The largest possible distance between those two margin lines is called the maximum margin.
The formula of MMC is y( β0 + Σβi*Xi ) ≥ M
Where y = dependent variable (coded as ±1), β0 = intercept, Xi = independent variables, βi = coefficients of those independent variables, & M = the margin (the smallest distance any observation is allowed to have from the hyperplane); the value of y( β0 + Σβi*Xi ) cannot go below M.
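Written out in full, the MMC is a small optimization problem: maximize the margin M over the coefficients, subject to a normalization constraint on the coefficients & the condition that every observation sits on the correct side at distance at least M. This is the standard textbook formulation (with an explicit observation index i), not something taken from the article itself:

\max_{\beta_0, \beta_1, \dots, \beta_p, \, M} \; M
\quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 = 1,
\qquad y_i \Big( \beta_0 + \sum_{j=1}^{p} \beta_j X_{ij} \Big) \ge M \quad \text{for all } i.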
Support Vector Classifier (SVC) :
Assumptions : In this technique we draw a soft margin in the 2D space (which means we allow some misclassification across the border line). Because we add an error term, the model does not lead to overfitting & is less sensitive to individual data points.
The formula of SVC is y( β0 + Σβi*Xi ) ≥ M(1-ϵ), with Σβ² = 1, ϵ ≥ 0 & Σϵ ≤ C,
Where y = dependent variable, β0 = intercept, Xi = independent variables, βi = coefficients of those independent variables, M = the margin, ϵ = the slack (error) we allow so that some amount of misclassification is possible, & C = the maximum value of the summation of all errors (ϵ), i.e. the total budget for violations. (In scikit-learn the hyperparameter C is instead a penalty on the errors, so there the strength of regularization is inversely proportional to C.) In order to find the maximum margin we have to maximize the distance between the data points & the hyperplane. But if the data points are not linearly separable we cannot draw such a line at all. This problem is solved by the Support Vector Machine.
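To make the effect of C concrete, here is a minimal sketch (the synthetic dataset from sklearn.datasets.make_blobs is my own choice, not part of the article) that fits a linear-kernel SVC with a small & a large C; a smaller C means stronger regularization, a wider margin & therefore more support vectors:

#....effect of the C parameter on a linear SVC (illustrative sketch)
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# two slightly overlapping blobs, so the soft margin actually matters
X_toy, y_toy = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=42)

for C_value in [0.01, 100]:
    model = SVC(kernel='linear', C=C_value).fit(X_toy, y_toy)
    # smaller C -> stronger regularization -> wider margin -> more support vectors
    print(f"C={C_value}: number of support vectors = {model.n_support_.sum()}")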
Support Vector Machine (SVM) : When the hyperplane classifies the data in a non-linear way we call it SVM. The model takes the lower-dimensional input space & transforms it into a higher-dimensional space, so that the data becomes linearly separable in the higher dimensions.
In the above image, in one example we can see that the 1D data points have been transformed into 2D data points, & in another example the 2D data points are transformed into 3D data points. In both cases they are then separated by a hyperplane. This projection is done through a 'kernel'. There are three common non-linear kernels: 'poly', 'rbf' & 'sigmoid'. These kernels are required only when the dataset is non-linear. If the dataset is linear we can go for the linear kernel, but then we do not even require a kernelized SVM; we can use the SVC (soft margin) alone. A minimal sketch of the dimension-lifting idea is given right below.
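As a minimal sketch of that idea (the tiny 1D toy dataset below is invented purely for illustration & is not from the article), adding a squared feature turns a 1D dataset that no single threshold can separate into a 2D dataset that a horizontal line separates perfectly:

#....manual 1D -> 2D transformation (illustrative sketch)
import numpy as np

# class 1 sits on both sides, class 0 in the middle: no single 1D threshold separates them
x = np.array([-3, -2.5, -2, -0.5, 0, 0.5, 2, 2.5, 3])
y_label = np.array([1, 1, 1, 0, 0, 0, 1, 1, 1])

# project to 2D by adding the squared term, the same trick a polynomial kernel performs implicitly
X_2d = np.column_stack([x, x**2])

# in the new space the horizontal line x^2 = 2 separates the classes perfectly
print((X_2d[:, 1] > 2).astype(int))   # matches y_label exactly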
Polynomial Kernel : Transforming the lower dimension into a higher dimension is done by (implicitly) adding polynomial terms of the features. If A & B are the features & the degree is 2, the implicit feature space includes all of the terms 1, A, B, AB, A², B², as listed in the short sketch below.
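Here is a hedged sketch of that degree-2 expansion using sklearn.preprocessing.PolynomialFeatures just to list the implied terms explicitly (the polynomial kernel never materializes this matrix, it only computes the corresponding inner products; the sample values A=2, B=3 are my own example):

#....which terms a degree-2 polynomial expansion contains (illustrative sketch)
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_ab = np.array([[2.0, 3.0]])          # one sample with features A=2, B=3
poly = PolynomialFeatures(degree=2)    # includes the constant term "1" by default
X_expanded = poly.fit_transform(X_ab)

print(poly.get_feature_names_out(['A', 'B']))   # ['1' 'A' 'B' 'A^2' 'A B' 'B^2']
print(X_expanded)                               # [[1. 2. 3. 4. 6. 9.]]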
RBF Kernel : It is the most widely used kernel. The formula of the RBF kernel is K(X1, X2) = exp( −γ‖X1 − X2‖² ), where γ = 1/(2σ²).
The parameters are the Euclidean distance between X1 & X2, and γ.
- If the Euclidean distance between X1 & X2 tends to zero, K will be close to 1.
- γ is inversely proportional to σ² (γ = 1/(2σ²)). If σ increases, the region of similarity increases: K stays close to 1 over a wider area & the points are considered similar; conversely, a very large γ makes K fall off quickly, so points are quickly treated as dissimilar. A proper value of γ (or σ) can be found with GridSearchCV; a small numeric sketch follows below.
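A minimal numeric sketch of both effects (the point coordinates & γ values below are my own illustrative choices), comparing a hand-computed kernel value with sklearn.metrics.pairwise.rbf_kernel:

#....RBF kernel value vs. distance & gamma (illustrative sketch)
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x1 = np.array([[0.0, 0.0]])
x2 = np.array([[1.0, 1.0]])

for gamma in [0.1, 1.0, 10.0]:
    sq_dist = np.sum((x1 - x2) ** 2)              # squared Euclidean distance = 2
    k_manual = np.exp(-gamma * sq_dist)           # K = exp(-gamma * ||x1 - x2||^2)
    k_sklearn = rbf_kernel(x1, x2, gamma=gamma)[0, 0]
    print(f"gamma={gamma}: K={k_manual:.4f} (sklearn: {k_sklearn:.4f})")

# identical points -> distance 0 -> K is exactly 1, whatever gamma is
print(rbf_kernel(x1, x1, gamma=1.0)[0, 0])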
Sigmoid Kernel : It comes from neural network concepts. In an ANN the sigmoid function is used as an activation function, & using the sigmoid kernel in an SVM is equivalent to a two-layer perceptron neural network.
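For reference, the kernel scikit-learn uses here is K(X1, X2) = tanh( γ·⟨X1, X2⟩ + r ), where r is the coef0 offset. A minimal check against sklearn.metrics.pairwise.sigmoid_kernel (the sample points & parameter values are my own illustrative choices):

#....sigmoid kernel value (illustrative sketch)
import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

x1 = np.array([[1.0, 2.0]])
x2 = np.array([[0.5, -1.0]])

gamma, coef0 = 0.5, 1.0
k_manual = np.tanh(gamma * np.dot(x1, x2.T) + coef0)    # tanh(gamma * <x1, x2> + coef0)
k_sklearn = sigmoid_kernel(x1, x2, gamma=gamma, coef0=coef0)

print(k_manual, k_sklearn)   # both print the same value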
The formula of SVM is y( β0 + Σβi*Xi + Σβij*Xi² ) ≥ M(1-ϵ), with Σβ² = 1, ϵ ≥ 0 & Σϵ ≤ C, where y = dependent variable, β0 = intercept, Xi = independent variables, βi = coefficients of those independent variables, M = the margin, ϵ = the slack (error) we allow so that some amount of misclassification is possible, & C = the maximum value of the summation of all errors (ϵ).
Here the extra higher-dimensional terms (the squared & cross terms produced by the kernel projection) are added to the equation.
So the parameters of SVM to be tuned are as follows (a grid-search sketch is shown after the implementation at the end):
- Kernel ( to increase the model accuracy)
- C ( the penalty term that regularizes the error term; if C is large the model tends to overfit & if C is small the model tends to underfit)
- Degree (used only by the 'poly' kernel)
- Gamma (used by the 'rbf', 'poly' & 'sigmoid' kernels; it should not be very high, otherwise the model overfits)
- Random State
Now we can go for the Python implementation.
#....Importing the library
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
#....reading the data
df = pd.read_csv("../DATA/mouse_viral_study.csv")
df.head()
#....Visualizing the dataset
sns.scatterplot(x='Med_1_mL',y='Med_2_mL',hue='Virus Present',data=df,palette='seismic')
#....Importing the model library
from sklearn.svm import SVC
#....Defining the independent & dependent variable
y = df['Virus Present']
X = df.drop('Virus Present',axis=1)
We define a helper function which will help us visualize the decision boundary, the margins & the support vectors in 2D space. We will just pass the fitted models (one kernel at a time) & visualize the hyperplane.
def plot_svm_boundary(model, X, y):
    X = X.values
    y = y.values
    # scatter plot of the data points
    plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap='seismic')
    # plot the decision function
    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    # create grid to evaluate model
    xx = np.linspace(xlim[0], xlim[1], 30)
    yy = np.linspace(ylim[0], ylim[1], 30)
    YY, XX = np.meshgrid(yy, xx)
    xy = np.vstack([XX.ravel(), YY.ravel()]).T
    Z = model.decision_function(xy).reshape(XX.shape)
    # plot decision boundary and margins
    ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5,
               linestyles=['--', '-', '--'])
    # plot support vectors
    ax.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1], s=100,
               linewidth=1, facecolors='none', edgecolors='k')
    plt.show()
C = 1.0
#....defining the model with selected parameters
lin_svc = SVC(kernel='linear', C=C).fit(X, y)
rbf_svc = SVC(kernel='rbf', gamma=0.7, C=C).fit(X, y)
poly_svc = SVC(kernel='poly', degree=3, C=C).fit(X, y)
sigmoid = SVC(kernel='sigmoid',C=C).fit(X, y)
#....visualizing all the kernels (the helper calls plt.show() itself, so each model produces its own figure rather than sharing a subplot grid)
plot_svm_boundary(lin_svc, X, y)
plot_svm_boundary(rbf_svc, X, y)
plot_svm_boundary(poly_svc, X, y)
plot_svm_boundary(sigmoid, X, y)
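Finally, as mentioned in the parameter list above, kernel, C, gamma & degree can be tuned automatically. Below is a hedged sketch using GridSearchCV on the same X & y; the grid values & the train/test split settings are my own illustrative choices, not recommendations from the original article:

#....tuning the SVM hyperparameters with GridSearchCV (illustrative sketch)
from sklearn.model_selection import train_test_split, GridSearchCV

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

param_grid = {
    'kernel': ['linear', 'rbf', 'poly'],
    'C': [0.01, 0.1, 1, 10],
    'gamma': ['scale', 0.1, 1],
    'degree': [2, 3],            # only used when kernel='poly'
}

grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)             # best combination found by cross-validation
print(grid.score(X_test, y_test))    # accuracy on the held-out test set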