Finding Non-Linear Decision Boundary in SVM

Sourodip Kundu
6 min read · Oct 24, 2019


The Non-Linear Decision Boundary

  • In the previous SVM article, we saw that the decision boundary is linear.
  • SVM works well when the data points are linearly separable.
  • If the true decision boundary is non-linear, a plain SVM may struggle to classify the data.
  • In the examples below, the classes are not linearly separable.
  • SVM has no direct mechanism for modelling a non-linear decision boundary.

Mapping to Higher Dimensional Space

What would we do in the case of Logistic Regression to find non-linear boundaries? Say we have 3 features (X1, X2, X3) and we are trying to apply Logistic Regression. One option is to add a few more features (X1², X2², X3³, …) to get to non-linear decision boundaries. That is perfectly fine; the only issue is deciding which features to add, i.e., which features will actually make sense. That is very complicated to figure out, because if we knew the decision boundary we could decide which features to add, but obviously we don't know the decision boundary; that is exactly why we are searching for the right algorithm. Let's look at another way of getting non-linear features. We can choose some landmark points (see the pic below). Now, given a data point x, we can add new features based on similarity: f1 will be the similarity of x with l1, written f1 → Sim(x, l1); likewise f2 → Sim(x, l2) is how similar x is to l2, and f3 → Sim(x, l3). So if we add, say, a thousand landmark points, we get a thousand new features. That is one way of adding new features.
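A minimal sketch of this landmark idea, assuming two hand-picked landmarks and a Gaussian similarity (the similarity functions themselves are discussed later in this article); the landmark coordinates are made up purely for illustration:

import numpy as np
#Gaussian similarity between a point x and a landmark l (one possible choice of Sim)
def sim(x, l, sigma=1.0):
    return np.exp(-np.sum((x - l) ** 2) / (2 * sigma ** 2))
#hypothetical landmarks l1, l2 and a data point x
l1 = np.array([1.0, 2.0])
l2 = np.array([3.0, 0.5])
x = np.array([1.2, 1.8])
#new features: how similar x is to each landmark
f1 = sim(x, l1)  #close to 1 when x is near l1
f2 = sim(x, l2)  #smaller when x is far from l2
print(f1, f2)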

In the case of SVM we do something very similar: we choose some landmark points and a similarity function. If we originally had n features, we move to a new dimension, say n′ (n → n′), where the n′ new dimensions correspond to n′ landmark points; we also need to decide on the similarity function. Note that we do not keep the original features, only the new ones, which come from first choosing the landmarks and then computing the similarity between each landmark and the given point. So two obvious questions are-

  • How exactly do we choose a landmark point?
  • And what are the options for similarity functions that we can use?

Choosing Landmark Points

In the case of SVM, say our original data is (m × n), meaning we have m data points, each with n features. What we do is make all m training data points our landmark points. So if x → Rⁿ, we compute the similarity of x with each of the training data points. That gives us a vector whose entries are the similarity between the 1st training data point and x, the 2nd training data point and x, and so on up to the m-th training data point and x. In other words, we start with n features and move to m features (Rⁿ → Rᵐ): the new feature set still has m data points, but each now has m features. This happens for the test data as well as the training data; the training data points are always the landmarks.
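A hedged sketch of this mapping, assuming the Gaussian similarity described in the next section; every data point is compared against all m training points, so an (m × n) matrix becomes (m × m):

import numpy as np
def to_landmark_features(X, landmarks, sigma=1.0):
    #pairwise squared distances between each point in X and each landmark
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))
X_train = np.random.rand(5, 3)  #m=5 data points, n=3 features
X_test = np.random.rand(2, 3)
#the training points themselves act as the landmarks
F_train = to_landmark_features(X_train, X_train)  #shape (5, 5): R^n -> R^m
F_test = to_landmark_features(X_test, X_train)    #test data uses the SAME landmarks
print(F_train.shape, F_test.shape)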

Various Similarity Functions

To move our original dataset to a higher-dimensional one, we choose the landmark points, and then to convert x (say x lives in the original space, which is two-dimensional) to a higher dimension, the i-th new feature is the similarity between x and lᵢ, i.e., fᵢ = Sim(x, lᵢ). That is how we find all our new features. We already know that for an (m × n) dataset, xᵢ → Rⁿ (n real numbers represent xᵢ), and what we do is map xᵢ′ → Rᵐ (the similarity of xᵢ with all training data points). Here we are assuming m > n, i.e., we have more training data points than features per data point. If that is not the case, most of the time we will not use the kernel trick; we will not change the dimension when m > n does not hold. So now we have to look at the options for the similarity function Sim(x, lᵢ)-

  • Linear Kernel — This actually means we stay in the same dimensionality; we do not move to another space: if we are in Rⁿ we remain in Rⁿ (Rⁿ → Rⁿ). We choose the linear kernel in two cases-

— When m < n, so that we do not move to a new dimension that is smaller than the current one.

— And when m is huge, because the time complexity of kernelizing would be too high.

  • Gaussian Kernel — The Gaussian kernel says that the similarity between x and lᵢ is equal to e^(−||x−lᵢ||²/2σ²), where σ is a parameter we can control. From the formula we can see that if x = lᵢ the value becomes 1, and if x and lᵢ are far apart it approaches 0; so the Gaussian kernel gives a value between 0 and 1. It is also called the RBF kernel. In the Gaussian kernel, 1/2σ² is known as gamma, and ||x−lᵢ||² is simply the squared distance. One thing to remember: if σ² is too small (gamma too high) it leads to overfitting, and if σ² is too large it leads to underfitting. Click this link (Gaussian Kernel) to know more about the Gaussian kernel.
  • Polynomial Kernel — poly(x, lᵢ) = (xᵀlᵢ + a)ᵇ, where we can choose the values of a and b. (A small sketch of all three similarity functions follows this list.)
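Here is that sketch, with each similarity function written out directly; the values of sigma, a, and b are arbitrary choices for illustration:

import numpy as np
def linear_kernel(x, l):
    #stays in the original space: just a dot product
    return np.dot(x, l)
def gaussian_kernel(x, l, sigma=1.0):
    #e^(-||x-l||^2 / 2*sigma^2): equals 1 when x == l, approaches 0 as they move apart
    return np.exp(-np.sum((x - l) ** 2) / (2 * sigma ** 2))
def polynomial_kernel(x, l, a=1.0, b=3):
    #(x^T l + a)^b with tunable a and b
    return (np.dot(x, l) + a) ** b
x = np.array([1.0, 2.0])
l = np.array([1.0, 2.0])
print(gaussian_kernel(x, l))         #1.0, since x equals the landmark
print(gaussian_kernel(x, l + 10.0))  #~0, since x is far from the landmark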

To read more about the different kernels in SVM, click this link- SVM Kernel Functions

N.B-

Most of the time we will use either the Linear Kernel or the Gaussian Kernel.

Coding SVM with Non-Linear Decision Boundary

#importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#importing datasets
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
# Assign column names to the dataset
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
# Read dataset to pandas dataframe
irisdata = pd.read_csv(url, names=colnames)
#preprocessing
X = irisdata.drop('Class', axis=1)
y = irisdata['Class']
#train test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)
#training the algorithm
from sklearn.svm import SVC
svclassifier = SVC(kernel='rbf')
svclassifier.fit(X_train, y_train)
#predictions and evaluating
y_pred = svclassifier.predict(X_test)
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

Output-

To view the code, click this link- Implementation of SVM using different Kernels
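Since σ² (exposed in scikit-learn as gamma = 1/2σ²) controls over- and underfitting, one reasonable way to pick it is cross-validation. A minimal sketch, reusing the train/test split from the code above; the candidate grids are arbitrary choices for illustration:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
#try a few gamma values; a large gamma means a small sigma^2 (risk of overfitting)
param_grid = {'gamma': [0.001, 0.01, 0.1, 1, 10], 'C': [0.1, 1, 10]}
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)
print(grid.score(X_test, y_test))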

Github Link —

Resources-

  1. Wikipedia
  2. https://data-flair.training/blogs/svm-kernel-functions/
  3. StatInfer
  4. Coding Ninjas
