
An Anatomy of Support Vector Machines

Salman Ibne Eunus · Published in CodeX · May 21, 2022

Introduction

Support Vector Machine (SVM) is a supervised machine learning algorithm that is both powerful and versatile. It can perform linear or nonlinear classification and regression, and it can also handle tasks such as outlier detection. It is one of the most widely known models in machine learning, and anyone enthusiastic about the field should learn it. It is particularly well suited to classifying complex but small or medium-sized datasets.

Explanation

[Figure: linearly separable data points split by a straight-line decision boundary. Image retrieved from iconspng.com]

Let’s discuss linear SVM classification in more detail; a picture makes the idea easier to grasp. The figure above shows some data points plotted on a graph, separated by a red line known as the ‘decision boundary’. This straight line cleanly splits the points into two groups, or classes, which means the data are linearly separable: a single straight line can divide them into two distinct classes, so a linear classifier fits them well. Moreover, this line not only separates the two classes but also stays as far away from the closest training instances as possible.

You can think of an SVM classifier as fitting the widest possible street between the classes, with the red line running down its middle. This is known as large margin classification. Adding more training instances “off the street” does not affect the decision boundary at all: it is fully determined, or “supported”, by the instances located on the edge of the street. These instances are called the support vectors (circled in red in the figure).
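As a quick, hedged illustration of this “support” idea (my own sketch, not part of the original article), the snippet below fits scikit-learn’s SVC with a linear kernel on a tiny toy dataset; unlike LinearSVC, SVC exposes the support vectors directly through its support_vectors_ attribute.

import numpy as np
from sklearn.svm import SVC

# Tiny, linearly separable toy data: two clusters in 2-D (assumed example)
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class 0
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.5]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

svc = SVC(kernel="linear", C=1)
svc.fit(X, y)

# Only the instances on the edge of the street determine the boundary
print(svc.support_vectors_)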

If we strictly require that every training instance stays off the street and on the correct side, we get hard margin classification. Hard margin classification raises issues: it only works when the data are linearly separable, and it is very sensitive to outliers, so it may not generalize well.

Therefore, it is more advisable to use a less rigid model. The goal is to find a good balance: keep the street as wide as possible while limiting margin violations, that is, instances that end up in the middle of the street or on the wrong side of it. A classifier that allows a limited number of margin violations in exchange for this extra flexibility is performing soft margin classification.

In scikit-learn’s SVM classes, we control this balance with the C hyperparameter. A smaller C gives a wider street but more margin violations; a higher C, on the other hand, produces fewer margin violations but ends up with a narrower street. At times, allowing some margin violations can even lead to fewer prediction errors, as long as most of them still fall on the correct side of the decision boundary.
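To make the effect of C concrete, here is a minimal sketch (my own example, not from the article): it fits LinearSVC with a small and a large C on the same tiny dataset and compares the street width, which is proportional to 2/||w|| in the scaled feature space.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Assumed toy data: two small clusters in 2-D
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.5],
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

for C in (0.01, 100):
    clf = make_pipeline(StandardScaler(), LinearSVC(C=C, loss="hinge"))
    clf.fit(X, y)
    w = clf.named_steps["linearsvc"].coef_[0]
    # Expect a wider street (larger 2/||w||) for the smaller C
    print(f"C={C}: street width ~ {2 / np.linalg.norm(w):.2f}")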

Implementation in Python

The following scikit-learn code loads the iris dataset, scales the features, and trains a linear SVM model using the LinearSVC class with C=1 and the hinge loss function to detect Iris-Virginica flowers.

import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
x = iris["data"][:, (2, 3)]                   # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # Iris-Virginica

svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge")),
])
svm_clf.fit(x, y)
svm_clf.predict([[5.5, 1.7]])  # output: array([1.])

Unlike logistic regression classifiers, SVM classifiers do not output probabilities for each class. Alternatively, we could use the SVC class with SVC(kernel="linear", C=1), but it is much slower, especially with large training sets, so it is not advised. Another option is the SGDClassifier class, with SGDClassifier(loss="hinge", alpha=1/(m*C)). This applies regular Stochastic Gradient Descent to train a linear SVM classifier. It does not converge as fast as the LinearSVC class, but it is useful for huge datasets that do not fit in memory or for online classification tasks.
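For reference, here is a rough sketch of those two alternatives (my own example, not from the original article), with the features scaled manually since neither model is wrapped in a pipeline here:

import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.linear_model import SGDClassifier

iris = datasets.load_iris()
x = iris["data"][:, (2, 3)]                   # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # Iris-Virginica
x_scaled = StandardScaler().fit_transform(x)

C = 1
m = len(x)  # number of training instances

# Alternative 1: the (slower) SVC class with a linear kernel
svc_clf = SVC(kernel="linear", C=C)
svc_clf.fit(x_scaled, y)

# Alternative 2: a linear SVM trained with regular Stochastic Gradient Descent
sgd_clf = SGDClassifier(loss="hinge", alpha=1 / (m * C))
sgd_clf.fit(x_scaled, y)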

The LinearSVC class regularizes the bias term, so we should center the training data first by subtracting its mean. This is done automatically if you scale the data with the StandardScaler. In addition, make sure you set the loss hyperparameter to "hinge", as it is not the default value.

Also, for better performance, you should set the dual hyperparameter to False, unless there are more features than training instances; a small sketch of this appears below. It is also possible to carry out nonlinear and polynomial SVM classification, which I hope to cover in another blog post.
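Here is that sketch (my own, not from the article). Note that LinearSVC only accepts dual=False with its default squared hinge loss; loss="hinge" requires the dual formulation.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Solving the primal problem; x and y are reused from the earlier iris snippet
svm_clf_primal = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, dual=False)),  # default loss="squared_hinge"
])
svm_clf_primal.fit(x, y)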

Closing Notes

That’s all for today. Thank you so much for reading my articles and supporting me. :)

To know more about me and to get more content like this follow me on my LinkedIn, Twitter, or Facebook page —

LinkedIn — https://www.linkedin.com/in/salman-ibne-eunus-09255a144/

Twitter — https://twitter.com/ibne_eunus

Facebook — https://www.facebook.com/salmaneunus27
