Python Data Science

Support Vector Machines

Data classification with SVM

Jamel Dargan
5 min read · Aug 18, 2020
Photo by Brian Wangenheim on Unsplash

The support vector machines (SVM) algorithm has applications for regression problems as well as for classification, both linear and non-linear. In addition, both continuous and categorical values may be used in SVM. This time around, we will take a look at SVM for classification.

SVM can be used to predict whether a Beatles song was written by John or Paul, whether or not a news article is fake, or from what region a chili recipe originates. This article will explore:

  • What is SVM?
  • How do SVMs work?
  • How to implement SVM in Python.

Support Vector Machines

Figure 1. Red and green data points separated diagonally by a bold line, with margins on either side. (All images by author, unless otherwise indicated.)

SVM supervised machine learning (ML) models separate classes by constructing a hyperplane in multi-dimensional space. Key to the idea of SVM is finding the hyperplane (shown as the bold line in the figure 1 illustration) that maximizes the margin between the classes, that is, the distance between each class and the hyperplane that divides them.

Support vectors

The points closest to the hyperplane are the support vectors. These points determine where the decision boundary lies, since the margin is calculated from them; they are the points most critical to the iterative calculations required to construct the classifier.

Hyperplane

A decision plane which separates the classes. This plane will optimally be equidistant from the support vectors on either side.

Margin

The gap between the two parallel lines that pass through the closest vectors (data points) of each class, calculated as the perpendicular distance between those lines.
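To make these definitions concrete, here is a minimal sketch, using a made-up two-dimensional toy set (not the dataset used later in this article), that fits a linear SVM and recovers the support vectors and the margin width, which for a linear model equals 2/||w||:

import numpy as np
from sklearn import svm

# toy 2-D data: two linearly separable classes
X = np.array([[1, 2], [2, 3], [5, 6], [6, 5]])
y = np.array([0, 0, 1, 1])

# a large C approximates a hard margin on separable data
toy_clf = svm.SVC(kernel='linear', C=1000)
toy_clf.fit(X, y)

# the support vectors are the training points nearest the hyperplane
print("Support vectors:\n", toy_clf.support_vectors_)

# for a linear SVM, the margin width is 2 / ||w||
w = toy_clf.coef_[0]
print("Margin width:", 2 / np.linalg.norm(w))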

Workings of SVM

The main objective of SVM is to divide the given data into classes in the best possible way. The distance between the nearest data points of each class is known as the margin. The goal of SVM is to select a hyperplane with the maximum possible margin between support vectors in the given data. It does this in the following ways:

  • Creates multiple candidate decision boundaries that divide the data into classes.
Figure 2. Two classes of data points separated by three different-angled lines, ‘a’, ‘b’, and ‘c’.

Now, there are multiple decision boundaries (figure 2), but which one should we select? Intuitively, if we select a hyperplane that is close to the data points of one class, it might not generalize well. So the goal is to choose the hyperplane that is as far as possible from the data points of each class.

Figure 3. Two classes of data separated by three similarly angled lines, ‘a’, ‘b’, and ‘c’, with ‘b’ having the greatest margin.
  • Maximizing the distance between the classes and the hyperplane results in the optimal separating hyperplane (indicated by line ‘b’ in figure 3). The goal of SVM is to find this optimal hyperplane, because it not only classifies the existing dataset but also helps predict the class of unseen data.

Note: SVM kernels

Photo by Martin Lostak on Unsplash

We suggested earlier that SVM may also be used for classes that are not linearly separable. To make the algorithm work on non-linear data, we use something called kernels.

A kernel transforms an input data space into the required space in which data can be linearly separated. There are different types of kernels, including linear, polynomial, and radial basis function (rbf). SVM uses a technique called the kernel trick, by which the kernel takes a low-dimensional input space and transforms it into a higher dimensional space.
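To illustrate the idea with a minimal sketch (on a made-up one-dimensional toy set, not part of this article's main example): an explicit quadratic feature map can make non-separable data separable, which is what kernels such as poly and rbf accomplish implicitly.

import numpy as np
from sklearn import svm

# 1-D data that is not linearly separable: class 1 sits between class 0
X = np.array([[-3.0], [-2.0], [2.0], [3.0], [-0.5], [0.0], [0.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1])

# mapping x -> (x, x^2) makes the classes separable by a line in 2-D
X_mapped = np.hstack([X, X ** 2])
clf = svm.SVC(kernel='linear')
clf.fit(X_mapped, y)
print(clf.predict(X_mapped))  # [0 0 0 0 1 1 1]

# kernels such as 'poly' and 'rbf' perform an equivalent mapping
# implicitly, without materializing the higher-dimensional features
clf_rbf = svm.SVC(kernel='rbf')
clf_rbf.fit(X, y)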

SVM Implementation Using Python

Let’s take a look at how the sklearn library supports SVM modeling. We start by importing the necessary modules.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score

The Dataset

We will be working with the breast cancer dataset provided by sklearn.

# loading the dataset
cancer = datasets.load_breast_cancer()

Exploring the dataset

# printing the names of the 30 features
print("Features: ", cancer.feature_names)
# printing the label type of cancer('malignant' 'benign'), i.e., 2 classes
print("\nLabels: ", cancer.target_names)

The following description is output:

Features:  ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
'mean smoothness' 'mean compactness' 'mean concavity'
'mean concave points' 'mean symmetry' 'mean fractal dimension'
'radius error' 'texture error' 'perimeter error' 'area error'
'smoothness error' 'compactness error' 'concavity error'
'concave points error' 'symmetry error' 'fractal dimension error'
'worst radius' 'worst texture' 'worst perimeter' 'worst area'
'worst smoothness' 'worst compactness' 'worst concavity'
'worst concave points' 'worst symmetry' 'worst fractal dimension']
Labels: ['malignant' 'benign']
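We can also confirm the size of the data matrix: 569 samples, each with 30 features.

# printing the shape of the data matrix
print(cancer.data.shape)

(569, 30)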

This is a binary classification dataset. Let’s view the target data.

# printing the first 50 labels (0:malignant, 1:benign)
print(cancer.target[:50])

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 1]

We will split the data into training and testing sets, reserving 25% for testing. Note that because no random_state is fixed, the exact split (and the accuracy reported below) will vary slightly between runs.

# splitting the dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.25)

Creating a model

Let’s start building our SVM model. Sklearn makes it possible to build a classifier and train a model in only two lines of code.

# creating an SVM Classifier
clf = svm.SVC(kernel='linear') # Linear Kernel
# training the model using the training sets
clf.fit(X_train, y_train)

We can predict on test data:

# predicting the response for test dataset
y_pred = clf.predict(X_test)

Performance

We measure the performance of our model by comparing its predictions with the test data’s original labels.

# reporting model accuracy
print("Accuracy:", round(accuracy_score(y_test, y_pred)*100),'%')
Accuracy: 94.0 %

Our model achieved an accuracy of about 94%.
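Accuracy alone can hide class-specific errors, which matter in a medical setting. As a quick follow-up (a sketch reusing y_test and y_pred from above), sklearn can also report the confusion matrix and per-class precision and recall:

from sklearn.metrics import classification_report, confusion_matrix

# confusion matrix and per-class precision/recall for the same predictions
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=list(cancer.target_names)))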

Conclusion

In this article, we discussed the SVM algorithm for ML classification. We defined its key components, explained and illustrated how it works for linear models, and coded a linear example in Python.

SVM classification has a number of practical applications, from identifying cancerous cells, to predicting wildfires for particular geographies, to deep learning object segmentation. We focused on simple SVM, to develop a linear model for binary classification.

For data that is not linearly separable, kernel SVM may be used to map data onto higher dimensional space. With the kernel trick, a variety of non-linear functions may be tested to determine the optimal hyperplane.

Scoring modules available in sklearn may be used to compare models developed with the various kernel functions. The final decision about which function is most appropriate will be yours and will depend on your particular use case.
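As a sketch of what such a comparison might look like (reusing the cancer dataset loaded above; note that the poly and rbf kernels are sensitive to feature scale, so scores on unscaled features should be read cautiously):

from sklearn.model_selection import cross_val_score
from sklearn import svm

# comparing kernel functions with 5-fold cross-validation
for kernel in ('linear', 'poly', 'rbf'):
    clf = svm.SVC(kernel=kernel)
    scores = cross_val_score(clf, cancer.data, cancer.target, cv=5)
    print(kernel, round(scores.mean(), 3))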
