Support Vector Machines — What Are They?

Bhanu Kiran
6 min read · Jan 8, 2023


Are they really machines? What are these SVMs people keep talking about? SVMs are quite popular models, and they are heavily based on mathematical methods. Usually, the moment you put math into anything, it becomes more complex and automatically more daunting, but over here, things are as simple as they can be!

If you are somewhat familiar with classification methods, you should be aware of classification by data separation. These methods deal with either a yes or no, binary formats such as 1 or 0, or multiple labels such as cat, dog, flower, etc. Let’s take an example: as you can see below in Fig 1, we have a bunch of data scattered on the graph.

Fig 1. data scattered on a graph

At a glance, if we were to separate this data, we would draw a line or a “boundary” between the two groups of points. But a boundary drawn by person A can be quite different from one drawn by person B, person C, and so on. And all of these boundaries are correct; there is no wrong answer. Let’s plot some of the possible boundaries for Fig 1.

Hyperplane

Fig 2. boundaries

From Fig 2 we can see that there are several boundaries that separate the data points. Each such line of separation is called a hyperplane, and somewhere among them there is one perfect line of separation. The catch here is that there can be more than one hyperplane, as there can be infinitely many boundaries.

If we find a hyperplane that separates the data perfectly, we can say that the data is linearly separable.
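To make “which side of the hyperplane” concrete, here is a tiny Python sketch (the weights and points are made up purely for illustration): in 2D a hyperplane is just a line w·x + b = 0, and the sign of w·x + b tells you which side of it a point falls on.

```python
import numpy as np

# A hyperplane in 2D is just a line: w . x + b = 0.
# These weights and points are made up purely for illustration.
w = np.array([1.0, -1.0])   # normal vector to the hyperplane
b = 0.0                     # offset

points = np.array([[2.0, 0.5],   # should land on the positive side
                   [0.5, 2.0]])  # should land on the negative side

# The sign of w . x + b tells us which side of the hyperplane each point is on.
sides = np.sign(points @ w + b)
print(sides)  # [ 1. -1.] -> the two points fall on opposite sides
```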

But how do you choose the location of the hyperplane? To achieve this we use something called a Maximal Margin Classifier.

Maximal Margin Classifier

The idea here is that if we look at points far away from the boundary, we can be sure of their class: it’s either this or that, yes or no, 1 or 0. But as we move closer to the boundary, it becomes difficult to say. In reality, the lines and data points aren’t as tidy as displayed in Fig 2; you can have a bunch of data points and draw an optimal boundary, yet still have a 0 falling just next to the boundary between the 1s and 0s, or a 0 falling in with the 1s.

In these cases, we look at the closest points, and we find that there is some space between them and the boundary, a space that creates a margin.

Fig 3. margins

We can say from the margins in Fig 3 that the larger the margin, the more separated the points are. The green and orange lines in Fig 3 are called maximal margins.

Fig 4. maximal margins

Now, intuitively, we can find the optimal hyperplane that makes our data linearly separable, and this is done by finding the maximal margin with the greatest separation, which gives us the greatest predictive power.
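If you want to see this in action, here is a minimal sketch using scikit-learn (the toy blobs and the huge C value are my own choices for illustration, not something from the figures): a linear SVC with a very large C behaves like a hard, maximal margin classifier, and the margin width comes out as 2 / ||w||.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy, well-separated data: two blobs (made up purely for illustration).
X, y = make_blobs(n_samples=40, centers=2, cluster_std=0.6, random_state=0)

# A very large C approximates a hard (maximal) margin: almost no errors allowed.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print(f"hyperplane: {w[0]:.2f}*x1 + {w[1]:.2f}*x2 + {b:.2f} = 0")

# The margin width of a linear SVM is 2 / ||w||:
# the larger this number, the more separated the two classes are.
print("margin width:", 2 / np.linalg.norm(w))
```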

Now that we have talked about all this, the real question is, what are support vectors?

Support Vectors

In Fig 4 you can observe that some of the points lie on the green and orange lines; these points are called support vectors.

To simplify this, let’s break down the words: support means something that can assist, and a vector in machine learning is a tuple of one or more scalars, i.e. a data point. Now it makes sense! Support vectors are the data points which lie on the maximal margin.

Fig 5. support vectors

The hyperplane can be defined by the points closest to the boundary, and we call them support vectors. The maximal margin hyperplane depends only on the support vectors, and these support vectors define the classification model: the model is simply the set of data points that define the location of the boundary between the two classes.
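Continuing the same hypothetical scikit-learn sketch from above, the fitted model literally stores these points, and nothing else matters for the boundary:

```python
# Continuing the linear SVC sketch from above: the fitted model keeps only
# the points that pin down the margin, i.e. the support vectors.
print(clf.support_vectors_)   # coordinates of the support vectors
print(clf.n_support_)         # how many support vectors each class contributes

# Every other training point could be removed without moving the hyperplane.
```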

All of this makes sense, and now we can start to define how SVMs make predictions. If you have paid attention so far, you might realize that most of the work is done by the maximal margin, and that is exactly the theory behind support vectors: it is the maximal margin classifier.

Maximal margin classifier — prediction

Say, for instance, we have two classes, green and orange.

Fig 6. maximal margin classifier

For these two classes we define a hyperplane, which is done via the maximal margin. Once we have our hyperplane, the prediction depends on where the new instance is located relative to the maximal margin hyperplane.

Fig 7. prediction using hyperplane
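In scikit-learn terms, sticking with the same hypothetical model from before, prediction is just asking which side of the hyperplane the new instance lands on (the new points below are made up for illustration):

```python
import numpy as np

# Two hypothetical new instances (coordinates made up for illustration).
new_points = np.array([[0.0, 5.0],
                       [2.5, 1.0]])

# predict() gives the class label; decision_function() gives the signed score
# w . x + b, whose sign says which side of the hyperplane the point is on.
print(clf.predict(new_points))
print(clf.decision_function(new_points))
```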

But in reality, life is more complex than the figures shown above, and very often the data is scattered all over the place.

Soft Margin

Fig 8. soft margin

In this case in Fig 8. the data towards the left also has some orange points, and at a glance, we know that this is not linearly separable.

The key is that we try to find a line separating most of the data points and accept that some of the points are misclassified. This is called a soft margin, where we let some of the data points fall on the wrong side of the hyperplane. These obviously are errors and misclassifications, so how do we handle them?

The key to handling such situations is to use a cost function, which puts a price on the misclassified instances we accept.

Why? Because soft margin classifiers tend to be more robust than maximal margin classifiers. If I were to take a new instance which is orange and put it on the green side of the hyperplane in Fig 7, the maximal margin classifier would poop itself: the maximal margin has to change, which means my hyperplane will change. In other words, it is very sensitive to new data, and this leads to a risk of overfitting. On the other hand, soft margin classifiers are generally more robust, use cost functions to predict, and make better classifiers.
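In scikit-learn this cost shows up as the C parameter, so a rough sketch of soft versus hard margins on the earlier toy data might look like this (the exact values of C are arbitrary choices of mine):

```python
from sklearn.svm import SVC

# The C parameter is the cost of misclassification:
# small C -> soft margin (some errors tolerated, wider margin, more robust),
# large C -> hard margin (few errors, narrower margin, more sensitive).
soft = SVC(kernel="linear", C=0.1).fit(X, y)
hard = SVC(kernel="linear", C=1e6).fit(X, y)

# A softer margin typically leans on more support vectors, so no single
# point can move the boundary very much.
print("support vectors (soft):", len(soft.support_vectors_))
print("support vectors (hard):", len(hard.support_vectors_))
```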

The main idea is to make the data linearly separable, but what if it looks like there is no way in the world we can make the data linearly separable? For this, we use kernel functions.

Kernel Functions

Fig 9. kernel functions

As you can see in Fig 9, something that seems like we cannot do anything about can be separated with kernel functions. And support vector machines use kernel functions to transform the data. But what are they actually doing?

The kernel function enlarges our feature space, as if projecting the data into a higher-dimensional space where it becomes linearly separable, and you have several kernel functions to choose from:

  1. linear function
  2. polynomial function
  3. radial basis function
  4. sigmoid function

You can do a quick Google search to understand what each of these functions does in detail.
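If you’d rather see it than search for it, here is a small sketch (again assuming scikit-learn, with a made-up ring-shaped dataset) showing a linear kernel failing where an RBF kernel succeeds:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# One class forms a ring around the other: no straight line can separate this.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The RBF kernel implicitly maps the data into a higher-dimensional space
# where it becomes linearly separable; the linear kernel has no such luck.
print("linear kernel accuracy:", SVC(kernel="linear").fit(X, y).score(X, y))
print("rbf kernel accuracy:   ", SVC(kernel="rbf").fit(X, y).score(X, y))
```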

And as simple as it sounds, that's a support vector machine for you, a bunch of lines, kernel functions, and margins.

