Support Vector Machines (SVMs) Explained

Karan Kashyap · Published in Zero Equals False · Jun 21, 2020 · 6 min read

Header image source: SydneyF, alteryx.com

Support Vector Machines, commonly referred to as SVMs, are a type of machine learning algorithm used for supervised learning problems. They are a useful tool in any beginner's arsenal since they are relatively easy to understand and implement.

Support Vector Machines are used for classification more than they are for regression, so in this article, we will discuss the process of carrying out classification using SVMs.

How do SVMs Work?

The basic principle behind SVMs is really simple. The training data is plotted on a graph, and the number of dimensions of the graph usually corresponds to the number of features available for the data. As an example, consider a dataset of information about 100 humans. If we know the height and weight of each person, these 2 features would be plotted on a 2-dimensional graph, much like the Cartesian system of coordinates that we are all familiar with.

Fig. 1: A 2-dimensional graph (the 2 features are height and weight)
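To make this concrete, here is a minimal sketch (in Python, with made-up numbers purely for illustration) of what such a dataset looks like as points in a 2-dimensional feature space:

```python
# A minimal sketch of the 2-feature setup described above (height, weight).
# The numbers below are invented purely for illustration.
import numpy as np

# Each row is one person: [height in cm, weight in kg]
X = np.array([
    [150, 50],
    [160, 58],
    [175, 80],
    [185, 90],
])
# One label per person, e.g. 0 and 1 for two hypothetical classes
y = np.array([0, 0, 1, 1])

print(X.shape)  # (4, 2) -> 4 points plotted in a 2-dimensional feature space
```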

Now, if our dataset also happened to include the age of each human, we would have a 3-dimensional graph with the ages plotted on the third axis.

Fig. 2: A 3-dimensional graph (the 3 features are height, weight and age)

The aim of the algorithm is simple: find the right hyperplane for the data plot. The hyperplane is the plane (or line) that segregates the data points into their respective classes as accurately as possible.

The data points that lie closest to the hyperplane are known as the support vectors. A support vector is simply the vector of coordinates (feature values) of such a point on the graph, and it is these points that determine, or "support", the position of the hyperplane.

Fig. 3: The hyperplane for a simple data distribution

A visualization of a hyperplane can be seen in the image alongside (Fig. 3). The points shown have been plotted on a 2-dimensional graph (2 features) and the two different classes are red and blue. In a situation like this, it is relatively easy to find a line (hyperplane) that separates the two different classes accurately. Here, the green line serves as the hyperplane for this data distribution.

Now, if a new point that needs to be classified lies to the right of the hyperplane, it will be classified as ‘blue’ and if it lies to the left of the hyperplane, it will be classified as ‘red’.
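As a hedged illustration of this idea, the sketch below uses scikit-learn's SVC with a linear kernel on a small invented dataset. It classifies a new point by checking which side of the hyperplane it falls on, and prints the points closest to the hyperplane (the support vectors):

```python
# A sketch of the idea in Fig. 3: fit a linear SVM on 2-D points and
# classify a new point by which side of the hyperplane it falls on.
# The data here is invented for illustration.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 1], [2, 3], [6, 5], [7, 7], [8, 6]])  # 2 features
y = np.array([0, 0, 0, 1, 1, 1])                                # 'red' = 0, 'blue' = 1

clf = SVC(kernel="linear")
clf.fit(X, y)

# The hyperplane is w . x + b = 0; the sign of w . x + b decides the class of a new point.
w, b = clf.coef_[0], clf.intercept_[0]
new_point = np.array([5, 5])
side = np.sign(w @ new_point + b)
print("side of hyperplane:", side, "-> predicted class:", clf.predict([new_point])[0])

# The training points closest to the hyperplane (the support vectors):
print("support vectors:\n", clf.support_vectors_)
```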

Hopefully, this has cleared up the basics of how an SVM performs classification. The next thing we must understand is: how do we select the right hyperplane?

How Do We Select the Right Hyperplane?

If we take another look at the graph above (Fig. 3), a close analysis reveals that there are virtually infinitely many lines that can separate the data points of the two classes accurately. Still, it is important to find the hyperplane that separates the two classes best. Intuitively, we want to choose the hyperplane for which the distance to the nearest data point is as large as possible; this distance between the hyperplane and the closest data point is called the margin. A larger margin leaves more room for error, so new points that fall near the boundary are less likely to be misclassified. Thus, the task of a Support Vector Machine performing classification can be summed up as: find the hyperplane that segregates the different classes as accurately as possible while maximizing the margin.

Fig. 4: The margins for several different hyperplanes that segregate the classes accurately

Using the same data points from the previous example, we can look at a few different lines that segregate them accurately. The margin for each of these hyperplanes is also depicted in the diagram (Fig. 4).

We can clearly see that the margin for the green line is the greatest, which is why the green line is the hyperplane we should use for this distribution of points.
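For those curious how the margin can be read off in code, here is a small sketch (again using scikit-learn and toy data) based on the standard SVM scaling, in which the nearest points satisfy |w · x + b| = 1, so the distance from the hyperplane to the nearest point is 1/||w||:

```python
# A small sketch of how the margin can be read off a fitted linear SVM.
# With the usual SVM scaling (|w . x + b| = 1 at the closest points),
# the distance from the hyperplane to the nearest point is 1 / ||w||.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 1], [2, 3], [6, 5], [7, 7], [8, 6]])  # same toy data as above
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # a very large C approximates a hard margin on separable data
clf.fit(X, y)

w = clf.coef_[0]
margin = 1.0 / np.linalg.norm(w)
print("margin (hyperplane to nearest point):", margin)
print("total gap between the two classes:  ", 2 * margin)
```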

What Happens When the Points Cannot Be Separated by a Straight Line?

Very often, no linear relation (no straight line) can accurately segregate the data points into their respective classes. In such scenarios, SVMs make use of a technique called kernelling (the kernel trick), which involves mapping the problem into a higher number of dimensions. This is a difficult topic to grasp merely by reading, so we will go over an example that should make it clear.

Imagine a set of points with a distribution as shown below:

Fig. 5: A distribution of points that cannot be segregated using a linear relationship

It is fairly obvious that no straight line can separate the red and blue points accurately. A circle could separate them easily, but our restriction is that we can only draw straight lines. This is where kernelling helps: we increase the number of dimensions. Instead of using just the x and y dimensions from the graph above, we add a new dimension 'p' such that p = x² + y².

Fig. 6: A plot of p vs. x, showing the effect of kernelling

The result of applying this transformation is shown in the graph above (Fig. 6). We can clearly see that with this new distribution, the two classes can easily be separated by a straight line. You will also notice that if this same graph were reduced back to its original dimensions (a plot of x vs. y), the green line would appear as a green circle that exactly separates the points (Fig. 7).

Fig. 7: A visualization of what Fig. 6 would look like if reduced to its original dimensions

If you take a set of points lying on a circle of radius r and apply the transformation above (p = x² + y²), every one of those points satisfies x² + y² = r², so they all map to the same value p = r². In other words, the circle translates into a straight (horizontal) line in the x vs. p plot.

Using the same principle, even for more complicated data distributions, dimensionality changes can enable the redistribution of data in a manner that makes classification a very simple task.
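The sketch below reproduces the idea from Figs. 5-7 with scikit-learn: it generates a circular distribution of points (an assumed stand-in for Fig. 5), adds the new dimension p = x² + y² by hand, and fits a linear SVM that then separates the classes:

```python
# A sketch of the kernelling idea from Figs. 5-7: add a new feature
# p = x^2 + y^2 so that concentric classes become linearly separable.
# make_circles generates synthetic data resembling Fig. 5.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# New dimension p = x^2 + y^2 appended to the original (x, y) features
p = (X ** 2).sum(axis=1, keepdims=True)
X3 = np.hstack([X, p])

# A flat hyperplane (straight cut) now separates the classes in 3 dimensions
clf = SVC(kernel="linear").fit(X3, y)
print("training accuracy:", clf.score(X3, y))

# In practice, SVMs do this implicitly via kernels, e.g. SVC(kernel="rbf"),
# without ever constructing the extra dimension by hand.
```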

Concluding Remarks

Some of the main benefits of SVMs are that they work very well on small datasets and often achieve a high degree of accuracy. In addition, their soft-margin formulation lets them tolerate (effectively ignore) a few outliers, so they retain their accuracy in situations where many other models would be affected badly by those outliers. One drawback of these algorithms is that they can take a very long time to train, so they would not be my top choice if I were working with very large datasets.
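In scikit-learn, this outlier tolerance is exposed through the C parameter. The sketch below (with invented data) shows that a small C lets the SVM keep a wide margin instead of bending to fit a single outlier:

```python
# A sketch of the soft-margin behaviour mentioned above: the C parameter
# controls how strongly the SVM tries to classify every training point
# correctly. A smaller C tolerates a few outliers in exchange for a wider margin.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1, (50, 2)), rng.normal([5, 5], 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
X = np.vstack([X, [[5, 5]]])  # one deliberate outlier placed inside the wrong class
y = np.append(y, 0)

strict = SVC(kernel="linear", C=1000).fit(X, y)   # bends to accommodate the outlier
relaxed = SVC(kernel="linear", C=0.1).fit(X, y)   # effectively ignores it
print("strict margin: ", 1 / np.linalg.norm(strict.coef_[0]))
print("relaxed margin:", 1 / np.linalg.norm(relaxed.coef_[0]))
```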

In conclusion, we can see that SVMs are a very simple model to understand from the perspective of classification. While we have not discussed the full math behind how this is achieved, and the code snippets above are only illustrative sketches, I hope that this article helped you learn the basics of the logic behind how this powerful supervised learning algorithm works.
