Support Vector Machine: Classification

Beny Maulana Achsan
Published in IT Paragon
Nov 21, 2019 · 4 min read

Welcome to the next article. In this part, we will discuss further details of the Support Vector Machine.

Figure 1 : SVM Classification

What do you think about the two pictures above? Which one is correct: the left or the right one? Well, don’t worry; you will know the answer after reading this article. Let’s start :)

0. Introduction

Suppose you are given two labeled classes on a graph, as shown below (Figure 2). Can you decide on a separating line for them?

Figure 2 : Two Label Classes

You might have come up with something similar to the following image (Figure 3). It clearly separates the two classes. That’s what SVM does: it finds a line (or, in multidimensional space, a hyperplane) that separates the classes. Shortly, we shall discuss why I wrote multidimensional space.

Figure 3: Separation Line between Two Classes

1. Let’s make it a bit more complex . . .

So far so good. Now, what if we had data as shown below (Figure 4)? No straight line can separate the two classes in this x-y plane. So, what should we do? We can apply a transformation that adds one more dimension, which we call the z-axis, defining the value of a point on it as z = x² + y². Since z is the squared distance of a point from the origin, points far from the origin get a large z and points near it get a small z. If we then plot against the z-axis, a clear separation can be seen and a straight separating line can be drawn.

Figure 4: Can You Draw a Separating Line in This Plane ?
Figure 5: Transformation (Plot of z-y Axis)
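The lift described above can be sketched in a few lines. This is a minimal illustration with synthetic data (an inner cluster and an outer ring are assumptions for the example, not the article’s actual dataset): after mapping each point to z = x² + y², a single threshold on z separates the classes.

```python
# Sketch of the z = x**2 + y**2 lift: an inner cluster and an outer ring
# are not separable by a straight line in the x-y plane, but become
# separable by a single threshold on z. Data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: class 0 inside radius 1, class 1 on a ring of radius 2.5-3.5.
r_in, t_in = rng.uniform(0.0, 1.0, 50), rng.uniform(0, 2 * np.pi, 50)
inner = np.column_stack([r_in * np.cos(t_in), r_in * np.sin(t_in)])
r_out, t_out = rng.uniform(2.5, 3.5, 50), rng.uniform(0, 2 * np.pi, 50)
outer = np.column_stack([r_out * np.cos(t_out), r_out * np.sin(t_out)])

# Lift every point onto the z-axis: z = x^2 + y^2 (squared distance from origin).
z_inner = (inner ** 2).sum(axis=1)
z_outer = (outer ** 2).sum(axis=1)

# A single threshold on z now separates the classes: z = 4 works here,
# which maps back to the circle x^2 + y^2 = 4 in the original plane.
print(z_inner.max() < 4 < z_outer.min())  # True
```

The threshold z = 4 is exactly the circular boundary the next paragraph describes when transformed back to the x-y plane.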

When we transform this line back to the original plane, it becomes a circular boundary, as in the following image (Figure 6). This kind of transformation is called a kernel.

Figure 6: Transforming Back to x-y Plane. A Line Transforms to Circle.

2. Let’s make it a little more complex . . .

What if the data plots overlap? Or what if some of the red points lie inside the blue cluster?

Figure 7: Can You Draw a Separating Line in This Plane ?

Which line (among 1 or 2) should we draw?

Figure 8: Which Line (Among 1 or 2) Should We Draw ?

Well, both of them are correct. The first tolerates some outlier points, while the second tries to achieve zero tolerance with a perfect partition. In a real-world application, finding a perfect separation for billions of training samples would take a lot of time. As you will see in the code, the regularization parameter and gamma should be defined; by tuning them together, we can achieve a good non-linear classification line with high accuracy in a reasonable amount of time.
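As a rough sketch of what that tuning looks like in sklearn, the snippet below fits a non-linear SVM with explicit C and gamma settings. The dataset (`make_circles`) and the particular parameter values are assumptions for illustration, not recommendations:

```python
# Hedged sketch: fitting a non-linear SVM in sklearn with the regularization
# parameter C and the RBF kernel's gamma set explicitly. The synthetic
# make_circles dataset stands in for real data.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)

# kernel="rbf" handles the circular boundary; C and gamma together trade off
# tolerance for outliers against a tighter, more complex boundary.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")
```

In practice, C and gamma are usually chosen by cross-validated grid search rather than fixed by hand.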

3. Tuning parameters : Kernel, Regularization, Gamma, and Margin

Kernel

In machine learning, a “kernel” usually refers to the kernel trick, a method of using a linear classifier to solve a non-linear problem. It entails transforming linearly inseparable data (Figure 4) into linearly separable data (Figure 5). The kernel function is applied to each data instance to map the original non-linear observations into a higher-dimensional space in which they become separable.

Regularization

The regularization parameter (the C parameter in Python’s sklearn library) tells the SVM optimization how much you want to avoid misclassifying each training sample.

For small values of C, the optimizer will choose a larger-margin hyperplane, even if that hyperplane misclassifies more points. On the other hand, a large value of C will cause the optimizer to choose a smaller-margin hyperplane if that hyperplane does a better job of classifying all the training points correctly.

Figure 9: Small Regularization Value (Left), Large Regularization Value (Right)

As you can see in the image above (Figure 9), the left plot has some misclassifications because its regularization value is smaller than that of the right plot.
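The margin effect of C can be measured directly: for a linear kernel, the margin width is 2 / ‖w‖. The snippet below (with synthetic overlapping blobs as an illustrative assumption) shows the small-C fit producing the wider margin:

```python
# Sketch comparing small vs large C with sklearn's linear SVC:
# small C favors a wider margin even at the cost of misclassifications;
# large C shrinks the margin chasing correct classification of every point.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Two overlapping Gaussian blobs (synthetic, for illustration only).
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

margins = {}
for C in (0.01, 1000):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # With a linear kernel, margin width = 2 / ||w||.
    margins[C] = 2 / np.linalg.norm(clf.coef_)
    print(f"C={C}: margin width = {margins[C]:.2f}")
```

Running this, the C=0.01 model reports a noticeably wider margin than the C=1000 model, matching Figure 9.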

Gamma

The gamma parameter defines how far the influence of a single training example reaches (low values mean far, high values mean close). In other words, with low gamma, points far away from a plausible separation line are considered when computing it, whereas with high gamma only the points close to a plausible line are considered.
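One observable consequence: a very high gamma makes each point’s influence so local that the boundary bends around individual training samples, driving training accuracy up (and risking overfitting). A rough sketch on a synthetic two-moons dataset (an illustrative assumption):

```python
# Sketch: how gamma changes an RBF SVM's fit (sklearn's SVC).
# High gamma -> very local influence, the boundary hugs individual points;
# low gamma -> broad influence, a smoother boundary. Data is synthetic.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

scores = {}
for gamma in (0.1, 100):
    clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)
    scores[gamma] = clf.score(X, y)
    print(f"gamma={gamma}: train accuracy = {scores[gamma]:.2f}, "
          f"support vectors = {clf.n_support_.sum()}")
```

Training accuracy alone is misleading here: the high-gamma model fits the training points more tightly but would generalize worse, which is why gamma is tuned on held-out data.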

Margin

The last, but very important, characteristic of the SVM classifier is the margin.

A margin is the distance between the separation line and the closest points of either class.

A good margin is one where this distance is large on both sides. A good margin keeps the points in their respective classes without crossing to the other side. Now you know the answer to the question above. Thank you :)
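The margin is determined entirely by the support vectors, the training points closest to the separating line. A small sketch (the six toy points are an illustrative assumption) showing that, for a linear hard-margin fit, every support vector sits at the same distance 1 / ‖w‖ from the boundary:

```python
# Sketch: the margin is set by the support vectors. With sklearn's linear
# SVC and a very large C (approximating a hard margin), the margin
# half-width is 1 / ||w|| and the support vectors lie exactly on it.
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable toy set (an assumption for illustration).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]
half_width = 1 / np.linalg.norm(w)

# Each support vector sits at distance ~= half_width from the boundary.
for sv in clf.support_vectors_:
    dist = abs(w @ sv + b) / np.linalg.norm(w)
    print(f"support vector {sv}: distance {dist:.2f} (margin {half_width:.2f})")
```

Moving any non-support point does not change the line at all; moving a support vector does, which is where the method gets its name.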
