Support Vector Machine: Regression

Beny Maulana Achsan
IT Paragon
Dec 10, 2019


Hola guys! Welcome to the next article. In this part, we will look at supervised machine learning in more detail.

Figure 1: SVM Regression

Well, what do you think of the two pictures above? Which one is correct? The right or the left one? Don’t worry, you will know the answer after reading this article :)

0. Introduction

Supervised machine learning is a kind of ML in which the machine learns from labeled data to create a model. Normally, the data is labeled by humans. Supervised learning is the most common and most widely studied type of learning because it is easier to train a machine with labeled data than with unlabeled data. Depending on what you want to predict, supervised learning can be used to solve two types of problems: regression or classification.

Figure 2: Regression vs Classification

Regression

If you want to predict continuous values, such as the gold price, you would use regression. This kind of target has no fixed set of possible values: the prediction can be any number, with no limits.

Classification

If you want to predict discrete values, such as classifying something into two categories, you would use classification. A case like “Will she make this purchase?” has an answer that falls into two specific categories: yes or no. For a detailed explanation, you can read “Support Vector Machine: Classification”.

1. Regression

Regression is another form of supervised learning. As we discussed above, the difference between classification and regression is that regression outputs a number rather than a class. Figure 1 shows two regression fits with the same epsilon. Well, both of them are correct: the right one tolerates some outlier points, while the left one tries to achieve zero tolerance with a perfect fit. In a real-world application, finding the perfect regression for billions of training examples takes a lot of time. As you will see in the code below, the regularization parameter and gamma must be defined, and we can combine those parameters to achieve a good non-linear regression line with high accuracy in a reasonable amount of time.
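To make this concrete, here is a minimal sketch of a non-linear SVM regression using scikit-learn’s SVR. The dataset and parameter values here are made up purely for illustration:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D data: a noisy sine wave (illustrative only)
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# RBF-kernel SVR; C, gamma, and epsilon are the tuning knobs
# discussed in the next section
svr = SVR(kernel="rbf", C=100, gamma=0.1, epsilon=0.1)
svr.fit(X, y)

y_pred = svr.predict(X)
```

Plotting y_pred against X would give a curve like the ones in Figure 1; how tightly it hugs the points depends on the parameters below.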

2. Tuning Parameters: Regularization, Gamma, and Epsilon

Regularization

The regularization parameter (the C parameter in Python’s sklearn library) tells the SVM optimization how much you want to avoid errors on each training example. In other words:

The C parameter balances the trade-off between model complexity and empirical error. To simplify: when C is large, the SVM tends to overfit, and when C is small, the SVM tends to underfit.
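You can see this trade-off in a rough sketch by sweeping C on the same synthetic data as the snippet above (values illustrative only):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# Larger C penalizes training errors more heavily (tighter fit, risk of
# overfitting); smaller C tolerates errors (smoother fit, risk of underfitting)
for C in (0.01, 1.0, 100.0):
    model = SVR(kernel="rbf", C=C, gamma=0.1, epsilon=0.1).fit(X, y)
    print(f"C={C}: training R^2 = {model.score(X, y):.3f}")
```

The training score typically climbs as C grows, which is exactly the overfitting tendency described above.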

Gamma

The gamma parameter defines how far the influence of a single training example reaches (a low value means far and a high value means close). With low gamma, points far away from the plausible separation line are considered in the calculation for the line; with high gamma, only the points close to the plausible line are considered.

When γ is large, the SVM tends to overfit. On the other hand, when γ is small, the SVM tends to underfit.
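Again as a rough sketch on the same synthetic data (values illustrative only), sweeping gamma shows the fit tightening as gamma grows:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# Small gamma: far-away points still influence the fit (smoother curve);
# large gamma: only nearby points matter (wigglier curve, risk of overfitting)
for gamma in (0.01, 0.1, 10.0):
    model = SVR(kernel="rbf", C=1.0, gamma=gamma, epsilon=0.1).fit(X, y)
    print(f"gamma={gamma}: training R^2 = {model.score(X, y):.3f}")
```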

Epsilon

In the same way as with the classification approach, there is a motivation to seek and optimize the generalization bounds given for regression. This relies on defining a loss function that ignores errors situated within a certain distance of the true value. This type of function is often called an epsilon-insensitive loss function.

The ε-insensitive loss function affects the smoothness of the SVM’s response.

The figure below shows an example of a linear and a non-linear regression function with an epsilon-insensitive band. The slack variables measure the cost of the errors on the training points; this cost is zero for all blue points that lie inside the band.

Figure 3: Linear (left) and Non-linear (right) Regression with Epsilon-Insensitive Band

In regression, epsilon sets the tolerable error of the regression model, and the right value depends on the problem. If we need a higher tolerable error, we can widen the epsilon-insensitive band, and vice versa, as the sketch below illustrates.
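As a final sketch on the same synthetic data (values illustrative only), widening the tube leaves more points inside the band, so fewer support vectors are needed:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# Points strictly inside the epsilon-tube incur zero loss, so a wider tube
# means fewer support vectors and a flatter, more tolerant fit
for epsilon in (0.01, 0.1, 0.5):
    model = SVR(kernel="rbf", C=1.0, gamma=0.1, epsilon=epsilon).fit(X, y)
    print(f"epsilon={epsilon}: support vectors = {len(model.support_)}")
```

Now you know the answer to the question at the start of this article: both fits in Figure 1 are correct, and which one you want depends on how much error you can tolerate. Thank you :)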
