Part II — Support Vector Machines: Regression

This post is the second part of a series on Support Vector Machines (SVM), which gives you a general understanding of SVMs and how they work (the first part of the series can be found here). This part provides practical examples of how to use SVMs to tackle regression problems. A regression problem is the task of approximating a mapping function from input variables to a continuous output variable. Using SVMs to solve regression problems is called Support Vector Regression (SVR).

Now, let's try to solve a regression problem using this approach. For this example, we will use the Boston house price data set, which has 506 records, 13 features and a single continuous output (more information on this data set can be found here).

1. Imports

First, we need to import a few libraries.

import math
import pandas
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.utils import shuffle

Let's see what we have imported:

  • math — provides mathematical functions such as sqrt, which we use when computing the RMSE
  • pandas — allows us to load and manipulate data structures easily
  • sklearn — a machine learning library for Python

2. Load data

Now let’s load our data set and specify the features and the dependent variable.

dataset = pandas.read_csv('Dataset.csv')
X = dataset.iloc[:, 0:13]   # the 13 feature columns
y = dataset.iloc[:, 13]     # the dependent variable (house price)
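
As a quick sanity check (optional, and assuming the CSV stores the 13 feature columns first followed by the target column, as the indexing above implies), the shapes should match the 506 records mentioned earlier:

# X should be (506, 13) and y should be (506,) if the file matches the
# description of the data set given above.
print(X.shape, y.shape)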

3. Pre-process data

To pre-process the data, we use MinMax scaling, which normalizes each feature to the range [0, 1].

scaler = MinMaxScaler(feature_range=(0, 1))
X = scaler.fit_transform(X)
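
As a quick illustration of what this scaling does (a minimal sketch with made-up numbers, not values from our data set), each value x in a column is mapped to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1:

import numpy
from sklearn.preprocessing import MinMaxScaler

# Made-up column of values purely for illustration: 1.0 maps to 0,
# 5.0 maps to 1, and 3.0 lands halfway in between.
sample = numpy.array([[1.0], [3.0], [5.0]])
print(MinMaxScaler(feature_range=(0, 1)).fit_transform(sample))
# [[0. ]
#  [0.5]
#  [1. ]]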

Before feeding the data to a model, it should also be shuffled, which is why we imported shuffle above; a minimal example is shown below. For more information regarding these techniques, refer to this post.
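
Here is one way the imported shuffle utility could be applied (a minimal sketch; the random_state value is our own choice for reproducibility, not something from the original post):

# Shuffle the feature matrix and target together so their rows stay aligned;
# fixing random_state makes the shuffle reproducible.
X, y = shuffle(X, y, random_state=42)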

4. Implement model

We will show you how to use three kernels with the SVR model: linear, polynomial and RBF. More information on kernels is included in the first part of this series.

Linear kernel

def svr_model(X, y):
    # Grid search over C (regularization strength) and epsilon (the width of
    # the epsilon-insensitive tube) using 5-fold cross-validation.
    gsc = GridSearchCV(
        estimator=SVR(kernel='linear'),
        param_grid={
            'C': [0.1, 1, 100, 1000],
            'epsilon': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10],
        },
        cv=5, scoring='neg_mean_squared_error', verbose=0, n_jobs=-1)

    grid_result = gsc.fit(X, y)
    best_params = grid_result.best_params_

    # Refit an SVR with the best hyper-parameters found by the grid search.
    best_svr = SVR(kernel='linear', C=best_params["C"], epsilon=best_params["epsilon"],
                   shrinking=True, tol=0.001, cache_size=200, verbose=False, max_iter=-1)

    # scikit-learn's "neg_*" scorers return negative values (higher is better),
    # so we take the absolute value before reporting MAE and RMSE.
    scoring = {
        'abs_error': 'neg_mean_absolute_error',
        'squared_error': 'neg_mean_squared_error'}

    scores = cross_validate(best_svr, X, y, cv=10, scoring=scoring, return_train_score=True)
    return "MAE :", abs(scores['test_abs_error'].mean()), "| RMSE :", math.sqrt(abs(scores['test_squared_error'].mean()))

# Run 
print(svr_model(X,y))

Here’s what we get as the error metric results for our SVR model which uses the linear kernel.

Polynomial kernel

def svr_model(X, y):
    gsc = GridSearchCV(
        estimator=SVR(kernel='poly'),
        param_grid={
            'C': [0.1, 1, 100, 1000],
            'epsilon': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10],
            'degree': [2, 3, 4],
            'coef0': [0.1, 0.01, 0.001, 0.0001]},
        cv=5, scoring='neg_mean_squared_error', verbose=0, n_jobs=-1)

    grid_result = gsc.fit(X, y)
    best_params = grid_result.best_params_
    best_svr = SVR(kernel='poly', C=best_params["C"], epsilon=best_params["epsilon"],
                   coef0=best_params["coef0"], degree=best_params["degree"], shrinking=True,
                   tol=0.001, cache_size=200, verbose=False, max_iter=-1)

    scoring = {
        'abs_error': 'neg_mean_absolute_error',
        'squared_error': 'neg_mean_squared_error'}

    scores = cross_validate(best_svr, X, y, cv=10, scoring=scoring, return_train_score=True)
    return "MAE :", abs(scores['test_abs_error'].mean()), "| RMSE :", math.sqrt(abs(scores['test_squared_error'].mean()))

# Run 
print(svr_model(X,y))

Here’s what we get as the error metric results for our SVR model which uses the polynomial kernel.

RBF kernel

def svr_model(X, y):
    gsc = GridSearchCV(
        estimator=SVR(kernel='rbf'),
        param_grid={
            'C': [0.1, 1, 100, 1000],
            'epsilon': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10],
            'gamma': [0.0001, 0.001, 0.005, 0.1, 1, 3, 5]
        },
        cv=5, scoring='neg_mean_squared_error', verbose=0, n_jobs=-1)

    grid_result = gsc.fit(X, y)
    best_params = grid_result.best_params_
    best_svr = SVR(kernel='rbf', C=best_params["C"], epsilon=best_params["epsilon"],
                   gamma=best_params["gamma"], shrinking=True,
                   tol=0.001, cache_size=200, verbose=False, max_iter=-1)

    scoring = {
        'abs_error': 'neg_mean_absolute_error',
        'squared_error': 'neg_mean_squared_error'}

    scores = cross_validate(best_svr, X, y, cv=10, scoring=scoring, return_train_score=True)
    return "MAE :", abs(scores['test_abs_error'].mean()), "| RMSE :", math.sqrt(abs(scores['test_squared_error'].mean()))

# Run 
print(svr_model(X,y))

Here’s what we get as the error metric results for our SVR model which uses the RBF kernel.

The following table contains a summary of all the error metric results obtained for the three kernels using the SVR model.

Table 1: Error metric results for SVR kernels

Based on these results, the SVR model with the RBF kernel performs best on this data set, whereas the linear kernel performs worst. This suggests that the relationship between the features and the target is non-linear, since the non-linear kernels achieve noticeably lower errors.

This brings us to the end of this post. We hope this article gave you a good understanding of how to use SVMs to tackle regression problems. Until next time, adios….
