From Theory to Practice: Implementing Support Vector Regression for Predictions in Python

Niousha Rasifaghihi
6 min read · Apr 21, 2023

Are you looking to gain a deeper understanding of Support Vector Regression (SVR) and how it can be implemented in Python? Look no further. In this article, I demystify the theory behind SVR and explain how it works, without overwhelming you with complex mathematical equations. I’ll then guide you through the process of implementing SVR in Python using the scikit-learn library. By the end of this post, you’ll be equipped with the knowledge and skills to leverage the power of SVR for your own regression modeling projects.

Support Vector Regression is a type of Support Vector Machine

Support Vector Regression (SVR) is a type of Support Vector Machine (SVM) algorithm and is commonly used for regression analysis. SVMs are powerful supervised learning algorithms that are primarily used for classification problems.

SVMs aim to find the optimal hyperplane that best separates the two classes in the input data. A hyperplane is a flat subspace of dimension p − 1 in a p-dimensional space, where p is the number of input features. The optimal hyperplane is the one that maximizes the margin, the distance between the hyperplane and the closest data points from each class, known as support vectors.
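In the linear case this can be made concrete: the hyperplane is the set of points x satisfying w·x + b = 0, and the margin width works out to 2/‖w‖, so maximizing the margin amounts to minimizing ‖w‖.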

Similar to SVMs, SVR uses the concepts of a hyperplane and a margin, but their definitions differ. In SVR, the margin is defined as the error tolerance of the model, also called the ε-insensitive tube. The tube allows data points to deviate from the hyperplane to some degree without being counted as errors, and the hyperplane is the best possible fit to the data that fall within the tube. The difference between SVM and SVR is summarized in the figure below.

Image by the author

What is the theory behind SVR?

In SVR, the goal is to find the best fit that accurately predicts the target variable while keeping the model simple enough to avoid overfitting. To achieve this, an ε-insensitive tube is defined around the hyperplane, which permits some deviation between the actual and predicted target values and thereby balances the complexity of the model against its generalization power.

SVR can be mathematically formulated as a convex optimization problem. The objective is to find a function f(x) that is as flat as possible while deviating from the actual targets by at most ε for all the training data. Flatness implies that the function is less sensitive to small changes in the input data, which reduces the risk of overfitting. For linear functions, flatness means a small value of w in the best-fit function f(x) = wx + b.
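In its simplest ("hard") form, the optimization problem from the SVR literature reads:

minimize (1/2)‖w‖²
subject to |yᵢ − (w·xᵢ + b)| ≤ ε for every training pair (xᵢ, yᵢ)

Minimizing ‖w‖² enforces flatness, while the constraint keeps every prediction within ε of its target.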

Image by the author

Sometimes this optimization problem is not feasible, or we may want to allow for some errors. In that case, we introduce slack variables ξ, which measure how far the data points lying outside the ε-insensitive tube deviate from its boundary. To balance the trade-off between model complexity (i.e., the flatness of f(x)) and the total deviation beyond the ε-insensitive tube, we use a regularization parameter C > 0. The strength of the regularization is inversely proportional to C.
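With slack variables ξᵢ (for points above the tube) and ξᵢ* (for points below it), this becomes the standard soft formulation:

minimize (1/2)‖w‖² + C Σ(ξᵢ + ξᵢ*)
subject to yᵢ − (w·xᵢ + b) ≤ ε + ξᵢ
           (w·xᵢ + b) − yᵢ ≤ ε + ξᵢ*
           ξᵢ, ξᵢ* ≥ 0

A large C penalizes deviations beyond the tube heavily (weak regularization), whereas a small C tolerates them in exchange for a flatter function.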

SVR assigns zero prediction error to points that lie inside the ε-insensitive tube, whereas points outside the tube are penalized in proportion to their deviation ξ. This property lets SVR handle overfitting more effectively than ordinary regression models.
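Formally, this corresponds to the ε-insensitive loss function

L(y, f(x)) = max(0, |y − f(x)| − ε)

which is zero for points inside the tube and grows linearly with the deviation outside it.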

Image by the author

SVR can handle non-linear regression with a kernel function

SVR can be used for both linear and non-linear regression problems through the choice of kernel function. When the data cannot be fitted well by a linear function, the kernel implicitly maps the inputs into a higher-dimensional space in which a linear regression problem can be solved. Common kernel functions used in SVR include the linear, polynomial, radial basis function (RBF), and sigmoid kernels.
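For reference, these kernels compute the following similarity measures between two inputs x and x′ (following scikit-learn's conventions, where γ, r, and d are kernel hyperparameters):

linear:     K(x, x′) = x·x′
polynomial: K(x, x′) = (γ x·x′ + r)^d
RBF:        K(x, x′) = exp(−γ ‖x − x′‖²)
sigmoid:    K(x, x′) = tanh(γ x·x′ + r)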

How to Implement SVR in Python

The purpose of the following coding exercise is to offer a step-by-step guide to building an SVR model, rather than to achieve high accuracy.

Dataset

I used the student marks dataset from Kaggle. The data consists of students' marks, the time they spend studying, and the number of courses they have taken.

import pandas as pd

# Load the dataset; time_study and Marks are the two columns used below
df = pd.read_csv('data/Student_Marks.csv')
df.head()

For a univariate regression problem, I pick time_study as the feature: the task is to predict a student's marks given the average time the student studies per day.

from sklearn.model_selection import train_test_split

train, test = train_test_split(df, test_size=0.2, random_state=42)

# train and test datasets are sorted for plotting purpose
train = train.sort_values('time_study')
test = test.sort_values('time_study')

X_train, X_test = train[['time_study']], test[['time_study']]
y_train, y_test = train['Marks'], test['Marks']

Feature scaling

Since SVR is a distance-based algorithm, scaling is an important preprocessing step that can improve the accuracy and stability of the model.

from sklearn.preprocessing import StandardScaler

# StandardScaler's fit() expects a 2D array-like input, which is why
# X_train was created as a one-column DataFrame rather than a Series
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

Fitting SVR model

The sklearn.svm module includes various SVR classes.

  • svm.SVR is the general-purpose implementation of SVR. Its hyperparameters are the kernel function, C, and ε.
  • svm.LinearSVR is similar to SVR with kernel='linear', but is implemented in a way that scales better to large datasets. Its hyperparameters are C and ε.
  • svm.NuSVR uses a parameter nu in place of ε; nu controls the number of support vectors and hence the complexity of the model. Its hyperparameters are the kernel function, C, and nu. A quick illustration of all three classes follows this list.
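As a sketch, the three classes can be instantiated as follows (the values shown are scikit-learn's defaults written out explicitly, not values tuned for this dataset):

from sklearn.svm import SVR, LinearSVR, NuSVR

svr = SVR(kernel='rbf', C=1.0, epsilon=0.1)    # general-purpose SVR
lin_svr = LinearSVR(C=1.0, epsilon=0.0)        # linear kernel only, faster on large datasets
nu_svr = NuSVR(kernel='rbf', C=1.0, nu=0.5)    # nu bounds the fraction of support vectors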

In this coding exercise, I use the SVR class from sklearn.svm to compare the performance of linear and non-linear kernel functions.

from sklearn.svm import SVR

svr_lin = SVR(kernel='linear')
svr_rbf = SVR(kernel='rbf')
svr_poly = SVR(kernel='poly')

svr_lin.fit(X_train_scaled, y_train)
svr_rbf.fit(X_train_scaled, y_train)
svr_poly.fit(X_train_scaled, y_train)

Evaluating model performance

Visualization is a valuable method for evaluating the performance of a model. Therefore, I plot the fitted curves over the actual data points of the training dataset to give a clear picture of each model's behavior.

from matplotlib import pyplot as plt

#### Model prediction for train dataset ####
train['linear_svr_pred'] = svr_lin.predict(X_train_scaled)
train['rbf_svr_pred'] = svr_rbf.predict(X_train_scaled)
train['poly_svr_pred'] = svr_poly.predict(X_train_scaled)

#### Visualization ####
plt.scatter(train['time_study'], train['Marks'])
plt.plot(train['time_study'], train['linear_svr_pred'], color='orange', label='linear SVR')
plt.plot(train['time_study'], train['rbf_svr_pred'], color='green', label='rbf SVR')
plt.plot(train['time_study'], train['poly_svr_pred'], color='blue', label='poly SVR')
plt.legend()
plt.xlabel('Study time')
plt.ylabel('Marks')
plt.show()

Fitted linear and non-linear SVR models on the train dataset

As mentioned before, SVR has tunable hyperparameters: ε, C, and the kernel. By tuning these hyperparameters, we can achieve a better fit of the model to the training dataset.
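Although a full tuning exercise is beyond the scope of this post, a cross-validated grid search is a common way to do it. Here is a minimal sketch; the parameter grid below is an arbitrary starting point, not a recommendation:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Candidate values for the three tunable hyperparameters
param_grid = {
    'kernel': ['linear', 'rbf', 'poly'],
    'C': [0.1, 1, 10, 100],
    'epsilon': [0.01, 0.1, 0.5, 1.0],
}

# 5-fold cross-validated search on the scaled training data
grid = GridSearchCV(SVR(), param_grid, cv=5, scoring='neg_root_mean_squared_error')
grid.fit(X_train_scaled, y_train)
print(grid.best_params_)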

Below, I evaluate the performance of the linear SVR model on the test dataset using two metrics, the R-squared (R²) score and the root mean squared error (RMSE), along with a plot of actual vs. predicted target values.

import numpy as np
from sklearn import metrics

#### Test dataset - metrics ####
y_test_pred = svr_lin.predict(X_test_scaled)
r2_score = round(metrics.r2_score(y_test, y_test_pred),2)
rmse = round(np.sqrt(metrics.mean_squared_error(y_test, y_test_pred)),2)
print(f'r2: {r2_score}')
print(f'rmse: {rmse}')

r2: 0.83
rmse: 6.7

An R² value of 0.83 indicates that approximately 83% of the variance in the students' marks can be explained by the average study time. The RMSE value of 6.7 represents the typical magnitude of the residuals (prediction errors) between the actual and predicted marks.

#### Test dataset - plot ####
y_test_pred = svr_lin.predict(X_test_scaled)
min_x = min(min(y_test_pred), min(y_test))
max_x = max(max(y_test_pred), max(y_test))
plt.scatter(y_test_pred, y_test)
plt.plot([min_x, max_x], [min_x, max_x], 'r--', label='1:1')
plt.legend()
plt.xlabel('Prediction')
plt.ylabel('Actual')
plt.show()

Actual vs. predicted target for the linear SVR model on the test dataset

Final words

SVR is a powerful algorithm that extends the use of SVMs to regression problems. By defining an ε-insensitive tube around the best fit to the data, SVR permits some level of deviation between the actual and predicted target values, creating a balance between the complexity of the model and its generalization power. SVR can handle both linear and non-linear regression problems.

Thank you for reading this post. I hope it helps you to better understand SVR theory and its application in regression problems. Your feedback is greatly appreciated. You can reach me on LinkedIn.
