What is Support Vector Machine Regression?

Analyttica Datalab
Dec 7, 2018

Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression. It is particularly applicable for small and medium-sized datasets.

Refer to the SVM Classification document for a basic understanding of the SVM algorithm and of how the SVM package in R works in Analyttica TreasureHunt (ATH).

Support Vector Regression (SVR) works in the same way as classification, with a few differences. In regression the output is a real number, so it can take infinitely many values and cannot be predicted exactly. SVR therefore fits a function such that most training points lie within a tolerance margin (epsilon) around it, and it minimizes the error only for points that fall outside that margin.
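To make the idea of minimizing the error concrete, SVR is usually formulated with an epsilon-insensitive loss: residuals smaller than a tolerance epsilon cost nothing, and larger ones are penalized linearly. The sketch below is illustrative Python rather than ATH's implementation; the function name, the epsilon value of 0.5, and the temperature values are assumptions made up for the example.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5):
    """Mean epsilon-insensitive loss: residuals within +/- epsilon cost nothing."""
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Illustrative temperatures (made-up values, not taken from the airquality data).
actual = [70.0, 75.0, 80.0]
predicted = [70.3, 76.0, 77.0]

# Absolute residuals are 0.3, 1.0 and 3.0. With epsilon = 0.5 the first one
# falls inside the tolerance margin and contributes no loss.
print(epsilon_insensitive_loss(actual, predicted))
```

A larger epsilon tolerates more error and typically leaves fewer points outside the margin, so fewer points influence the fitted function.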

In ATH, the Support Vector Machine Regression Algorithm can be run using the path: Machine Learning -> Regression Models -> SVM Regression Algorithm

Example Data
Let’s work with a dataset whose target variable is continuous. The New York Air Quality Measurements dataset (airquality in R) is another popular dataset in R, containing daily air quality measurements in New York from May to September 1973. The air quality parameters are:

  • Daily temperature from May to September
  • Solar radiation data
  • Ozone data
  • Wind data

The hypothesis for this dataset is that the temperature of a place depends on ozone, wind and solar radiation.

Input
Target Variable: The target variable is the Temp column in the dataset. Since the same SVM function is used for both classification and regression, make sure that a column holding a continuous variable is selected as the regression target.

Predictors: The ozone, wind and solar radiation readings form the predictors, along with the month and the day of the month on which the readings were recorded.
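Outside ATH, the same choice between a classification target and a regression target shows up in code. The sketch below uses Python's scikit-learn rather than the R package this article relies on, and it fits synthetic data instead of the airquality measurements. The kernel, C, and epsilon values are arbitrary assumptions, not ATH's settings.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in data (NOT the airquality measurements): three numeric
# predictors and a continuous target built from them plus random noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(120, 3))
noise = rng.normal(0.0, 1.0, size=120)
y = 60.0 + 20.0 * X[:, 0] - 10.0 * X[:, 1] + 5.0 * X[:, 2] + noise

# A continuous target calls for a regressor (SVR), not a classifier (SVC).
model = SVR(kernel="rbf", C=10.0, epsilon=0.5)
model.fit(X, y)

predictions = model.predict(X)
print("support vectors used:", len(model.support_))
print("first three predictions:", predictions[:3])
```

In scikit-learn the choice is made by picking the estimator class. A tool that exposes a single SVM function for both tasks generally has to infer the task from the target column, which is why the target's type matters.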

Output
RMSE
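A model's quality can be judged by the error it makes. Mean Squared Error (MSE) is the mean of the squared differences between the predicted and actual values.

Root Mean Squared Error (RMSE) is the square root of MSE. It is expressed in the same units as the target variable, which makes it a common error measure for regression models.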

Working only on the complete cases of the air quality data, the RMSE comes out to be 3.43367.
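As a reference for how such a figure is computed, RMSE is the square root of the mean of the squared differences between actual and predicted values. The following is a minimal Python sketch with made-up numbers; it does not reproduce the 3.43367 reported above, and the function names are chosen for illustration.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


def rmse(y_true, y_pred):
    """Square root of MSE, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


# Illustrative temperatures (made-up values, not taken from the airquality data).
actual = [67.0, 72.0, 74.0, 62.0]
predicted = [65.0, 75.0, 74.0, 66.0]

# Errors are 2, -3, 0 and -4, so the squared errors are 4, 9, 0 and 16.
print("MSE:", mse(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.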

MAPE
The mean absolute percentage error (MAPE) comes out to be 3.21.
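MAPE averages the absolute errors after dividing each one by the corresponding actual value. The Python sketch below returns the result as a percentage, which is the usual convention; whether ATH reports MAPE on that scale is an assumption. The values are made up for illustration.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Illustrative values (made up, not taken from the airquality data).
actual = [50.0, 80.0, 100.0]
predicted = [45.0, 84.0, 100.0]

# Relative errors are 10%, 5% and 0%.
print("MAPE:", mape(actual, predicted))
```

Unlike RMSE, MAPE has no units, so it can be compared across targets measured on different scales.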

COEFS
The regression coefficients can be extracted using the coefs property of the model. Following are 7 of the 91 coefficients generated.

Practice Data-sets
You can work on the following case, https://learn.analyttica.com/simulations/Breast-Cancer-Detection-SVM.

Analyttica Datalab (www.analyttica.com) is a contextual Data Science (DS) & Machine Learning (ML) Platform Company.