What Are Least Squares Support Vector Machines (LS-SVM)?

Isoimi-dini Ali 210032406
5 min read · Jan 4, 2024


They are the least squares versions of support vector machines (SVMs): a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis.

Advantages of LS-SVM

1. Efficiency and Ease of Optimization

Unlike standard SVMs, which solve a quadratic programming problem, LS-SVM training reduces to a linear system of equations, making it computationally cheaper and easier to optimize. This translates to quicker training times and simpler implementation, especially for large datasets; a minimal sketch of this linear solve appears after this list.

2. Improved Performance on Noisy Data

LS-SVM replaces the hinge loss of standard SVMs with a squared loss on equality constraints. For data corrupted by Gaussian-like noise, fitting every point with a squared error tends to average the noise out, which can improve generalization in practice. Note, however, that a squared loss penalizes large deviations heavily, so outliers affect LS-SVM more than a standard SVM; weighted LS-SVM variants were introduced to mitigate this.

3. Parameter Tuning Flexibility

Tuning an LS-SVM usually involves only a regularization parameter γ and a kernel parameter such as the RBF width σ, whereas ε-SVR adds a third parameter ε. The separate control over the regularization and kernel parameters still allows fine adjustments, while the smaller grid makes model selection simpler for a given dataset and task.

4. Smooth Decision Boundaries

LS-SVM typically generates smoother decision boundaries than standard SVMs. This can be advantageous in problems where a continuous and interpretable relationship between features and the target variable is desired.
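
To make point 1 concrete, here is a minimal NumPy sketch of LS-SVM regression training, where the whole fit is a single linear solve. The helper names (lssvm_fit, lssvm_predict) and the toy sine data are illustrative choices of mine, not something taken from this article.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Train an LS-SVM regressor by solving one linear system (no QP)."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    # Dual system:  [0   1^T            ] [b    ]   [0]
    #               [1   K + I/gamma    ] [alpha] = [y]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(X_train, alpha, b, X_new, sigma=1.0):
    """Predict with f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy usage: fit a noisy sine curve.
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(60)
alpha, b = lssvm_fit(X, y, gamma=10.0, sigma=0.7)
y_hat = lssvm_predict(X, alpha, b, X, sigma=0.7)
```

The (n+1)×(n+1) system in the sketch is the dual problem written out for an RBF kernel; γ plays the role of the regularization parameter and σ the kernel width mentioned in point 3.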

Example

Figure 1. Grid search for the optimal γ and σ of the LS-SVM. (A,B) used five random variables in (−1,1) as inputs and the results of Equation (11) as the target. (C,D) used four random variables in (−1,1) as inputs and a random variable in the same range as the target.

Figure 2. Grid search for the optimal C and σ of the SVM. (A,B) used five random variables in (−1,1) as inputs and the results of Equation (11) as the target. (C,D) used four random variables in (−1,1) as inputs and a random variable in the same range as the target.

Figure 3. Grid search for the optimal C and ε of the SVM model with σ = 0.6. The results were from the validation of a CO2 dataset. Obviously, one cannot have both zero bias and the best correlation. (A) Variation of correlation coefficient; (B) Variation of bias.

Figure 4. Grid search for optimal parameters of the SVM and LS-SVM models using a CO2 dataset. Both models respond to parameter changes similarly. (A) SVM fitting; (B) SVM validation; (C) LS-SVM fitting; (D) LS-SVM validation.
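
The grid searches shown in the figures can be reproduced in spirit with a simple loop over candidate γ and σ values scored on a hold-out set. The sketch below reuses the lssvm_fit / lssvm_predict helpers and the toy (X, y) data from the earlier block; the grid values and the train/validation split are arbitrary illustrative choices, not the settings used in the figures.

```python
import numpy as np

# Assumes lssvm_fit / lssvm_predict and the toy (X, y) data from the earlier sketch.
X_tr, X_val = X[::2], X[1::2]      # crude even/odd hold-out split, for illustration only
y_tr, y_val = y[::2], y[1::2]

best = (None, None, np.inf)
for gamma in [0.1, 1.0, 10.0, 100.0]:      # regularization grid (illustrative values)
    for sigma in [0.2, 0.5, 1.0, 2.0]:     # RBF width grid (illustrative values)
        alpha, b = lssvm_fit(X_tr, y_tr, gamma=gamma, sigma=sigma)
        mse = np.mean((lssvm_predict(X_tr, alpha, b, X_val, sigma=sigma) - y_val) ** 2)
        if mse < best[2]:
            best = (gamma, sigma, mse)

print("best (gamma, sigma, validation MSE):", best)
```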

Support Vector Machines vs. Least Squares SVM

SVM

Maximizes the margin: Focuses on finding the hyperplane with the largest margin between the two classes, which tends to give good generalization and some robustness to outliers.

Hinge loss with slack variables: Handles misclassified points through slack variables penalized by the hinge loss, so violations are allowed but discouraged and clear separability is prioritized.

Quadratic programming: Training requires solving a quadratic programming problem, which can be computationally expensive for large datasets.

LS-SVMs

Minimizes squared error: Aims to minimize the squared error between predicted and actual values, leading to smoother decision boundaries and potentially better performance on noisy data.

Squared-error constraints: Replaces the SVM's margin (inequality) constraints with equality constraints, so every training point contributes a squared error term that is minimized overall rather than only penalizing margin violations.

Linear system of equations: Training reduces to solving a linear system of equations, making it computationally faster and more scalable than SVMs; a minimal comparison of the two is sketched below.
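
As a rough side-by-side, the sketch below trains an LS-SVM classifier with one linear solve and, for comparison, fits scikit-learn's QP-based SVC on the same toy data. The function names and the two-blob dataset are illustrative assumptions of mine; only the SVC call is an actual library API.

```python
import numpy as np
from sklearn.svm import SVC   # standard SVM: solved internally as a QP

def rbf(A, B, sigma):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def lssvm_classifier_fit(X, y, gamma=10.0, sigma=1.0):
    """LS-SVM classifier: one linear solve instead of a QP. Labels y must be in {-1, +1}."""
    n = len(y)
    Omega = np.outer(y, y) * rbf(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]            # alpha, b

def lssvm_classifier_predict(X_tr, y_tr, alpha, b, X_new, sigma=1.0):
    """Decision rule: sign( sum_i alpha_i y_i K(x, x_i) + b )."""
    return np.sign(rbf(X_new, X_tr, sigma) @ (alpha * y_tr) + b)

# Toy comparison on two Gaussian blobs (illustrative data, not from the article).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y = np.r_[-np.ones(50), np.ones(50)]
alpha, b = lssvm_classifier_fit(X, y, gamma=10.0, sigma=1.0)
# Note: SVC's gamma is the RBF coefficient 1/(2*sigma^2), unrelated to the LS-SVM gamma above.
qp_svm = SVC(kernel="rbf", C=10.0, gamma=0.5).fit(X, y)
```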

Objective of LS-SVMs

1. Objective of LS-SVMs:
In their regression form, least-squares support vector machines (LS-SVMs) predict continuous output values rather than class labels, in contrast to the original SVM, which was formulated for classification. The same least-squares idea also yields a classifier, but the points below follow the regression formulation.
2. Mapping Function for Regression:
LS-SVMs employ a mapping function that transforms input data into a higher-dimensional space. This transformation facilitates the application of a linear regression model in the higher-dimensional feature space.
3. Optimization Problem:
The optimization problem in LS-SVMs minimizes a regularization term plus the sum of squared errors between predicted and actual outputs, subject to linear equality constraints. Applying the Karush-Kuhn-Tucker (KKT) conditions to the corresponding Lagrangian gives the necessary conditions for a solution.
4. Dual Formulation and Linear Equations:
LS-SVMs use a dual formulation that reduces training to a set of linear equations whose solution gives the Lagrange multipliers. These multipliers weight the kernel terms in the decision function; unlike in standard SVMs, virtually every training point receives a nonzero multiplier, so the solution is not sparse. (The corresponding equations are sketched after this list.)
5. Computational Efficiency:
One key advantage of LS-SVMs lies in their computational efficiency, particularly with large datasets. The linear system of equations is often faster to solve compared to the quadratic programming problem associated with traditional SVMs.
6. Trade-off Handling:
The regularization parameter γ makes the trade-off between model complexity and data fit explicit: it balances the size of the weight vector against the sum of squared training errors. Tuning it is crucial for achieving a balance between fitting the training data and making accurate predictions on new, unseen data.
7. Strengths and Limitations:
LS-SVMs, like any machine learning model, exhibit both strengths and limitations. The choice between traditional SVMs and LS-SVMs depends on the specific characteristics of the regression problem and the dataset in question.
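
For reference, points 3 and 4 correspond to the standard LS-SVM regression formulation introduced by Suykens and colleagues; the equations below restate it in that notation (γ is the regularization constant, φ the feature map, K the kernel), since the article itself does not spell them out.

```latex
% Primal LS-SVM regression problem: ridge-style regularization plus squared errors,
% with equality constraints instead of the SVM's inequality constraints.
\min_{w,\,b,\,e} \;\; \frac{1}{2}\lVert w \rVert^2 + \frac{\gamma}{2}\sum_{i=1}^{N} e_i^2
\qquad \text{s.t.} \qquad y_i = w^{\top}\varphi(x_i) + b + e_i, \quad i = 1,\dots,N.

% Eliminating w and e through the KKT conditions of the Lagrangian leaves one
% linear system in the bias b and the Lagrange multipliers alpha:
\begin{bmatrix} 0 & \mathbf{1}^{\top} \\ \mathbf{1} & \Omega + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ y \end{bmatrix},
\qquad \Omega_{ij} = K(x_i, x_j) = \varphi(x_i)^{\top}\varphi(x_j).

% Resulting regression function:
f(x) = \sum_{i=1}^{N} \alpha_i\, K(x, x_i) + b.
```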

Some Graphics

Figure: Structure of an LS-SVM network.

Figure: The result of the LS-SVM classifier.

Figure: A least squares support vector machine.

Figure: The results of a least squares SVM.
