Support Vector Machines in Machine Learning

Matteo Zabban · Published in AI Odyssey · Nov 6, 2023

What are Support Vector Machines (SVMs)?

Support vector machines are supervised learning models, with associated algorithms, used to analyze data for classification and regression. Given a set of training examples, each labeled as belonging to one of two classes, a trained SVM assigns new examples to one class or the other. Since the model naturally works with two classes, it is widely used for binary classification.

How does it work?

To understand the algorithm, we have to distinguish between two cases:

Linear SVM — the data is linearly separable, which means that a single straight line, a plane, or a hyperplane can effectively separate the data into distinct classes.

Non-Linear SVM — the data is not linearly separable, which means no straight line or plane can separate the data into distinct classes.

Let’s now dive deeper into the two types of SVMs, with examples for each.

Linear SVM

  1. Identify the hyperplane - In a linear SVM, the first step is to initialize a separating hyperplane, whose dimension depends on the dimensionality of the data (a line in 2D, a plane in 3D, and so on). The goal is to identify the optimal hyperplane that separates the data into two classes with the maximum margin, which is the gap between the hyperplane and the closest data points.
  2. Margin calculation - The central idea is to maximize the margin, which reflects the confidence of the model’s predictions: the larger the margin, the better the model tends to generalize to unseen data. For a hyperplane defined by a weight vector w, the margin width works out to 2 / ||w||, so maximizing the margin amounts to minimizing ||w||.
  3. Support vector identification - The SVM then proceeds to identify the support vectors, which are the data points closest to the hyperplane and which have the greatest influence on the position and orientation of the hyperplane.
  4. Optimization - In this step, the SVM solves an optimization problem to find the hyperplane that maximizes the margin while still correctly classifying the training points. In practice, this is solved with techniques such as the Sequential Minimal Optimization (SMO) algorithm.
  5. Classification - Once the optimal hyperplane is found, it is used to classify new data points. When a new point arrives, the SVM assigns it to a class based on which side of the hyperplane it falls on. The sketch after this list shows how to inspect these quantities on a fitted model.
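
As a minimal sketch of the steps above (assuming scikit-learn and a synthetic two-class dataset chosen purely for illustration), the snippet below fits a linear SVM and reads back the hyperplane coefficients, the margin width 2 / ||w||, and the support vectors:

# Sketch: inspecting the hyperplane, margin, and support vectors
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class dataset (an illustrative assumption, not from the article)
X, y = make_blobs(n_samples=60, centers=2, random_state=0)

svm = SVC(kernel='linear', C=1)
svm.fit(X, y)

w = svm.coef_[0]                 # normal vector of the separating hyperplane
b = svm.intercept_[0]            # hyperplane offset
margin = 2 / np.linalg.norm(w)   # geometric margin width, 2 / ||w||

print("Hyperplane: w =", w, "b =", b)
print("Margin width:", margin)
print("Support vectors:\n", svm.support_vectors_)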

Non-Linear SVM

In non-linear SVMs the process is very similar; we just need to take into account one very useful extra tool, the kernel trick: when the data is not linearly separable, the SVM employs a kernel function to implicitly map the data into a higher-dimensional space where it is more likely to be linearly separable.
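
As a hedged illustration of the kernel trick (the dataset, noise level, and parameters here are assumptions for demonstration), the sketch below compares a linear SVM with an RBF-kernel SVM on concentric circles, a classic example of non-linearly separable data:

# Sketch: the kernel trick on non-linearly separable data
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: no straight line can separate them
X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

for kernel in ('linear', 'rbf'):
    clf = SVC(kernel=kernel, C=1).fit(X_train, y_train)
    print(f"{kernel} kernel accuracy: {clf.score(X_test, y_test):.2f}")

# The RBF kernel should separate the circles far better than the
# linear kernel, which is the kernel trick in action.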

Code Example

What follows is a simple code example of training a Linear SVM for binary classification using Python and the machine learning library scikit-learn.

# Import necessary libraries
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
import matplotlib.pyplot as plt

# Generate a synthetic dataset
X, y = datasets.make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train a Linear SVM model
svm = SVC(kernel='linear', C=1) # 'linear' kernel specifies a Linear SVM
svm.fit(X_train, y_train)

# Make predictions on the test data
y_pred = svm.predict(X_test)

# Calculate the accuracy of the model
accuracy = np.mean(y_pred == y_test)
print(f"Accuracy: {accuracy:.2f}")

# Visualize the decision boundary
plt.figure(figsize=(10, 6))

# Create a mesh grid to plot the decision boundary
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 100),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 100))
Z = svm.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Shade the margin band (decision function between -1 and +1); the boundary is at 0
plt.contourf(xx, yy, Z, levels=[-1, 0, 1], cmap=plt.cm.RdBu, alpha=0.6)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdBu, edgecolors='k')
plt.title('Linear SVM Decision Boundary')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

Application of SVM

Support Vector Machines find applications in many fields due to their versatility and ability to handle complex classification and regression tasks. Some common applications include:

  1. Image Classification — recognizing objects in images, identifying handwritten digits, and distinguishing between different types of objects;
  2. Text Classification — spam detection, sentiment analysis, and categorizing text documents into topics or genres;
  3. Bioinformatics — classifying genes, predicting protein functions, and analyzing DNA sequences;
  4. Environmental Sciences — predicting and classifying ecological phenomena, detecting changes in Earth’s surface, and monitoring environmental factors.

Advantages of SVM

  1. Effective in High-Dimensional Spaces — SVMs perform very well in high-dimensional feature spaces;
  2. Robust Generalization — SVMs are resistant to overfitting, ensuring strong performance on new, unseen data;
  3. Non-Linear Pattern Handling — SVMs can efficiently handle non-linear data patterns by using kernel functions;
  4. Versatile Applications — SVMs can find use in various fields;
  5. Memory Efficiency — SVMs are memory-efficient because the fitted model only needs to store the support vectors, which keeps it compact at prediction time (see the sketch after this list).
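
To make the memory point concrete (a hedged sketch on a synthetic dataset; the exact counts will vary), a fitted SVC exposes n_support_, the number of support vectors retained per class, which is typically a small fraction of the training set:

# Sketch: only the support vectors are retained by the fitted model
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=500, centers=2, random_state=0)
svm = SVC(kernel='linear', C=1).fit(X, y)

print("Training points:", len(X))
print("Support vectors per class:", svm.n_support_)  # usually far fewer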

Disadvantages of SVM

  1. Computationally Intensive — when working on large datasets, training SVMs can be time-consuming;
  2. Difficulty with Noisy Data — SVMs are sensitive to noisy or mislabeled data, and outliers can significantly impact results;
  3. Lack of Probability Estimates — SVMs do not naturally provide probability estimates, so an additional calibration step is required when probabilities are needed (a sketch follows below).
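
As a brief sketch of that calibration step (assuming scikit-learn; the synthetic dataset is a stand-in), SVC(probability=True) fits an internal Platt-scaling calibrator, while CalibratedClassifierCV expresses the same idea as an explicit wrapper:

# Sketch: obtaining probability estimates from an SVM
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Option 1: built-in Platt scaling (refits internally via cross-validation)
svm_prob = SVC(kernel='linear', probability=True).fit(X_train, y_train)
print(svm_prob.predict_proba(X_test[:3]))

# Option 2: explicit calibration wrapper around a plain SVC
calibrated = CalibratedClassifierCV(SVC(kernel='linear')).fit(X_train, y_train)
print(calibrated.predict_proba(X_test[:3]))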

Conclusion

In summary, SVMs are valuable supervised learning models that have established their presence in various fields, from healthcare to marketing and manufacturing, where they tackle pivotal challenges.

Their appeal lies in their remarkable combination of simplicity, versatility, and effectiveness, making them one of the pillars of modern machine learning and data science.
