Add some spice to your classification game with SVM Kernels!

Atharv Kulkarni · Published in Geek Culture · Feb 18, 2023

A kernel is like a magic spell that helps SVMs (Support Vector Machines) understand your data. Think of an SVM as a wizard who’s trying to predict whether a new piece of data belongs to one group or another. The wizard needs to understand the data’s features (like how big it is, what color it is, etc.) to make an accurate prediction.

But the wizard doesn’t understand the features themselves. The wizard only speaks in math, and the features are just words to him. This is where the kernel comes in. The kernel is like a translator that turns the features into a language the wizard can understand.

For example, let’s say you want to predict whether a fruit is an apple or a banana. The features you might use are things like the fruit’s color, size, and texture. But the wizard doesn’t understand those words. So you need to use a kernel to translate them into math.

One kernel you could use is the polynomial kernel. It compares two fruits as if their features had been expanded into polynomial terms. For example, if x is the redness of a fruit, y its size, and z its bumpiness, a degree-2 polynomial kernel behaves as if each fruit were described by terms like x², y², z² and the cross-products xy, xz, and yz, without ever computing those terms explicitly.

The wizard can now understand the fruit’s features because they’re in a language he understands — math! He can use the polynomial equation to make his prediction about whether the fruit is an apple or a banana.
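To see the trick in action, here is a minimal Python sketch (the feature values are made up) showing that a degree-2 polynomial kernel computes exactly the same similarity as an explicit mapping into the higher-dimensional space of squared and cross-product terms:

```python
import numpy as np

# Made-up feature values: (redness, size, bumpiness) for two fruits
x = np.array([0.9, 0.2, 0.7])
y = np.array([0.1, 0.8, 0.3])

# Degree-2 polynomial kernel, computed directly in the original space
k_direct = (x @ y) ** 2

# The same number via an explicit feature map phi(v) = all pairwise
# products v_i * v_j (the higher-dimensional space the kernel implies)
def phi(v):
    return np.outer(v, v).ravel()

k_explicit = phi(x) @ phi(y)

print(k_direct, k_explicit)  # equal, up to floating-point error
```

This is the kernel trick: the wizard gets the richer feature space without anyone having to build it.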

Types of Kernels

Linear Kernel:

The linear kernel is the simplest kernel and is often used for linearly separable data. It simply computes the dot product between two input vectors, without mapping the data into a higher-dimensional space. This makes it computationally efficient and fast to train, even on large-scale datasets. The linear kernel is also less prone to overfitting since it has a simpler decision boundary compared to non-linear kernels. However, it may not be suitable for more complex datasets that require non-linear decision boundaries.

The formula for the linear kernel is:

K(x, x’) = x · x’

Where x and x’ are input vectors.

One of the earliest and most influential papers on maximum-margin classifiers is “A Training Algorithm for Optimal Margin Classifiers” by Bernhard Boser, Isabelle Guyon, and Vladimir Vapnik, published in 1992. In this paper, the authors introduce the concept of the maximum margin hyperplane and show how to train classifiers to find the hyperplane that maximizes the margin.
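As a quick illustration, here is a minimal scikit-learn sketch on synthetic, roughly linearly separable data (the dataset and parameter choices are invented for the example):

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic, roughly linearly separable data (illustration only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# kernel="linear" uses K(x, x') = x . x', i.e. no feature-space mapping
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.predict([[-1.5, -1.0], [2.5, 1.5]]))  # expect [0 1]
```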

Polynomial Kernel:

The polynomial kernel is used for non-linearly separable data. It transforms the data into a higher-dimensional space using a polynomial function. The degree of the polynomial function can be adjusted to control the complexity of the decision boundary. Higher degree polynomials can lead to overfitting, so it’s important to choose the right degree for the data. The polynomial kernel is useful for datasets with complex geometries and can be used to model curved decision boundaries.

The formula for the polynomial kernel of degree d is:

K(x, x’) = (x · x’ + c)^d

Where x and x’ are input vectors, c is a constant term, and d is the degree of the polynomial.

Polynomial kernels feature in one of the most influential SVM papers, “Support-Vector Networks” by Corinna Cortes and Vladimir Vapnik, published in 1995, where SVMs with polynomial kernels of varying degrees are used to classify handwritten digits.
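Here is a small scikit-learn sketch of degree tuning, using the make_moons toy dataset as a stand-in for real non-linear data (the degrees tried are arbitrary):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy non-linearly separable data
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# scikit-learn's polynomial kernel is (gamma * x . x' + coef0)^degree;
# sweeping the degree shows the complexity/overfitting trade-off
for degree in (2, 3, 5):
    clf = SVC(kernel="poly", degree=degree, coef0=1.0).fit(X_tr, y_tr)
    print(degree, clf.score(X_te, y_te))
```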

Radial Basis Function (RBF) Kernel:

The RBF kernel is one of the most commonly used kernels in SVMs and is useful for non-linearly separable data. It corresponds to an implicit mapping into an infinite-dimensional feature space, which makes it a powerful and flexible tool for classification. The RBF kernel produces a smooth decision boundary, which makes it more robust to noise and outliers. However, it can be computationally expensive to train on large datasets, and the gamma parameter must be tuned carefully to avoid overfitting.

The formula for the RBF kernel is:

K(x, x’) = exp(−γ ‖x − x’‖²)

Where x and x’ are input vectors, and gamma (γ) is a parameter that controls the smoothness of the decision boundary.

One influential paper on SVMs with the RBF kernel is “Comparing Support Vector Machines with Gaussian Kernels to Radial Basis Function Classifiers” by Bernhard Schölkopf and colleagues, published in 1997. In this paper, the authors show that SVMs with Gaussian (RBF) kernels can outperform classical RBF networks trained by conventional methods.
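A small sketch of why gamma matters, again on a toy dataset (concentric circles, where a linear kernel would fail); the gamma values swept are arbitrary:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Concentric circles: hopeless for a linear kernel, easy for RBF
X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)

# K(x, x') = exp(-gamma * ||x - x'||^2); small gamma gives a smooth
# boundary, large gamma a wiggly one that can overfit
for gamma in (0.01, 1.0, 100.0):
    scores = cross_val_score(SVC(kernel="rbf", gamma=gamma), X, y, cv=5)
    print(gamma, round(scores.mean(), 3))
```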

Sigmoid Kernel:

The sigmoid kernel is used for data that is not linearly separable. It transforms the data using the hyperbolic tangent (tanh) function, the same activation used in classic neural networks; an SVM with a sigmoid kernel behaves much like a two-layer perceptron. The sigmoid kernel has a non-linear decision boundary that can model more complex data than the linear kernel. Like the polynomial kernel, it can overfit if its parameters are chosen poorly, and for some parameter values it is not even a valid (positive semi-definite) kernel.

The formula for the sigmoid kernel is:

K(x, x’) = tanh(α x · x’ + c)

Where x and x’ are input vectors, and alpha (α) and c are parameters.

One well-known study of the sigmoid kernel is “A Study on Sigmoid Kernels for SVM and the Training of non-PSD Kernels by SMO-type Methods” by Hsuan-Tien Lin and Chih-Jen Lin, published in 2003. In this paper, the authors analyze for which parameter values the sigmoid kernel behaves like a valid kernel, and how SVMs can still be trained when it does not.
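A minimal scikit-learn sketch (synthetic data; parameter choices are just illustrative), showing the sigmoid kernel used with feature scaling, since tanh saturates for large inputs:

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# kernel="sigmoid" uses K(x, x') = tanh(gamma * x . x' + coef0).
# Standardizing first matters, because tanh saturates for large inputs.
clf = make_pipeline(StandardScaler(),
                    SVC(kernel="sigmoid", gamma="scale", coef0=0.0))
clf.fit(X, y)
print(clf.score(X, y))
```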

Laplacian Kernel:

The Laplacian kernel is similar to the RBF kernel but uses the L1 (Manhattan) distance in place of the squared Euclidean distance. This makes it less sensitive to large coordinate differences, which can help with noisy data. It also has a sharper peak around each support vector than the RBF kernel, giving a more localized decision boundary, which can be useful for datasets with complex geometry. However, it is less commonly used than the RBF kernel and may not perform as well on all datasets.

The formula for the Laplacian kernel is:

K(x, x’) = exp(−γ ‖x − x’‖₁)

Where x and x’ are input vectors, ‖·‖₁ is the L1 (Manhattan) distance, and gamma (γ) is a parameter that controls the smoothness of the decision boundary.

One reference that discusses the Laplacian kernel is the book “Kernel Methods for Pattern Analysis” by John Shawe-Taylor and Nello Cristianini, published in 2004. In this book, the authors discuss various types of kernel functions, including the Laplacian kernel, and their applications in pattern analysis.
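scikit-learn has no built-in “laplacian” option for SVC, but it does ship a laplacian_kernel function, and SVC accepts any callable kernel. A minimal sketch on toy data:

```python
from functools import partial

from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import laplacian_kernel
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# SVC accepts any callable that returns the Gram matrix between two
# sets of samples. laplacian_kernel computes
# K(x, x') = exp(-gamma * ||x - x'||_1).
clf = SVC(kernel=partial(laplacian_kernel, gamma=0.5))
clf.fit(X, y)
print(clf.score(X, y))
```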

ANOVA Kernel:

The ANOVA kernel is used for data that has interactions between variables. It is built from one-dimensional base kernels, one per feature, and, like the polynomial kernel, takes into account higher-order combinations of features. The ANOVA kernel can model more complex decision boundaries than the polynomial kernel, but can be prone to overfitting if the degree of the kernel function is too high.

The formula for the ANOVA kernel of degree d is:

K(x, x’) = (Σᵢ exp(−γᵢ (xᵢ − x’ᵢ)²))^d

Where xᵢ and x’ᵢ are the i-th feature of x and x’, respectively, γᵢ is a parameter that controls the importance of the i-th feature, and d is the degree.

One paper that discusses the ANOVA kernel is “Support Vector Regression with ANOVA Decomposition Kernels” by M. O. Stitson and colleagues, published in 1997. In this paper, the authors use ANOVA decomposition kernels to model high-dimensional data with support vector methods.
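scikit-learn has no built-in ANOVA kernel, so the sketch below hand-rolls one as a callable (anova_kernel is our own illustrative helper, with a single shared gamma for simplicity; per-feature γᵢ would be a straightforward extension):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

def anova_kernel(A, B, gamma=1.0, d=2):
    # K(x, x') = (sum_i exp(-gamma * (x_i - x'_i)^2))^d
    diff = A[:, None, :] - B[None, :, :]  # (n_A, n_B, n_features)
    return np.exp(-gamma * diff ** 2).sum(axis=2) ** d

clf = SVC(kernel=anova_kernel)
clf.fit(X, y)
print(clf.score(X, y))
```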

Exponential Kernel:

The exponential kernel is another kernel that is useful for non-linearly separable data. It transforms the data into a higher-dimensional space using an exponential function. The exponential kernel can model non-linear decision boundaries but is less commonly used than the RBF kernel since it may not perform as well on all datasets. The kernel function can be tuned by adjusting the sigma parameter to control the smoothness of the decision boundary.

The formula for the exponential kernel is:

K(x, x’) = exp(−‖x − x’‖ / (2σ²))

Where x and x’ are input vectors and sigma (σ) controls the smoothness of the decision boundary. A closely related variant, the exponential chi-square kernel, is popular for comparing histograms:

K(x, x’) = exp(−γ Σᵢ (xᵢ − x’ᵢ)² / (xᵢ + x’ᵢ))

Where xᵢ and x’ᵢ are the i-th bins of the histograms x and x’, and gamma (γ) plays the same smoothing role.

One paper that applies kernels of this family is “Support Vector Machines for Histogram-Based Image Classification” by Olivier Chapelle, Patrick Haffner, and Vladimir Vapnik, published in 1999. In this paper, the authors use SVMs with generalized RBF kernels to classify images based on their color histograms.
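scikit-learn’s chi2_kernel implements exactly this exponential chi-square kernel; here is a minimal sketch on synthetic “histograms” (random data standing in for image histograms, with arbitrary labels):

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

# Toy "histograms": non-negative rows summing to 1
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(8), size=100)
y = (X[:, 0] > X[:, 1]).astype(int)  # arbitrary labels, illustration only

# chi2_kernel is the exponential chi-square kernel:
# K(x, x') = exp(-gamma * sum_i (x_i - x'_i)^2 / (x_i + x'_i))
K = chi2_kernel(X, gamma=1.0)
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.score(K, y))
```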

If you liked my explanation, please give this article a clap 👏 and share it with your friends and study buddies 🫂

contact: https://atharv4git.github.io/webpage/
