Data Chef Episode 8: SVMs
For this week’s episode, we are going to feature… drumroll…
Support Vector Machines
While the name might not roll off the tongue, I hope to explain why this appetizing classifier should be on every data scientist’s menu.
The Support Vector Machine algorithm was developed by two Russian mathematicians, Vladimir Vapnik and Alexey Chervonenkis, in 1963. Vapnik continued developing the algorithm throughout his career. In 1992, working with Bernhard Boser and Isabelle Guyon, he created a nonlinear version of the classifier using the “kernel trick.” He proposed the soft-margin nonlinear version with Corinna Cortes in 1993, and that work was published in 1995.
In an ode to the Russian founders of SVMs, this machine learning algorithm is like a delicious borscht. This staple of Russian cuisine can be served hot or cold, which should help you remember that SVMs are binary classifiers: a single SVM separates exactly two classes. However, the method can be extended to multi-class problems in a one-vs-rest (also called one-vs-all) manner, training one SVM per class.
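As a minimal sketch of the one-vs-rest idea, assuming scikit-learn and its bundled Iris dataset (three species), the snippet below wraps a linear SVC in OneVsRestClassifier. That wrapper trains one binary SVM per class and predicts the class whose SVM is most confident. Note that SVC also accepts multi-class labels directly; in that case scikit-learn uses a one-vs-one scheme internally rather than one-vs-rest.

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Iris has three classes, so one-vs-rest trains three binary SVMs,
# each separating one species from the other two.
X, y = load_iris(return_X_y=True)

ovr = OneVsRestClassifier(SVC(kernel="linear"))
ovr.fit(X, y)

print(len(ovr.estimators_))   # 3: one binary classifier per class
print(ovr.predict(X[:5]))     # predicted species for the first five flowers
print(round(ovr.score(X, y), 2))
```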
The algorithm works by mathematically and graphically separating two different types of data points with a hyperplane. A hyperplane is a shape with one fewer dimension than the space the data are plotted in. For 2D data, the hyperplane is a line, and for 3D data, it is a two-dimensional plane. Although it is nearly impossible to visualize, the same process works in higher-dimensional space. SVMs can produce very accurate models, but they offer little transparency into how the algorithm reaches its decisions. Like a good borscht, you might be better off enjoying the taste rather than worrying about how it’s made…
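To make the 2D case concrete, here is a sketch assuming scikit-learn and a synthetic two-cluster dataset generated with make_blobs (the data are made up purely for illustration). With a linear kernel, the fitted hyperplane is w · x + b = 0, which scikit-learn exposes as coef_ and intercept_; in two dimensions that equation is simply a line.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters in 2D, so the hyperplane is a line.
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=0)

clf = SVC(kernel="linear")
clf.fit(X, y)

# For a linear kernel the hyperplane is w . x + b = 0.
w = clf.coef_[0]        # normal vector: one weight per feature (two here)
b = clf.intercept_[0]   # offset from the origin

# Rearranged as a line in the plane: x2 = -(w1 * x1 + b) / w2
x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 5)
x2 = -(w[0] * x1 + b) / w[1]
print(np.column_stack([x1, x2]))

# The sign of w . x + b tells us which side of the line a point falls on.
scores = X @ w + b
print(np.array_equal(scores > 0, clf.predict(X) == clf.classes_[1]))
```

In three dimensions the same w and b would describe a plane, and beyond that a hyperplane we can no longer draw.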
The goal of the SVM is to find the orientation of the boundary that creates the largest separation between the two types, since that is the boundary that will best distinguish the classifications. The gap between the boundary and the nearest points on either side is referred to as the margin, and the SVM chooses the boundary that makes this margin as wide as possible. The edges of the margin fall on the data points closest to the boundary; these points are called the support vectors, the namesake of the algorithm. In theory, the data might fall into two clean categories, but in practice the classes often overlap in places. This scenario calls for a soft margin, as opposed to the hard margin approach described above. We use the hinge loss function to balance the width of the margin against a penalty for points that are misclassified or fall inside the margin.
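The soft-margin problem is commonly written as minimizing (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w · x_i + b)), with labels y_i in {-1, +1}. The first term rewards a wide margin (the full margin width is 2 / ||w||), the second is the total hinge loss, and C sets how much each violation costs. The sketch below, assuming scikit-learn and deliberately overlapping synthetic clusters, fits a linear SVC and recomputes the hinge loss by hand so you can see which points are penalized.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping clusters, so a hard margin is impossible and we need a soft margin.
X, y01 = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=1)
y = np.where(y01 == 1, 1, -1)   # hinge loss is defined for labels in {-1, +1}

# C controls the penalty for margin violations (larger C = narrower, stricter margin).
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]

# Hinge loss per point: zero when the point is on the correct side and
# outside the margin, positive when it is inside the margin or misclassified.
f = X @ w + b
hinge = np.maximum(0, 1 - y * f)

# Soft-margin objective: margin term plus C times the total hinge loss.
objective = 0.5 * w @ w + clf.C * hinge.sum()

# The margin extends 1 / ||w|| on each side of the hyperplane.
margin_width = 2 / np.linalg.norm(w)

print("support vectors:", len(clf.support_vectors_))
print("points inside the margin or misclassified:", int((hinge > 0).sum()))
print("margin width:", round(margin_width, 3))
print("objective:", round(objective, 3))
```

Every point with a positive hinge loss ends up as a support vector, which is why the support vectors alone determine the boundary.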
Scikit-learn’s sklearn.svm module includes the SVC (Support Vector Classification) class, which allows us to implement SVMs.
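Before tuning anything, it helps to know what SVC does out of the box. Assuming a recent scikit-learn release (some defaults, such as gamma, have changed across versions), this sketch instantiates SVC with no arguments and prints a few of its default hyperparameters, including decision_function_shape, which controls the shape of the multi-class decision function.

```python
from sklearn.svm import SVC

# Instantiate the classifier with all default settings.
clf = SVC()
params = clf.get_params()

# A few defaults worth knowing before tuning anything:
print(params["kernel"])                   # rbf: the Radial Basis Function kernel
print(params["C"])                        # 1.0: soft-margin penalty strength
print(params["gamma"])                    # scale: RBF width derived from the data
print(params["degree"])                   # 3: only used by the polynomial kernel
print(params["decision_function_shape"])  # ovr: one-vs-rest shaped output
```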
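The documentation also covers the different kernels available for SVMs, including the polynomial and Radial Basis Function (RBF) kernels. These kernels rely on the “kernel trick”: they compute similarities between data points as if the points had been projected into a higher-dimensional space, where the classes are easier to separate, without ever computing that projection explicitly.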
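Here is a small sketch of the kernel trick under stated assumptions. Part one uses plain NumPy to show that a degree-2 polynomial kernel, (x · z)^2, gives the same number as an explicit dot product after mapping each 2D point into 3D with phi, a hypothetical helper written only for this comparison (scikit-learn’s polynomial kernel is the more general (gamma * x · z + coef0)^degree). Part two, assuming scikit-learn’s synthetic make_circles data, compares SVC with kernel="linear" and kernel="rbf" on two concentric rings that no straight line can separate.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Part 1: the trick itself, in plain NumPy.
# A degree-2 polynomial kernel k(x, z) = (x . z)^2 equals an ordinary dot product
# after mapping 2D points into 3D with phi(v) = (v1^2, sqrt(2) * v1 * v2, v2^2).
def phi(v):
    """Hypothetical explicit feature map, written out only for comparison."""
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
kernel_value = (x @ z) ** 2          # computed in the original 2D space
explicit_value = phi(x) @ phi(z)     # computed after projecting into 3D
print(kernel_value, explicit_value, np.isclose(kernel_value, explicit_value))

# Part 2: concentric rings cannot be split by a straight line,
# but an RBF kernel separates them without any manual feature engineering.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
for kernel in ["linear", "rbf"]:
    accuracy = SVC(kernel=kernel).fit(X, y).score(X, y)
    print(kernel, "training accuracy:", round(accuracy, 2))
```

Training accuracy is used here only to keep the sketch short; a held-out test set is the honest way to compare kernels.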
This is all the time we have for this week, but tune in next week to see how to implement an SVM and choose the appropriate hyperparameters.