
From Hours to Minutes: 600x Faster SVM

Patching scikit-learn for Better Machine Learning Performance

Dmitry Kalyanov
4 min read · Dec 30, 2020


Hello, readers. If you are here, you are probably interested in the support vector machines (SVM) algorithm. Maybe you’re just looking for more information about it. Or, maybe you’ve tried training SVM models, only to find that the computation takes too long. Whatever the reason, we are going to show you how to run SVM faster than ever before with Intel® Extension for Scikit-learn.

What Is SVM?

SVM is an umbrella term for a group of supervised machine learning algorithms based on the idea of maximizing the margin between two classes. You can use SVM for both classification and regression problems, tune its hyperparameters, and choose a kernel function suitable for your data. A detailed description of SVM is beyond the scope of this short article, but there is plenty of theoretical and practical information online.
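As a quick illustration (a minimal sketch using scikit-learn's built-in iris dataset; the kernel and hyperparameter values here are only examples, not recommendations), training an SVM classifier takes just a few lines:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load a small toy dataset and hold out a test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The kernel function and C are typical hyperparameters to tune
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```

Swapping `SVC` for `SVR` gives you the regression variant with the same interface.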

SVM has many advantages, but a common complaint is its speed. This is not surprising: training time scales as O(num_samples² × num_features), so building a model on a large dataset can take a long time.
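To get a feel for that complexity (a back-of-the-envelope cost model, not a measurement), note that doubling the number of samples roughly quadruples the work:

```python
# Rough cost model: training work grows as n_samples^2 * n_features
def relative_cost(n_samples: int, n_features: int) -> int:
    return n_samples ** 2 * n_features

base = relative_cost(10_000, 50)
doubled = relative_cost(20_000, 50)
print(doubled / base)  # → 4.0: twice the samples, four times the work
```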

How to Improve SVM Performance

Improving performance is as simple as adding -m sklearnex to the Python command that launches your scikit-learn code:

python -m sklearnex my_scikit_learn_program.py

Alternatively, you can patch scikit-learn inside your code before importing it:

from sklearnex import patch_sklearn
patch_sklearn()

# Your usual scikit-learn code without any changes:
from sklearn.svm import SVC

For more details about the installation process and available algorithms, see Intel® Extension for Scikit-learn documentation.
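Putting it together, a complete script might look like the sketch below. The try/except fall-back is our addition so the same script also runs where the extension is not installed, and the synthetic dataset size is arbitrary:

```python
import time

# Patch scikit-learn if Intel Extension for Scikit-learn is available;
# otherwise fall back to the stock implementation unchanged.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()
except ImportError:
    pass

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# A synthetic binary classification problem
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

clf = SVC(kernel="rbf", C=1.0)
start = time.perf_counter()
clf.fit(X, y)
print(f"training took {time.perf_counter() - start:.2f} s, "
      f"train accuracy {clf.score(X, y):.3f}")
```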

The effect is the same in both cases: your code will run much faster. To show how much, we created some benchmarks, which will become a part of our official benchmarks very soon, and investigated a wide range of datasets. For each dataset, we selected the most suitable parameters (see configuration details below).

Two of these datasets, klaverjas and covertype, are quite large for the stock SVM implementation in scikit-learn, which results in long execution times (Figures 1 and 2). To demonstrate the advantage of Intel® Extension for Scikit-learn patching, we compare to two popular frameworks that provide SVM implementations: scikit-learn and ThunderSVM. We measured the execution time of the training (Figure 1) and prediction (Figure 2) stages. Each chart shows the speedup of ThunderSVM and patched scikit-learn compared to the stock scikit-learn, as well as the execution time of stock scikit-learn.

Figure 1. Improving SVM training performance
Figure 2. Improving SVM inference performance

Let’s see how much time can be saved by using Intel® Extension for Scikit-learn:

  • Training. For large datasets, patched scikit-learn is up to 143x faster than the stock SVM implementation in scikit-learn. There is also a significant performance improvement for smaller datasets.
  • Inference. Patched scikit-learn is up to 600x faster than the stock version of scikit-learn. For all test cases, the patched scikit-learn SVM is at least 65 times faster than the stock implementation.

These results show that for larger datasets you can train models in minutes instead of hours, which is incredible if you keep in mind that all you have to do is patch scikit-learn.

We also tracked prediction accuracy for all models. The prediction accuracy of patched scikit-learn is on par with the other frameworks, which means that whether you use stock scikit-learn, its patched version, or ThunderSVM to train your model, you will get predictions of equivalent quality.

Conclusions

With Intel® Extension for Scikit-learn patching you can:

  • Use your scikit-learn code for training and inference without modification.
  • Train SVM models up to 143 times faster.
  • Do inference up to 600 times faster.
  • Get the same quality of predictions as other tested frameworks.

You get all of this without having to change your code or hardware.

Installing Via Intel oneAPI AI Analytics Toolkit

The Intel oneAPI AI Analytics Toolkit (AI Kit) provides a consolidated package of Intel’s latest deep and machine learning optimizations all in one place with seamless interoperability and high performance. The AI Kit includes Intel-optimized versions of deep learning frameworks, Python libraries, and a lightweight parallel data frame to streamline end-to-end data science and AI workflows on Intel architectures. The AI Kit, which includes the Intel Distribution for Python and all of Intel’s scikit-learn optimizations, is distributed through many common channels, including Intel’s website, YUM, APT, Anaconda, and more. Select and download the distribution package that you prefer and follow the Get Started Guide for post-installation instructions.

Hardware and Software Configuration

Intel Xeon Platinum 8280L (2nd generation Intel Xeon Scalable processors): 2 sockets, 28 cores per socket, HT:on, Turbo:on, total memory of 384 GB (12 slots/16 GB/2933 MHz). Software: scikit-learn 0.23.2, ThunderSVM (CPU) 0.3.3, oneDAL and daal4py 2021.1, Python 3.7.9.

Model Configuration Details
