
Intel Gives Scikit-Learn the Performance Boost Data Scientists Need

Faster Machine Learning on Intel Processors

Rachel Oberman
Feb 24, 2021


Scikit-learn is one of the most widely used Python packages for data science and machine learning (ML). Accelerating it speeds up ML workloads across many industry use cases while making more efficient use of hardware resources. Intel® Extension for Scikit-learn, available through the Intel oneAPI AI Analytics Toolkit (AI Kit), reduces run times and gives data scientists time back to focus on their models. Intel has invested in optimizing the performance of Python and has optimized key data science libraries such as scikit-learn, XGBoost, NumPy, and SciPy.

In a recent benchmark, Intel engineers analyzed how Intel-optimized scikit-learn performs on 2nd Generation Intel Xeon Scalable processors compared to AMD and NVIDIA hardware. With Intel performance as the baseline (the solid blue line at 1.00 in the chart below), the Intel-optimized scikit-learn algorithms outperform the same algorithms run on the AMD EPYC 7742 processor (shown in orange). Intel Advanced Vector Extensions 512 (Intel AVX-512), which the AMD EPYC 7742 does not support, provide much of the performance improvement. The Intel-optimized scikit-learn also consistently outperformed the NVIDIA V100 GPU (shown in purple).

Installing Intel® Extension for Scikit-learn

The AI Kit, which includes all of Intel’s scikit-learn optimizations, is distributed through many common channels, including Intel’s website, YUM, APT, Anaconda, and more. Select and download the distribution package that you prefer and follow the Get Started Guide for post-installation instructions.

Alternatively, you can install Intel® Extension for Scikit-learn from either PyPI or Anaconda Cloud (available from the main, conda-forge, and intel channels). It supports Linux, Windows, and macOS systems on x86 architectures.

pip install scikit-learn-intelex
conda install scikit-learn-intelex -c conda-forge
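
To confirm that the install succeeded before patching anything, a quick check along these lines can help. This is a minimal sketch using only the Python standard library. The helper name `intelex_installed` is hypothetical, and the sketch assumes the import name is `sklearnex` and the distribution name is `scikit-learn-intelex`, as they appear in the commands in this article.

```python
from importlib import metadata, util


def intelex_installed():
    """Return (found, version); version is None when it cannot be determined."""
    # find_spec looks up the import name without importing the package.
    if util.find_spec("sklearnex") is None:
        return False, None
    try:
        return True, metadata.version("scikit-learn-intelex")
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under the assumed name.
        return True, None


found, version = intelex_installed()
print("Intel Extension for Scikit-learn installed:", found, "version:", version)
```

If `found` is false, the pip or conda command probably ran against a different Python environment than the one running the check.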

Using Intel® Extension for Scikit-learn

Once installed, you can accelerate scikit-learn applications. You can load the Intel® Extension for Scikit-learn module from the Python command line:

python -m sklearnex your_application.py
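
For illustration, your_application.py can be any ordinary scikit-learn script. The sketch below is a hypothetical example and is not taken from the benchmark: it times a KMeans fit on synthetic data, so the same file can be run once with plain `python` and once with `python -m sklearnex` to compare run times. The function name `run_kmeans`, the data sizes, and the parameters are arbitrary choices.

```python
import time

import numpy as np
from sklearn.cluster import KMeans


def run_kmeans(n_samples=10_000, n_features=16, n_clusters=8, seed=0):
    """Fit KMeans on random data and return (model, elapsed seconds)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    start = time.perf_counter()
    model.fit(X)
    return model, time.perf_counter() - start


# Plain scikit-learn code: nothing here refers to the extension.
model, elapsed = run_kmeans()
print(f"KMeans fit took {elapsed:.3f} s, inertia={model.inertia_:.1f}")
```

Any speedup depends on the hardware, the data size, and the algorithm, so timings from a toy script like this should be treated as a rough indication only.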

While the command line is fine for testing and experimentation, you can also patch scikit-learn inside your Python program. Do this before importing any other scikit-learn modules:

from sklearnex import patch_sklearn
patch_sklearn()
from sklearn.svm import SVC # your usual code without any changes
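
If the same code also has to run on machines where the extension is not installed, the patch can be guarded. The following is a minimal sketch, assuming that silently falling back to stock scikit-learn is acceptable. The synthetic dataset and the SVC parameters are illustrative only.

```python
# Patch before importing any scikit-learn estimators.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()
    patched = True
except ImportError:
    # Extension not installed: continue with stock scikit-learn.
    patched = False

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Small synthetic classification problem (illustrative sizes).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The estimator code is identical whether or not the patch was applied.
clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"patched={patched}, test accuracy={accuracy:.3f}")
```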

When it’s successfully patched, the console will show a message like this:

In [1]: from sklearnex import patch_sklearn
In [2]: patch_sklearn()
Intel(R) Extension for Scikit-learn* enabled
(https://github.com/intel/scikit-learn-intelex)
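
Beyond the console message, one way to check from code which implementation is active is to look at where the estimator class is defined. This sketch assumes that a patched estimator class lives in a module whose top-level package is `sklearnex` or `daal4py` rather than `sklearn`. That is an implementation detail, not a documented guarantee, and it may change between versions. The helper `uses_intel_backend` is hypothetical.

```python
def uses_intel_backend(estimator_cls):
    """Heuristic: does this class come from the Intel extension packages?"""
    top_level = estimator_cls.__module__.split(".")[0]
    return top_level in ("sklearnex", "daal4py")


try:
    from sklearnex import patch_sklearn
    patch_sklearn()
except ImportError:
    pass  # extension not installed; the stock class will be reported

from sklearn.svm import SVC

# Prints the defining module and the heuristic's verdict.
print(SVC.__module__, "->", uses_intel_backend(SVC))
```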

Many data scientists spend hours and even days waiting for algorithms to run and process data. After an initial analysis, they may have to repeat the analysis with different parameters, looking for a more accurate model. Faster processing means more time analyzing the data, tweaking and improving models, and solving the underlying ML problem. The optimizations for scikit-learn are the key to helping data scientists do just that.

Rachel Oberman

AI Software Technical Consulting Engineer at Intel Corporation