Accelerating Linear Models for Machine Learning

Linear Regression Has Never Been Faster

Published in

Intel Analytics Software

6 min readMay 5, 2020

If you have ever used Python and scikit-learn to build machine learning (ML) models from large data sets, you may have also wished that you could make these computations go faster. What if I told you that altering a single line of code could accelerate your ML computations? What if I also told you that getting faster results doesn’t require specialized hardware?

In this article, I will teach you how to train ridge regression models using Intel® Extension for Scikit-learn, then compare the performance and accuracy of these models trained with the vanilla scikit-learn library. This article continues our series on accelerated ML algorithms.

Accelerate K-Means Clustering

Intel’s Hardware and Software for Machine Learning

medium.com

Fast Gradient Boosting Tree Inference

How to Boost Prediction Quality and Performance Using the Intel® Data Analytics Acceleration Library

medium.com

A Practical Example of Linear Regression

Linear regression, a special case of ridge regression, has a lot of applications. For my comparisons, I’m going to use the well-known House Sales in King County, USA data set from Kaggle. This data set is used to predict house prices based on one year of King County sales data.

This data set has 21,613 rows and 21 columns. Each row represents a house that was sold in King County between May 2014 and May 2015. The first column contains a unique identifier for the sale, the second column contains the date the house was sold, and the third column contains the sale price, which is also the target variable. Columns 4 to 21 contain various numerical characteristics of the house, such as the number of bedrooms, square footage, the year when the house was built, zip code, etc. I am going to build a ridge regression model that predicts the price of the house based on the data in columns 4 to 21.

In theory, the coefficients of the linear regression model should have the lowest residual sum of squares (RSS). In practice, the model with the lowest RSS is not always the best. Linear regression can produce inaccurate models if input data suffers from multicollinearity. Ridge regression can give more reliable estimates in this case.

Solving a Regression Problem with scikit-learn

Let’s see how to build a model with sklearn.linear_model.Ridge. The program below trains a ridge regression model on 80% of the rows from the House Sales dataset, then uses the other 20% to test the model’s accuracy.

import numpy as np
import pandas as pd
from sklearn import config_context
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split# Load the data from a text file into pandas DataFrames.
# The third column contains the responses. Load it into df_y.
# Columns after the fourth contain features. Load them into df_X.infile = "data/kc_house_data.csv"
df_X = pd.read_csv(infile, usecols=range(3, 20))
df_y = pd.read_csv(infile, usecols=[2])# Split the data into training and testing subsets.
# Use 80% of the rows to train the regression model.
# Use the remaining 20% to test the model.X_train_np, X_test_np, y_train_np, y_test_np =   \
    train_test_split(df_X, df_y, train_size=0.8, \ 
                     random_state=7777777)# Convert the data into pandas DataFrames for convenience:
X_train = pd.DataFrame(X_train_np)
y_train = pd.DataFrame(y_train_np)
X_test = pd.DataFrame(X_test_np)
y_test = pd.DataFrame(y_test_np)# Construct a ridge regression algorithm object:
alg = Ridge(alpha=1.0)# Train a ridge regression model:
model = alg.fit(X_train, y_train)# Check the model's accuracy on a test set:
y_pred = model.predict(X_test)MSE = ((y_test.values - y_pred)**2).sum()
RMSD = np.sqrt(MSE/y_test.size)
R2 = alg.score(X_test, y_test)

The resulting equals 0.69, meaning that our model describes 69% of variance in the data. See the Appendix for details on the quality of the trained models.

Intel® Extension for Scikit-learn

Even though ridge regression is quite fast in terms of training and prediction time, you usually need to perform multiple training experiments to tune the hyperparameters. You might also want to experiment with feature selection (i.e., evaluate the model on various subsets of features) to get better prediction accuracy. Since one round of training can take several minutes on large data sets, which can quickly add up if your task requires multiple rounds of training, the performance of linear model training is critical.

Intel® Extension for Scikit-learn is a drop-in replacement patching for vanilla scikit-learn, but it is optimized for Intel architectures. It has the same set of algorithms and the APIs as Continuum’s scikit-learn, so no code changes are required to get a performance boost for ML algorithms.

How to Configure Intel® Extension for Scikit-learn

There are several ways to install Intel® Extension for Scikit-learn using Conda or PyPI:

conda install scikit-learn-intelex -c conda-forgepip install scikit-learn-intelex

You can enable patching without modifying your application. Just use the following command-line flag:

python -m sklearnex my_app.py

Patching also can be enabled within your application:

from sklearnex import patch_sklearn
patch_sklearn()from sklearn.svm import SVC # your usual code without any changes
…

Applying the patch impacts the following scikit-learn algorithms:

sklearn.linear_model.LinearRegression
sklearn.linear_model.Ridge (solver=’auto’)
sklearn.linear_model.LogisticRegression and sklearn.linear_model.LogisticRegressionCV (solver in [‘lbfgs’, ‘newton-cg’])
sklearn.decomposition.PCA (svd_solver=’full’)
sklearn.cluster.KMeans (algo=’full’)
sklearn.metric.pairwise_distance, with metric=’cosine’ or metric=’correlation’
sklearn.svm.SVC

This list will continue to grow in the next releases. You can the latest list of supported parameters here.

Performance Comparison

To compare performance of the vanilla scikit-learn and Intel-optimized scikit-learn, we used the King County data set plus six artificially generated datasets with varying numbers of samples and features. The latter were generated using the scikit-learn make_regression function:

X, y = make_regression(n_samples=nrows, n_features=ncols, \
                       n_informative=ncols, noise=10.0, bias=10.0, \
                       random_state=7777777)
X_train = pd.DataFrame(X)
y_train = pd.DataFrame(y)

Figure 1 shows the wall-clock time spent on training a ridge regression model with two different configurations:

Scikit-learn version 0.22 installed from the default set of conda channels.
Scikit-learn version 0.21.3 optimized with Intel® Extension for Scikit-learn.

To enable Intel optimizations in scikit-learn, we used the -m sklearnex command-line option. For performance measurements, we used the Amazon Web Services Elastic Compute Cloud (AWS EC2). We chose the instance that gives best performance:

CPU: c5.metal (2nd generation Intel Xeon Scalable processors, 2 sockets, 24 cores per socket).

Amazon states that

C5 instances offer the lowest price per vCPU in the Amazon EC2 family and are ideal for running advanced compute-intensive workloads.

c5.metal instance has the most CPU cores and the latest CPUs among the C5 instances.

See the Configuration section below for hardware details. See the Appendix for details on the quality of trained models.

Figure 1 shows that:

Ridge regression training is up to 5.49x faster with Intel® Extension for Scikit-learn than with vanilla scikit-learn.
The performance improvement from Intel® Extension for Scikit-learn increases with the size of the data set.

What Makes Intel® Extension for Scikit-learn Faster?

For the big data sets, ridge regression spends most of its compute time on matrix multiplication. oneDAL’s implementation of ridge regression (used underneath of the extension) relies on the Intel Math Kernel Library (Intel MKL), which is highly-optimized for Intel CPUs. Intel MKL uses Single Instruction Multiple Data (SIMD) vector instructions from the Intel Advanced Vector Extensions 512 (Intel AVX-512) available on 2nd Generation Intel Xeon Scalable processors. Compute-intensive kernels like matrix multiplication benefit significantly from the data parallelism that these instructions provide. Another level of parallelism for matrix multiplication is achieved by splitting matrices into blocks and processing them in parallel using Threading Building Blocks (TBB).

Intel® Extension for Scikit-learn gives significantly better performance for ridge regression with no loss of model accuracy and little to no code modification. The performance advantages are not limited to just this algorithm. As mentioned previously, the list of optimized ML algorithms continues to grow.

Configuration

Hardware

c5.metal AWS EC2 instance: Intel Xeon 8275CL processor, two sockets with 24 cores per socket, 192 GB RAM. OS: Ubuntu 18.04.3 LTS.

Testing date: 04/06/2020

Software

Vanilla sklearn: Python 3.8.0, scikit-learn 0.22, Pandas 0.25.3.

Optimized scikit-learn: Conda install from intel channel: Python 3.7.4, scikit-learn 0.21.3 optimized with daal4py 2020.0 build py37ha68da19_8, Pandas 0.25.1.

Python and accompanying libraries are the default versions installed by the conda package manager when configuring the respective environments.