Improving the Performance of XGBoost and LightGBM Inference

Get Up To 36x Faster Inference Using Intel oneAPI Data Analytics Library

Published in Intel Analytics Software, Dec 2, 2020

Gradient boosting on decision trees is one of the most accurate and efficient machine learning algorithms for classification and regression. There are many implementations of gradient boosting, but the most popular are the XGBoost and LightGBM frameworks. This article shows how to improve the prediction speed of XGBoost or LightGBM models up to 36x with Intel oneAPI Data Analytics Library (oneDAL).

Brief Introduction

Many people use XGBoost and LightGBM gradient boosting to solve various real-world problems, conduct research, and compete in Kaggle competitions. Although these frameworks give good performance out of the box, prediction speed can still be improved. Because prediction is arguably the most important stage of the machine learning workflow, faster inference can be quite beneficial.

A previous article showed that oneDAL performs gradient boosting inference several times faster than its competitors.

Fast Gradient Boosting Tree Inference: How to Boost Prediction Quality and Performance Using the Intel Data Analytics Acceleration Library (medium.com)

This performance benefit is now available in XGBoost and LightGBM.

Model Converters

All gradient boosting implementations perform similar operations and therefore store trained models in similar structures. In theory, this facilitates the conversion of trained models from one machine learning framework to another. Model converters in oneDAL are designed to help you transfer a trained model from XGBoost or LightGBM to oneDAL with just a single line of code. Model converters from other frameworks will soon be available.
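To make the idea of converting between similar storage layouts concrete, here is a toy sketch in plain Python. It assumes two made-up layouts, a nested dictionary per tree and a flat list of node records per tree, and shows that a converted ensemble yields the same prediction. Neither layout is the actual XGBoost, LightGBM, or oneDAL format, and every name below is hypothetical.

```python
# Toy illustration only: these layouts are NOT the real XGBoost, LightGBM,
# or oneDAL model formats. All function and field names are hypothetical.

def flatten_tree(root):
    """Convert one nested-dict tree into a flat list of node records."""
    nodes = []

    def visit(node):
        index = len(nodes)
        nodes.append(None)  # reserve a slot so a parent precedes its children
        if "leaf" in node:
            nodes[index] = ("leaf", node["leaf"])
        else:
            left = visit(node["left"])
            right = visit(node["right"])
            nodes[index] = ("split", node["feature"], node["threshold"], left, right)
        return index

    visit(root)
    return nodes


def predict_nested(trees, row):
    """Sum the leaf values reached in each nested-dict tree."""
    total = 0.0
    for node in trees:
        while "leaf" not in node:
            go_left = row[node["feature"]] < node["threshold"]
            node = node["left"] if go_left else node["right"]
        total += node["leaf"]
    return total


def predict_flat(flat_trees, row):
    """Sum the leaf values reached in each flattened tree."""
    total = 0.0
    for nodes in flat_trees:
        i = 0
        while nodes[i][0] == "split":
            _, feature, threshold, left, right = nodes[i]
            i = left if row[feature] < threshold else right
        total += nodes[i][1]
    return total


nested_model = [
    {"feature": 0, "threshold": 0.5,
     "left": {"leaf": -1.0},
     "right": {"feature": 1, "threshold": 2.0,
               "left": {"leaf": 0.25}, "right": {"leaf": 1.5}}},
    {"feature": 1, "threshold": 1.0,
     "left": {"leaf": 0.1}, "right": {"leaf": -0.2}},
]
flat_model = [flatten_tree(tree) for tree in nested_model]  # the "conversion"
sample = [0.7, 3.0]
print(predict_nested(nested_model, sample), predict_flat(flat_model, sample))
```

Because the split features, thresholds, and leaf values carry over unchanged, both layouts describe the same function; only the traversal code differs.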

The following examples show how to convert XGBoost and LightGBM models to oneDAL. First, get the latest version of daal4py for Python 3.6 and higher:

conda install -c conda-forge 'daal4py>=2020.3'

Convert an XGBoost model to oneDAL:

# Train an XGBoost model
import xgboost as xgb
clf = xgb.XGBClassifier(**params)
xgb_model = clf.fit(X_train, y_train)

# Convert the XGBoost model to a oneDAL model
import daal4py as d4p
daal_model = d4p.get_gbt_model_from_xgboost(xgb_model.get_booster())

# Make a faster prediction with oneDAL
daal_prediction = (
    d4p.gbt_classification_prediction(nClasses=n_classes)
    .compute(X_test, daal_model)
    .prediction
)

Convert a LightGBM model to oneDAL:

# Train a LightGBM model
import lightgbm as lgb
lgb_model = lgb.train(params, lgb.Dataset(X_train, y_train))

# Convert the LightGBM model to a oneDAL model
import daal4py as d4p
daal_model = d4p.get_gbt_model_from_lightgbm(lgb_model)

# Make a faster prediction with oneDAL
daal_prediction = (
    d4p.gbt_regression_prediction()
    .compute(X_test, daal_model)
    .prediction
)

Note that there is a temporary limitation on the use of missing values (NaN) during training and prediction. Inference quality might be lower if the data has missing values.
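Given that limitation, it can be worth checking the input for missing values before relying on a converted model. The helper below is a minimal sketch in plain Python; `count_missing` and `X_example` are hypothetical names, not part of daal4py.

```python
import math


def count_missing(rows):
    """Count NaN entries in a 2-D collection of numbers (lists or array rows)."""
    return sum(1 for row in rows for value in row if math.isnan(value))


# Hypothetical data with two missing entries
X_example = [[1.0, 2.0], [float("nan"), 4.0], [5.0, float("nan")]]

n_missing = count_missing(X_example)
if n_missing:
    print(f"{n_missing} missing values found; converted-model predictions may differ")
```

For NumPy arrays, `np.isnan(X).sum()` gives the same count without a Python-level loop.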

The following example shows how to save and load a model from oneDAL:

import pickle

# Model from XGBoost (xgb_model is assumed here to be a Booster; for a
# scikit-learn-style wrapper, pass xgb_model.get_booster() instead)
daal_model = d4p.get_gbt_model_from_xgboost(xgb_model)

# Save model to a file
with open('model.pkl', 'wb') as out:
    pickle.dump(daal_model, out)

# Load model from a file
with open('model.pkl', 'rb') as inp:
    model = pickle.load(inp)

# Make predictions
daal_prediction = d4p.gbt_regression_prediction().compute(X_test, model)

By default, oneDAL only returns labels for predicted elements. If you need the probabilities as well, you must explicitly ask for them:

# List all results that you need by placing '|' between them
predict_algo = d4p.gbt_classification_prediction(
    nClasses=n_classes,
    resultsToEvaluate="computeClassLabels|computeClassProbabilities",
)
daal_prediction = predict_algo.compute(X_test, model)

# Get probabilities:
probabilities = daal_prediction.probabilities

# Get labels:
labels = daal_prediction.prediction
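For intuition about how the two results relate, a label is typically the class with the highest predicted probability. The sketch below assumes the probabilities arrive as one row of per-class values per observation; that shape, and that the returned labels always match this argmax, are assumptions and not something the article confirms. The function name and data are hypothetical.

```python
def labels_from_probabilities(probabilities):
    """Pick, for each row, the index of the largest class probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


# Hypothetical probabilities for three observations and three classes
example_probabilities = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
example_labels = labels_from_probabilities(example_probabilities)
print(example_labels)
```

Keeping the probabilities also lets you apply your own decision rule, such as a custom threshold, instead of the plain argmax.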

Performance Comparison

The performance advantage of oneDAL over XGBoost and LightGBM is demonstrated using the following off-the-shelf datasets:

Mortgage (45 features, ~9M observations)
Airline (691 features, one-hot encoding, ~1M observations)
Higgs (28 features, 1M observations)
MSRank (136 features, 3M observations)

The models were trained in XGBoost and LightGBM, then converted to daal4py. To compare performance of stock XGBoost and LightGBM with daal4py acceleration, the prediction times for both original and converted models were measured. Figure 1 shows that daal4py is up to 36x faster than XGBoost (24x faster on average) and up to 15.5x faster than LightGBM (14.5x faster on average). Note that prediction quality remains the same (as measured by mean squared error for regression and accuracy and logistic loss for classification).

Figure 1. Comparing daal4py inference performance to XGBoost (top) and LightGBM (bottom). Hardware and software details are below.
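A comparison of this kind can be sketched as follows. This is not the benchmark harness behind Figure 1: the two predictors are stand-in functions, and taking the best of several runs is an assumption about methodology. In practice the callables would wrap the original framework's prediction call and the daal4py `compute` call.

```python
import time


def best_time(predict, data, repeats=5):
    """Return the fastest wall-clock time in seconds over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(data)
        timings.append(time.perf_counter() - start)
    return min(timings)


def mean_squared_error(y_true, y_pred):
    """Average squared difference, used here to compare two prediction sets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of labels that match."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Stand-in predictors (hypothetical): same result, different amount of work
def baseline_predict(rows):
    return [sum(value * 2.0 for value in row) for row in rows]


def converted_predict(rows):
    return [2.0 * sum(row) for row in rows]


rows = [[float(i + j) for j in range(8)] for i in range(1000)]

baseline_seconds = best_time(baseline_predict, rows)
converted_seconds = best_time(converted_predict, rows)
speedup = baseline_seconds / converted_seconds

# Quality check: the two sets of predictions should agree
quality_gap = mean_squared_error(baseline_predict(rows), converted_predict(rows))
print(f"speedup: {speedup:.2f}x, MSE between predictions: {quality_gap}")
```

Measuring prediction agreement alongside the timing guards against a speedup that comes from computing something different.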

oneDAL uses the Intel Advanced Vector Extensions 512 (AVX-512) instruction set to maximize gradient boosting performance on Intel Xeon processors. The most commonly used inference operations, such as comparison and random memory access, can be effectively implemented using the vpgatherd{d,q} and vcmpp{s,d} instructions in AVX-512. Performance also depends on storage efficiency and memory bandwidth. For tree structures, oneDAL uses smart locking of data in memory to achieve temporary cache localization (i.e., the state when a subset of trees and a block of observations are stored in the L1 data cache) so that the majority of memory accesses are satisfied immediately at the L1 level with the highest memory bandwidth.
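The idea of keeping a subset of trees and a block of observations together can be illustrated with a loop-ordering sketch. This is not oneDAL's implementation, and pure Python will not show the cache benefit; it only demonstrates that processing small blocks of rows against small blocks of trees produces the same sums as the straightforward order. The trees are hypothetical one-split stumps, and the block sizes are arbitrary.

```python
# Each toy tree is a stump: (feature index, threshold, left value, right value).
# Names, data, and block sizes are hypothetical; this is not oneDAL's code.

def predict_tree(tree, row):
    feature, threshold, left_value, right_value = tree
    return left_value if row[feature] < threshold else right_value


def predict_naive(trees, rows):
    """For each row, walk every tree in the ensemble."""
    return [sum(predict_tree(tree, row) for tree in trees) for row in rows]


def predict_blocked(trees, rows, row_block=2, tree_block=2):
    """Visit (row block, tree block) pairs so each pair stays small."""
    out = [0.0] * len(rows)
    for r0 in range(0, len(rows), row_block):
        for t0 in range(0, len(trees), tree_block):
            tree_chunk = trees[t0:t0 + tree_block]
            for r in range(r0, min(r0 + row_block, len(rows))):
                for tree in tree_chunk:
                    out[r] += predict_tree(tree, rows[r])
    return out


trees = [(0, 0.5, -1.0, 1.0), (1, 1.5, 0.25, -0.25), (0, 2.0, 0.5, 0.75)]
rows = [[0.1, 2.0], [0.9, 1.0], [3.0, 0.0], [0.4, 1.6], [2.5, 1.5]]

print(predict_naive(trees, rows))
print(predict_blocked(trees, rows))
```

In a compiled implementation, the block sizes would be chosen so that one tree chunk and one row block fit in the L1 data cache at the same time.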

Final Thoughts

Many applications use XGBoost and LightGBM for gradient boosting, so model converters provide an easy way to accelerate inference using oneDAL. They allow XGBoost and LightGBM users to:

Use their existing model training code without changes.
Do inference up to 36x faster with minimal code changes and no loss of quality.

Installing the Intel oneAPI AI Analytics Toolkit

The Intel oneAPI AI Analytics Toolkit (AI Kit) provides a consolidated package of Intel’s latest deep learning and machine learning optimizations in one place, with seamless interoperability and high performance. The AI Kit includes Intel-optimized versions of deep learning frameworks, Python libraries, and a lightweight parallel data frame to streamline end-to-end data science and AI workflows on Intel architectures. The AI Kit, which includes the Intel Distribution for Python (with optimizations for XGBoost, daal4py, and more), is distributed through many common channels, including Intel’s website, YUM, APT, and Anaconda. Select and download the distribution package that you prefer and follow the Get Started Guide for post-installation instructions.

Hardware and Software Configuration

Intel Xeon Platinum 8275CL (2nd generation Intel Xeon Scalable processors): 2 sockets, 24 cores per socket, HT:on, Turbo:on. OS: Ubuntu 18.04.4 LTS (Bionic Beaver), total memory of 192 GB (12 slots/16 GB/2933 MHz). Software: XGBoost 1.2.1, LightGBM 3.0.0, daal4py version 2020 update 3, Python 3.7.9, numpy 1.19.2, pandas 1.1.3, and scikit-learn 0.23.2. Training Parameters: XGBoost and LightGBM.