Enabling Multi-threading for XGBoost within Conda Environments on macOS
In this post I illustrate a somewhat subtle issue: enabling multi-threading for XGBoost within conda environments on macOS.

Since the influential paper by Chen and Guestrin (2016), XGBoost has drawn increasing attention and become popular in academic research, industrial applications, and competitions such as Kaggle. I highly recommend reading through its home page for the basic ideas and for its advantages over other ensemble learning algorithms. Broadly speaking, a random forest averages the scores of independent trees, each grown from a random subset of samples and a random subset of features, whereas XGBoost refines the prediction step by step by growing trees sequentially. During training, one can control overfitting by regularizing the number of iterations (trees) through the early stopping criterion and the learning rate, and by limiting the complexity of each tree (depth, number of leaves, leaf weights, etc.).

Model training usually requires dozens to hundreds of repeated trials, mainly to select the best set of hyperparameters and the best set of features, so speed always matters. Luckily, XGBoost is optimized for speed and uses multiple threads automatically (GPU support is also available, of course). However, I found that the XGBoost R package installed on macOS seems NOT to use multiple threads, only a single one. I also raised the discussion here: https://github.com/dmlc/xgboost/issues/7017
Official Solution for System-wide R
In fact, the developers recognized this issue early on and provided a simple solution. The key point is to make sure libomp is installed on macOS in advance; see https://xgboost.readthedocs.io/en/latest/build.html#installing-the-development-version-linux-mac-osx. Therefore, just use brew:
brew install libomp cmake
and then follow the instructions to install the XGBoost R package from source. For example, navigate to an arbitrary temporary directory and enter the following in the terminal:
git clone --recursive https://github.com/dmlc/xgboost
cd xgboost
git submodule init
git submodule update
mkdir build
cd build
cmake .. -DR_LIB=ON
make
make install
Now one can run the following experiment in R and see that XGBoost supports multi-threading:
# test number of threads
require(xgboost)
x <- matrix(rnorm(100 * 10000), 10000, 100)
y <- x %*% rnorm(100) + rnorm(1000)
system.time({
  bst <- xgboost(data = x, label = y, nthread = 1, nround = 100, verbose = F)
})
# user system elapsed
# 19.257 0.111 17.062
system.time({
  bst <- xgboost(data = x, label = y, nthread = 4, nround = 100, verbose = F)
})
# user system elapsed
# 17.632 0.056 4.450
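As a quick sanity check on the timings above, the elapsed times can be converted into a speedup and a parallel efficiency. The sketch below is in Python rather than R, and the helper names (speedup, parallel_efficiency) are illustrative only, not part of XGBoost; the numbers are copied from the system.time() output above.

```python
def speedup(elapsed_single, elapsed_multi):
    """Ratio of single-thread to multi-thread wall-clock time."""
    return elapsed_single / elapsed_multi


def parallel_efficiency(elapsed_single, elapsed_multi, threads):
    """Speedup divided by the thread count (1.0 would be perfect scaling)."""
    return speedup(elapsed_single, elapsed_multi) / threads


# Elapsed seconds copied from the system.time() output above.
elapsed_1_thread = 17.062
elapsed_4_threads = 4.450

print(f"speedup: {speedup(elapsed_1_thread, elapsed_4_threads):.2f}x")
print(f"efficiency: {parallel_efficiency(elapsed_1_thread, elapsed_4_threads, 4):.0%}")
```

A speedup close to the thread count, together with a user time that stays roughly constant while the elapsed time drops, is what one would expect when the extra threads are really in use.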
Solution within Conda environments
The solution above only covers installing the XGBoost R package for system-wide R on macOS (Big Sur). Here I mainly discuss installing the XGBoost R package within a conda environment. In most cases, Python is what gets used within conda environments. On the other hand, R can also be installed and set up within conda environments, and doing so brings several benefits.
- Isolation: Within a conda environment, problems can always be tested without affecting the system, since the whole environment can be safely removed. The R/Python packages are also stored within the environment, without dependencies on packages outside it.
- MKL Acceleration: R uses its default BLAS for matrix computation, whose speed is not satisfactory (https://csantill.github.io/RPerformanceWBLAS/), while the Intel MKL library is an optimized BLAS/LAPACK for matrix computation. However, it is not straightforward to link MKL when using system-wide R installed from the R website. In a conda environment, MKL can be set up for R through the dependencies in a yml file such as the following:
name: R_4.0_mkl
channels:
- conda-forge
- defaults
dependencies:
- python=3.8
- conda-forge::r-base=4.1.0
- conda-forge::libblas=3.9.0=9_mkl
- Reproducibility: This cuts both ways. If the R packages are all installed using conda, then they can be exported as a yml file for coworkers to use. However, installation using the traditional install.packages() is usually preferred, especially when packages need to be compiled. Packages installed in this traditional way are not included in the exported yml.
The question is whether the XGBoost R package installed within a conda environment supports multi-threading. One can also refer to my reports in https://github.com/dmlc/xgboost/issues/7017. In general, two methods are available to install the XGBoost R package within a conda environment. One is to go into R and use install.packages("xgboost"), and the other is to use conda install -c conda-forge r-xgboost in the terminal after activating the environment. However, both methods install an XGBoost R package with only a single thread available, even when libomp is already installed. Googling helped little with this issue, so I went to check the compilation make files. After activating the R environment, enter the following in the R console:
file.path(R.home("etc"), "Makeconf")
This prints the path of the make configuration file within the conda environment. Open that file using your favorite editor and find
SHLIB_OPENMP_CFLAGS = -fopenmp
SHLIB_OPENMP_CXXFLAGS = -fopenmp
SHLIB_OPENMP_FFLAGS = -fopenmp
which are as expected, but the following are empty:
SHLIB_CFLAGS =
SHLIB_CXXFLAGS =
SHLIB_FFLAGS =
From my trials and experiments, the SHLIB_OPENMP_* flags are NOT picked up as effective compilation flags for the XGBoost R package. It is also NOT certain whether other packages that require compilation use them properly. Since libomp is installed system-wide and llvm-openmp is also installed automatically within a conda environment that contains R, always adding the -fopenmp flag should not be harmful. Therefore, just revise the file by adding the flag to those three empty lines:
SHLIB_CFLAGS = -fopenmp
SHLIB_CXXFLAGS = -fopenmp
SHLIB_FFLAGS = -fopenmp
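The manual edit above can also be scripted. The following is a minimal Python sketch, assuming the Makeconf path is the one printed by file.path(R.home("etc"), "Makeconf"); the function names and the .bak backup suffix are my own choices for illustration. It only touches the three SHLIB_*FLAGS lines and leaves the SHLIB_OPENMP_* lines alone.

```python
import os
import shutil

OPENMP_FLAG = "-fopenmp"
TARGET_VARS = ("SHLIB_CFLAGS", "SHLIB_CXXFLAGS", "SHLIB_FFLAGS")


def add_openmp_flag(makeconf_text):
    """Return Makeconf text with -fopenmp appended to the target variables."""
    patched_lines = []
    for line in makeconf_text.splitlines():
        name, sep, value = line.partition("=")
        # Only exact name matches are rewritten, so SHLIB_OPENMP_CFLAGS etc. stay untouched.
        if sep and name.strip() in TARGET_VARS and OPENMP_FLAG not in value.split():
            new_value = f"{value.strip()} {OPENMP_FLAG}".strip()
            line = f"{name.strip()} = {new_value}"
        patched_lines.append(line)
    return "\n".join(patched_lines) + "\n"


def patch_makeconf(path):
    """Back up the file at `path` (a string) once, then rewrite it with the flag added."""
    backup = path + ".bak"
    if not os.path.exists(backup):
        shutil.copy(path, backup)  # keep the original for easy rollback
    with open(path) as handle:
        original = handle.read()
    with open(path, "w") as handle:
        handle.write(add_openmp_flag(original))
```

Calling patch_makeconf() with the path reported by R applies the change; running it twice is harmless because a line that already contains the flag is left as it is.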
Now try install.packages("xgboost") in R within the conda environment. Please note that the following message still appears during the compilation process:
checking whether OpenMP will work in a package... no
*****************************************************************************************
OpenMP is unavailable on this Mac OSX system. Training speed may be suboptimal.
To use all CPU cores for training jobs, you should install OpenMP by running
brew install libomp
*****************************************************************************************
However, -fopenmp also appears in the flags during the installation process. After the installation, the following experiment should indicate that OpenMP is in use:
r$> require(xgboost)
x <- matrix(rnorm(100 * 10000), 10000, 100)
y <- x %*% rnorm(100) + rnorm(1000)
system.time({
  bst <- xgboost(data = x, label = y, nthread = 1, nround = 100, verbose = F)
})
Loading required package: xgboost
user system elapsed
19.429 0.130 17.317

r$> system.time({
  bst <- xgboost(data = x, label = y, nthread = 4, nround = 100, verbose = F)
})
user system elapsed
17.949 0.063 4.538

r$> system.time({
  bst <- xgboost(data = x, label = y, nthread = 8, nround = 100, verbose = F)
})
user system elapsed
27.401 0.094 3.457
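Timing is indirect evidence. A more direct check, offered here only as a sketch, is to look at which shared libraries the compiled package links against. On macOS, otool -L lists them, and the Python helper below scans that listing for a known OpenMP runtime name. The library path is an assumption (in R, system.file("libs", "xgboost.so", package = "xgboost") should print it for your installation), and the function names are hypothetical.

```python
import subprocess

# Common OpenMP runtime library names: LLVM (libomp), GNU (libgomp), Intel (libiomp5).
OPENMP_RUNTIMES = ("libomp", "libgomp", "libiomp5")


def linked_openmp_runtimes(otool_output):
    """Return the OpenMP runtime names mentioned in `otool -L` output."""
    found = []
    for line in otool_output.splitlines():
        for runtime in OPENMP_RUNTIMES:
            if runtime in line and runtime not in found:
                found.append(runtime)
    return found


def check_shared_object(path):
    """Run `otool -L` on a shared object (macOS only) and report OpenMP linkage."""
    result = subprocess.run(
        ["otool", "-L", path], capture_output=True, text=True, check=True
    )
    return linked_openmp_runtimes(result.stdout)
```

An empty list from check_shared_object() would suggest that the package was built without OpenMP. This is a heuristic based on library names, so it complements the timing experiment rather than replacing it.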
Other Issues
What about the XGBoost Python Package in a Conda Environment on macOS?
Interestingly, the XGBoost Python package in a conda environment on macOS seems to be compiled correctly, and multi-threading is available. The XGBoost Python package can be installed via
conda install -c conda-forge xgboost
Here is the test for the XGBoost Python package. Within a conda environment for Python, numpy and xgboost are installed:
conda install -c conda-forge numpy libblas=3.9.0=9_mkl
conda install -c conda-forge xgboost
Then in ipython:
In [1]: import numpy as np
...: import xgboost as xgb
...: import timeit
...:
...: data = np.random.rand(10000, 100)
...: label = np.random.randint(2, size=10000)
...: dtrain = xgb.DMatrix(data, label=label)
...:
...: param_1 = {'objective': 'binary:logistic', 'nthread': 1, 'eval_metric': 'auc'}
...:
...: param_4 = {'objective': 'binary:logistic', 'nthread': 4, 'eval_metric': 'auc'}
...:
...: param_8 = {'objective': 'binary:logistic', 'nthread': 8, 'eval_metric': 'auc'}
...:
...: num_round = 100

In [2]: start = timeit.default_timer()
...:
...: xgb.train(param_1, dtrain, num_round)
...:
...: stop = timeit.default_timer()
...:
...: print('Time: ', stop - start)
Time: 16.160123399

In [3]: start = timeit.default_timer()
...:
...: xgb.train(param_4, dtrain, num_round)
...:
...: stop = timeit.default_timer()
...:
...: print('Time: ', stop - start)
Time: 4.242956155000002

In [4]: start = timeit.default_timer()
...:
...: xgb.train(param_8, dtrain, num_round)
...:
...: stop = timeit.default_timer()
...:
...: print('Time: ', stop - start)
Time: 3.200284463999999
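The three timing cells above repeat the same pattern, so they can be folded into one helper. This is a sketch under the same setup as the session above (random 10000 x 100 data, binary:logistic objective, 100 rounds); benchmark_threads and make_xgb_trainer are names I made up for illustration, and numpy and xgboost are imported lazily so that the timing helper itself has no dependencies.

```python
import timeit


def benchmark_threads(train_fn, thread_counts):
    """Time train_fn(nthread) once per thread count; return {nthread: seconds}."""
    timings = {}
    for nthread in thread_counts:
        start = timeit.default_timer()
        train_fn(nthread)
        timings[nthread] = timeit.default_timer() - start
    return timings


def make_xgb_trainer(num_round=100):
    """Build a train_fn that mirrors the ipython session above."""
    # Imported here so benchmark_threads works even without these packages.
    import numpy as np
    import xgboost as xgb

    data = np.random.rand(10000, 100)
    label = np.random.randint(2, size=10000)
    dtrain = xgb.DMatrix(data, label=label)

    def train(nthread):
        params = {
            "objective": "binary:logistic",
            "nthread": nthread,
            "eval_metric": "auc",
        }
        xgb.train(params, dtrain, num_round)

    return train
```

With xgboost installed, benchmark_threads(make_xgb_trainer(), [1, 4, 8]) returns the elapsed seconds for each thread count, which makes it easy to repeat the comparison on another machine or environment.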
What about the XGBoost Python/R Packages on Linux?
Luckily, the XGBoost Python/R packages installed within a conda environment on Linux are compiled properly and multi-threading works fine, based on my tests on Garuda Linux. For reference, here is the basic information for the R conda environment on macOS:
r$> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Big Sur 11.3.1

Matrix products: default
BLAS/LAPACK: /Users/mm22204/opt/miniconda3/envs/R_4.0_mkl/lib/libmkl_rt.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] xgboost_1.4.1.1

loaded via a namespace (and not attached):
[1] compiler_4.1.0 magrittr_2.0.1 Matrix_1.3-4 grid_4.1.0
[5] data.table_1.14.0 jsonlite_1.7.2 lattice_0.20-44
Summary
In this post I shared a compilation issue that affects multi-threading in XGBoost on macOS. It is a subtle problem ONLY for the XGBoost R package on macOS; it is NOT a problem for the XGBoost Python package, nor on Linux.

In recent years I have mainly used macOS as a balanced choice that offers the merits of a Unix-like system together with access to software for work and daily life, such as Microsoft Office. However, compilation problems do sometimes seem specific to macOS, and revising the corresponding make files becomes inevitable. Nowadays it is common to use separate environments for distinct projects to ensure independence and reproducibility, and conda is the usual choice for data science since it supports both Python and R natively. Even so, we still need to check whether packages are compiled properly so that they make use of the available computing resources. I hope this serves as an example for your reference.
References
- Chen, T., & Guestrin, C. (2016, August). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794).