How to make XGBoost available on the PySpark Jupyter Notebook

Bogdan Cojocar
2 min read · Feb 11, 2018

Please check out my new Medium article to find out how to integrate PySpark ML and XGBoost.

This is a step-by-step tutorial on how to install XGBoost (an efficient implementation of gradient boosting) and make it available in the Jupyter notebook.

Prerequisites for this guide are PySpark and Jupyter installed on your system. Please follow this Medium post on how to install and configure them.

Step 1: Build XGBoost

For this step we need to clone the repository from GitHub. The --recursive flag also fetches the git submodules that the build depends on:

git clone --recursive https://github.com/dmlc/xgboost

Next we need to go into the newly cloned repository and build the project:

cd xgboost
make -j4

Mac users need one additional step, run from inside the xgboost folder before building with make:

cp make/config.mk ./config.mk
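
To confirm that the build produced something before moving on, a quick check can look for the compiled shared library. This is only a sketch: it assumes the build writes a file named libxgboost.* into a lib folder inside the repository, which may differ between XGBoost versions, and find_built_library is a hypothetical helper, not part of XGBoost.

```python
from pathlib import Path


def find_built_library(repo_dir):
    """Return the paths of compiled XGBoost libraries found under repo_dir/lib.

    Assumption: the build writes libxgboost.so (Linux) or libxgboost.dylib
    (Mac) into the lib folder; adjust the pattern if your version differs.
    """
    lib_dir = Path(repo_dir) / "lib"
    if not lib_dir.is_dir():
        return []
    return sorted(lib_dir.glob("libxgboost.*"))


if __name__ == "__main__":
    # Run from the directory that contains the cloned xgboost folder.
    found = find_built_library("xgboost")
    if found:
        print("Built library found:", ", ".join(str(p) for p in found))
    else:
        print("No libxgboost library found; check the make output for errors.")
```

An empty result usually means the make step stopped early, so it is worth scrolling back through its output before continuing to Step 2.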

Step 2: Install the XGBoost python package

The Python package is located in the python-package folder and has to be built with setup.py:

cd python-package
sudo python setup.py install

Please note that you might also need to install the distutils module if it is not included with your Python installation (this depends on the operating system).
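
A short check can tell you up front whether distutils is missing, and afterwards whether the xgboost package landed in the Python you installed it with. The sketch below uses only the standard library; missing_modules is a hypothetical helper name introduced for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that this Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # distutils is the possible missing dependency mentioned above;
    # xgboost should be found once the install step has finished.
    missing = missing_modules(["distutils", "xgboost"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("distutils and xgboost are both importable.")
```

Run it with the same python executable that you used for setup.py, since each interpreter keeps its own set of installed packages.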

Step 3: Running everything in the notebook
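
As a first cell in the notebook, a small import check is a reasonable starting point. This is a minimal sketch rather than a full PySpark example: it assumes the notebook kernel runs the same Python interpreter that the package was installed into in Step 2, and xgboost_status is a hypothetical helper name.

```python
import sys


def xgboost_status():
    """Describe whether the current kernel can import xgboost."""
    try:
        import xgboost as xgb
    except Exception as exc:
        # Broad on purpose: a broken build may fail with something
        # other than ImportError when the native library is loaded.
        return "xgboost could not be imported: {}".format(exc)
    version = getattr(xgb, "__version__", "unknown")
    return "xgboost {} is available".format(version)


# Showing the interpreter path helps spot a kernel that uses a different Python.
print("Kernel Python:", sys.executable)
print(xgboost_status())
```

If the import fails here but works in a terminal, the kernel is most likely using a different Python than the one used for the install.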

Bogdan Cojocar

Big data consultant. I write about the wonderful world of data.