XGBoost for Machine Learning in a Python Windows Environment

Extreme Gradient Boosting, better known as XGBoost, is an optimized distributed gradient boosting system designed to be highly efficient, flexible, and portable. XGBoost is a recent implementation of boosted trees and is one of the machine learning algorithms that yields great results on supervised learning problems. It implements machine learning algorithms under the gradient boosting framework and provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems in a fast and accurate way. The same code runs on major distributed environments (Hadoop, SGE, MPI) and can solve problems involving billions of examples and beyond. The most recent version integrates naturally with dataflow frameworks (e.g. Flink and Spark). See the original paper on XGBoost for more details.
If you are a machine learning enthusiast who works on your own, one of the very first hurdles you may encounter with XGBoost is its installation on your operating system. Especially for Windows, I would say the process is not well streamlined. Hence I thought of sharing my experience and two successful methods of installing XGBoost on Windows 10.
Method 1: From a pre-built wheel using pip
Requirement: Microsoft Visual C++ Redistributable for Visual Studio 2017
- Download the .whl file from the link below, making sure it matches your operating system and Python version (for example, if your OS is 64-bit and your Python version is 3.6, choose cp36-cp36m-win_amd64.whl): https://www.lfd.uci.edu/~gohlke/pythonlibs/#xgboost
- Open a command prompt, cd to the folder where the file was downloaded, and run pip install followed by the wheel file name.
If you are lucky enough, your XGBoost Windows environment is ready now!
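To confirm, a quick import check in Python is enough. This is a minimal sanity check; the version printed will depend on the wheel you downloaded:

# Verify that the pip-installed wheel is importable.
import xgboost as xgb

# Print the installed XGBoost version (depends on the wheel you downloaded).
print(xgb.__version__)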
If the import fails, you may use the method below.
Method 2: From GitHub using Git Bash
Requirements:

- Git for Windows
- Git Bash shell
1. Download Git for Windows and install Git Bash. Once installed, you can access it via the Start menu, and this will be the terminal for the XGBoost installation. Follow the commands below to move into your user directory and download XGBoost from GitHub.
A1828@A1828-NB-02 MINGW64 ~
$ cd /c/Users/A1828/
$ git clone --recursive https://github.com/dmlc/xgboost
$ cd xgboost
$ git submodule init
$ git submodule update

2. Install the full-fledged 64/32-bit compiler provided with MinGW-W64. Once the installer is downloaded, install it with the following configuration for a 64-bit setup: x86_64 architecture, posix threads, and seh exception handling (matching the folder name in the next step).

3. Once installed, add the bin folder path (C:\Program Files\mingw-w64\x86_64-8.1.0-posix-seh-rt_v6-rev0\mingw64\bin) to the system PATH variable.
(Control Panel\System and Security\System -> Advanced system settings -> System Properties -> Environment Variables -> Path)

4. Exit the Git Bash terminal, launch it again, and try the below:
$ which mingw32-make
It should return the path to mingw32-make inside the bin folder added above.

Then run the below to alias make and to start compiling from the directory where we downloaded XGBoost:
$ alias make='mingw32-make'
$ cd /c/Users/A1828/xgboost
5. Compile each submodule by running the below:
$ cd dmlc-core
$ make -j4
$ cd ../rabit
$ make lib/librabit_empty.a -j4
$ cd ..
$ cp make/mingw64.mk config.mk
$ make -j4
6. Open the Anaconda prompt: type Anaconda in the Start menu search bar and run it as administrator. Once the CLI is open, follow the instructions below to complete the installation.
Navigate to the folder where you have the XGBoost Python package:
> cd C:\Users\A1828\xgboost\python-package
(base) C:\Users\A1828\xgboost\python-package>python setup.py install

7. Add the runtime libraries to the OS environment PATH variable.
Open a Python Jupyter notebook and run the below:
import os
# Prepend the MinGW-W64 bin folder (the same path added to the system PATH in step 3)
# so that Python can find the runtime DLLs needed by the compiled XGBoost library.
mingw_path = 'C:\\Program Files\\mingw-w64\\x86_64-8.1.0-posix-seh-rt_v6-rev0\\mingw64\\bin'
os.environ['PATH'] = mingw_path + ';' + os.environ['PATH']
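With the MinGW bin folder prepended to the PATH, the compiled native library should now load in the same notebook session. A minimal sanity check, assuming the build and install above completed without errors:

import numpy as np
import xgboost as xgb

# Building a small DMatrix exercises the native XGBoost library and confirms
# that the MinGW runtime DLLs are found on the PATH set above.
dtrain = xgb.DMatrix(np.random.rand(5, 3), label=np.random.randint(2, size=5))
print(dtrain.num_row(), dtrain.num_col())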

Now you are good to run your own XGBoost in Python.
Let's test it in a Windows Python 3 development environment by running a classification model on the famous Iris data set.
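Below is a minimal sketch of such a test, using scikit-learn's built-in Iris data and the xgboost.XGBClassifier wrapper; the parameters and the accuracy you get are illustrative, not the exact script I ran:

import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the Iris data set and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a small gradient boosted tree classifier.
model = xgb.XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)

# Evaluate on the held-out test set.
preds = model.predict(X_test)
print('Test accuracy:', accuracy_score(y_test, preds))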

I hope this article helps you to set up your XGBoost environment for Windows; I am trying my best to spare time to share these experiences. I am planning to write my next article on how to run XGBoost in a PySpark cluster.
KIT,
Buddhika.
Data Scientist: Axiata Analytics Centre