Your basic XGBoost Classification Code

Udbhav Pangotra
Published in Geek Culture · 2 min read · Jan 14, 2022

This post serves as a starting point for your XGBoost journey.

XGBoost is an open-source library that provides an optimized, distributed implementation of gradient boosted decision trees. It was created by Tianqi Chen and initially maintained by the Distributed (Deep) Machine Learning Community (DMLC) group. It is one of the most widely used algorithms in applied machine learning competitions and has gained popularity through winning solutions on structured, tabular data.

Installing XGB

pip install xgboost

Importing the packages and libraries

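# import statements
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import xgboost as xgb
# train_test_split lives in sklearn.model_selection
# (the old sklearn.cross_validation module was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split

# load the dataset
data = pd.read_csv('../input/diabetes.csv')
data.head()

# separate the features from the target
# ASSUMPTION: the label column is named 'Outcome'; adjust this if your CSV differs
X_data = data.drop(columns=['Outcome'])
y = data['Outcome']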

Now we split the dataset and create our train and test sets!

# Split the dataset into train and test sets (70% train, 30% test)
seed = 7
test_size = 0.3
X_trian, X_test, y_train, y_test = train_test_split(X_data, y, test_size=test_size, random_state=seed)
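
The split above is purely random. When the classes are imbalanced, it can help to keep the class ratio the same in both sets. Here is a minimal sketch of that idea using the `stratify` argument of `train_test_split`. The data here is synthetic and hypothetical (650 negatives, 350 positives), used only so the snippet runs on its own.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 1000 rows, 8 numeric features, 35% positives
rng = np.random.default_rng(7)
X_demo = rng.normal(size=(1000, 8))
y_demo = np.array([0] * 650 + [1] * 350)

# stratify=y_demo keeps the share of positives (almost) identical in both sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=7, stratify=y_demo
)

print("train size:", len(y_tr), "positive share:", y_tr.mean())
print("test size: ", len(y_te), "positive share:", y_te.mean())
```

Without `stratify`, the positive share in the two sets can drift apart by chance, which makes the test metrics harder to compare across runs.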

Training the Model

# Train two XGBoost classifiers:
# model1 uses the library defaults, model2 sets a few hyperparameters by hand
model1 = xgb.XGBClassifier()
model2 = xgb.XGBClassifier(n_estimators=100, max_depth=8, learning_rate=0.1, subsample=0.5)

train_model1 = model1.fit(X_trian, y_train)
train_model2 = model2.fit(X_trian, y_train)

Printing the Classification report

# Prediction and classification report
from sklearn.metrics import classification_report

pred1 = train_model1.predict(X_test)
pred2 = train_model2.predict(X_test)

# classification_report returns a multi-line string, so print it with %s to keep the line breaks
print('Model 1 XGBoost report:\n%s' % classification_report(y_test, pred1))
print('Model 2 XGBoost report:\n%s' % classification_report(y_test, pred2))
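
To make the numbers in the report easier to read, here is a small worked example of how precision, recall, and F1 are computed for the positive class. The labels below are made up for illustration and are not taken from the diabetes dataset.

```python
# Hypothetical true labels and predictions for 8 samples
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_hat = [1, 0, 0, 1, 0, 1, 1, 0]

# Count the confusion-matrix cells for the positive class (label 1)
tp = sum(1 for t, p in zip(y_true, y_hat) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_hat) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_hat) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
support = sum(y_true)       # number of actual positives

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} support={support}")
# → precision=0.75 recall=0.75 f1=0.75 support=4
```

`classification_report` prints the same quantities for every class, plus averages across classes.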

Now that the basics are done, let’s move on to hyperparameter tuning.

# Set the hyperparameters by hand as a starting point for tuning
model3 = xgb.XGBClassifier(
    learning_rate=0.1,
    n_estimators=1000,
    max_depth=5,
    min_child_weight=1,
    gamma=0,
    subsample=0.8,
    colsample_bytree=0.8,
    objective='binary:logistic',
    n_jobs=4,          # number of parallel threads (replaces the older nthread argument)
    scale_pos_weight=1,
    random_state=27)   # random seed (replaces the older seed argument)

Training the new Model

train_model3 = model3.fit(X_trian, y_train)
pred3 = train_model3.predict(X_test)
# classification_report returns a formatted string, so print it as text
print("Classification report for model 3:\n%s" % classification_report(y_test, pred3))

This code should serve as a good starting point! Do reach out and comment if you get stuck!

Other articles you might be interested in:
- Getting started with Apache Spark — I | by Sam | Geek Culture | Jan, 2022 | Medium
- Getting started with Apache Spark II | by Sam | Geek Culture | Jan, 2022 | Medium
- Getting started with Apache Spark III | by Sam | Geek Culture | Jan, 2022 | Medium
- Streamlit and Palmer Penguins. Binged Atypical last week on Netflix… | by Sam | Geek Culture | Medium
- Getting started with Streamlit. Use Streamlit to explain your EDA and… | by Sam | Geek Culture | Medium

