Your basic XGBoost Classification Code

Published in Geek Culture, Jan 14, 2022

This post serves as a starting point for your XGBoost journey.

XGBoost is an open-source library that provides an optimized, distributed implementation of gradient boosting. It was created by Tianqi Chen and initially maintained by the Distributed (Deep) Machine Learning Community (DMLC) group. It is one of the most widely used algorithms in applied machine learning competitions, where it has become popular through winning solutions on structured and tabular data.
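To make "gradient boosting" a little more concrete, here is a toy sketch in plain Python. It is not how XGBoost is implemented (XGBoost adds regularization, second-order gradient information, and many optimizations); it only illustrates the core idea of fitting small trees to the current errors, one round at a time. The data and all function names are made up for illustration.

```python
# Toy gradient boosting for regression with squared-error loss.
# Each round fits a one-split "stump" to the current residuals and
# adds a shrunken copy of it to the ensemble.

def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x < threshold else right_value


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit an additive ensemble of stumps, one round at a time."""
    base = sum(ys) / len(ys)  # initial prediction: the mean of the targets
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Made-up data with a jump between x=4 and x=5
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

model = boost(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
print("baseline MSE: %.4f, boosted MSE: %.4f" % (baseline_mse, boosted_mse))
```

The `learning_rate` and `n_rounds` arguments here play roughly the same role as `learning_rate` and `n_estimators` in the XGBoost models later in this post.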

Installing XGB

pip install xgboost

Importing the packages and libraries

# import statements
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import xgboost as xgb
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed from scikit-learn

# load the dataset
data = pd.read_csv('../input/diabetes.csv')
data.head()

Now we split the dataset and create our train and test sets!

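# Features and target (assumption: the label is the last CSV column; adjust for your data)
X_data = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Split the dataset into train and test sets
seed = 7
test_size = 0.3
X_trian, X_test, y_train, y_test = train_test_split(X_data, y, test_size=test_size, random_state=seed)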
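If your classes are imbalanced, it can help to keep the class ratio the same in both sets. Here is a small sketch of that using the `stratify` argument of `train_test_split`. The data is synthetic (generated with `make_classification`) and is only a stand-in, not the diabetes dataset from this post.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 8 features, roughly 65/35 class balance
X, y = make_classification(n_samples=200, n_features=8,
                           weights=[0.65, 0.35], random_state=7)

# stratify=y keeps the class proportions similar in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=7)

print("train shape:", X_train.shape, "test shape:", X_test.shape)
print("train classes:", Counter(y_train), "test classes:", Counter(y_test))
```

With `test_size=0.3`, 60 of the 200 rows end up in the test set and the remaining 140 in the training set.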

Training the Model

# Train the XGBoost models for classification
model1 = xgb.XGBClassifier()  # default hyperparameters
model2 = xgb.XGBClassifier(n_estimators=100, max_depth=8, learning_rate=0.1, subsample=0.5)  # hand-picked hyperparameters

# fit() returns the fitted estimator itself
train_model1 = model1.fit(X_trian, y_train)
train_model2 = model2.fit(X_trian, y_train)

Printing the Classification report

# Prediction and classification report
from sklearn.metrics import classification_report

pred1 = train_model1.predict(X_test)
pred2 = train_model2.predict(X_test)

# classification_report returns a multi-line string, so print it as-is
print('Model 1 XGBoost Report\n%s' % classification_report(y_test, pred1))
print('Model 2 XGBoost Report\n%s' % classification_report(y_test, pred2))
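The classification report lists precision, recall, and F1-score for each class. To see what those numbers mean, here is a small sketch that computes them by hand for one class. The labels are made up for illustration, and `precision_recall_f1` is a hypothetical helper, not part of scikit-learn.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of actual positives, how many were found
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Made-up labels: 3 true positives, 2 false positives, 1 false negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print("precision=%.2f recall=%.2f f1=%.2f" % (precision, recall, f1))
# prints: precision=0.60 recall=0.75 f1=0.67
```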

Now that the basics are done, let’s move on to hyperparameter tuning.

# Hyperparameter tuning: start from a hand-set configuration (no grid search is run here)
model3 = xgb.XGBClassifier(
    learning_rate=0.1,
    n_estimators=1000,
    max_depth=5,
    min_child_weight=1,
    gamma=0,
    subsample=0.8,
    colsample_bytree=0.8,
    objective='binary:logistic',
    n_jobs=4,  # number of parallel threads (replaces the older nthread)
    scale_pos_weight=1,
    random_state=27)  # replaces the older seed argument

Training the new Model

train_model3 = model3.fit(X_trian, y_train)
pred3 = train_model3.predict(X_test)
# classification_report returns a string, so print it directly
print("Classification report for model 3:\n%s" % classification_report(y_test, pred3))

This code should serve as a good starting point! Do reach out and comment if you get stuck!

Cheers and do follow for more such content! :)

You can now buy me a coffee too if you liked the content!
samunderscore12 is creating data science content! (buymeacoffee.com)


Written by Udbhav Pangotra