Algorithm Introduction — XGBoost

Jeffrey
NTUST-AIVC

--

A Kaggle Competition Tool You Must Know

Prolegomenon

If you are looking for an efficient, accurate, and reliable machine learning algorithm, XGBoost may be your best choice. XGBoost is a supervised learning algorithm that builds a powerful ensemble model by iteratively training multiple weak learners. It is highly scalable, fast, and accurate, supports multiple loss functions and objective functions, and can be applied to a variety of tasks such as regression and classification. Thanks to its outstanding performance, it has achieved excellent results in many machine learning competitions and has become one of the indispensable algorithms in machine learning.

Boosting? Gradient Boosting?

XGBoost, short for eXtreme Gradient Boosting, is a gradient boosting algorithm, so before introducing XGBoost we must first understand what Boosting and Gradient Boosting are.

Boosting is an approach that combines multiple weak classifiers (such as shallow decision trees) into one strong classifier. At each round it reweights the training samples so that later weak classifiers pay more attention to the samples that were misclassified so far, and then combines the weak classifiers by weighted voting, thereby improving classification accuracy; however, it is also prone to overfitting.
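To make this concrete, here is a minimal sketch of the boosting idea using scikit-learn's AdaBoostClassifier (scikit-learn is assumed to be installed; the synthetic dataset and parameter values are arbitrary choices for illustration, not part of XGBoost itself):

```python
# A minimal boosting sketch with AdaBoost. The default weak learner is a
# depth-1 decision tree ("stump"); each round reweights the training
# samples so that later stumps focus on the ones misclassified so far.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = AdaBoostClassifier(n_estimators=50, random_state=0)
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```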

Gradient Boosting is a Boosting algorithm that frames the learning process as gradient descent on the loss function. Each new weak classifier is trained to fit the negative gradient of the loss with respect to the current model's predictions; for squared error this is simply the residual (the difference between the true value and the current prediction). Adding these correction trees step by step further improves the prediction accuracy of the model.
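To see how fitting residuals works in practice, here is a toy sketch of gradient boosting for squared-error regression built from plain decision trees (scikit-learn is assumed; the data and hyperparameters are made up for illustration):

```python
# A toy gradient boosting loop for squared-error regression: each new
# tree is fit to the residuals of the current ensemble (the negative
# gradient of the squared-error loss), then added with a learning rate.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())   # start from a constant model
trees = []

for _ in range(100):
    residuals = y - prediction           # negative gradient of 1/2*(prediction - y)^2
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)               # fit the next weak learner to the residuals
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("training MSE:", np.mean((y - prediction) ** 2))
```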

XGBoost?

eXtreme Gradient Boosting is a gradient boosting framework developed by Tianqi Chen and described in the 2016 paper "XGBoost: A Scalable Tree Boosting System". It is an extended version of the gradient boosting algorithm that achieves superior performance on classification and regression problems. It uses regularization techniques to improve the predictive accuracy of the model while avoiding overfitting. XGBoost also handles missing values automatically (and recent versions add support for categorical features), which makes it easier and faster to use. It is very scalable and supports multiple programming languages, including Python, R, Java, and Scala. XGBoost is widely used in machine learning competitions and has achieved many excellent results.

Features

Compared with other machine learning algorithms, XGBoost has the following characteristics:

  • Efficiency: XGBoost uses parallelization technology to quickly process large-scale data sets.
  • Scalability: XGBoost can run in various environments, including distributed environments and Hadoop clusters.
  • Accuracy: XGBoost uses a gradient boosting algorithm and regularization technology to get more accurate prediction results.
  • Interpretability: XGBoost can provide feature importance rankings, making it easier for users to understand how the model works.
  • Flexibility: XGBoost can be used for regression problems, classification problems, and ranking problems, as well as multiple scenarios such as multi-objective learning and hybrid learning.

In addition, XGBoost provides a wealth of configuration parameters that users can tune to their own needs to obtain the best performance.
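As a quick illustration of the interpretability and configurability mentioned above, here is a minimal sketch that trains a small classifier and prints its feature importance scores (the xgboost and scikit-learn packages are assumed to be installed; the dataset and parameter values are arbitrary):

```python
# Train a small XGBoost classifier and inspect its feature importances.
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

model = XGBClassifier(n_estimators=100, max_depth=4, learning_rate=0.1)
model.fit(X, y)

# Importance score of each feature in the fitted model (higher = more influential)
for i, importance in enumerate(model.feature_importances_):
    print(f"feature {i}: {importance:.3f}")
```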

Algorithm Introduction

XGBoost uses the following algorithms:

  1. Gradient Boosting algorithm: This was introduced earlier.
  2. Classification and Regression Trees algorithm (CART): The CART algorithm is a decision tree algorithm for both classification and regression problems. XGBoost uses CART as its weak learner. Each CART is a binary tree, and samples are routed to different subtrees according to their feature values.
  3. Regularization: In order to prevent overfitting, XGBoost uses regularization techniques, including L1 regularization and L2 regularization. These techniques limit the complexity of the model by adding penalty terms to avoid the problem of overfitting.
  4. Learning rate (shrinkage): XGBoost scales the contribution of each newly added tree by a learning rate (eta). A smaller learning rate makes learning more conservative and helps avoid overfitting, while a larger one speeds up convergence; the learning rate can also be tuned or scheduled across boosting rounds.

In short, XGBoost combines the gradient boosting algorithm with regularization and uses CART as its weak learner to gradually build a more powerful ensemble model, while shrinkage via the learning rate keeps the learning process stable and improves training efficiency and accuracy.
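To connect these pieces to actual parameter names, here is a minimal sketch using XGBoost's native training API: eta corresponds to shrinkage, lambda and alpha to L2/L1 regularization, gamma to the minimum loss reduction for a split, and max_depth to the complexity of each CART. The values shown are illustrative, not recommendations:

```python
# A minimal sketch with XGBoost's native API; the params dict maps the
# ideas above (shrinkage, regularization, CART complexity) to parameters.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=300)

dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "reg:squarederror",
    "eta": 0.1,        # shrinkage / learning rate
    "max_depth": 3,    # depth of each CART
    "lambda": 1.0,     # L2 regularization on leaf weights
    "alpha": 0.0,      # L1 regularization on leaf weights
    "gamma": 0.0,      # minimum loss reduction required to split
}
booster = xgb.train(params, dtrain, num_boost_round=100)
print(booster.predict(dtrain)[:5])
```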

Program API

Before analyzing in detail how it works internally, let's take a look at what parameters need to be given to XGBoost when using it. For details, please refer to the official documentation.

class xgboost.XGBRegressor

Parameters:

  • n_estimators: Number of gradient boosted trees. Equivalent to the number of boosting rounds. The default value is 100.
  • max_depth: Maximum tree depth for base learners. The default value is 6.
  • booster: Specify which booster to use: gbtree, gblinear, or dart.
  • learning_rate: Boosting learning rate, default 0.3.
  • gamma: Minimum loss reduction required to make a further partition on a leaf node of the tree.

Methods:

  • fit: Fit the model with training data X and targets y.
  • predict: Predict with X and return the predicted values.
  • score: Return the coefficient of determination (R²) of the prediction.
  • predict_proba: Predict the probability of each class for each example in X (available on the classifier, XGBClassifier, rather than the regressor).
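Putting the parameters and methods together, here is a minimal usage sketch of XGBRegressor (xgboost and scikit-learn are assumed to be installed; the synthetic data and parameter values are only for illustration):

```python
# Construct, fit, predict, and score an XGBRegressor.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBRegressor(n_estimators=200, max_depth=6, learning_rate=0.3, booster="gbtree")
model.fit(X_train, y_train)

predictions = model.predict(X_test)   # predicted values for X_test
r2 = model.score(X_test, y_test)      # coefficient of determination
print("R^2 on the test set:", r2)
```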

Mathematical Algorithm

Finally, let's talk about the mathematics behind XGBoost.
Specifically, the objective function of XGBoost consists of two parts, a loss function and a regularization term, which can be expressed by the following formula:

Obj = Σᵢ L(yᵢ, ŷᵢ) + Σₖ Ω(fₖ),   with Ω(f) = γT + (1/2) λ ‖w‖²

Here L(y, ŷ) is the loss function, y is the true value and ŷ is the model's prediction; Ω(fₖ) is the regularization term of the k-th tree, where T is the number of leaves and w are the leaf weights of that tree; γ and λ are the regularization coefficients.
In the process of optimizing the objective function, XGBoost uses first-order and second-order gradient information to compute the optimal weight of each leaf and then build the decision tree. Specifically, at the t-th iteration, the optimal weight wⱼ of leaf j can be calculated by the following formula:

wⱼ = - Gⱼ / (Hⱼ + λ)

where Gⱼ and Hⱼ are the sums of gᵢ and hᵢ over the samples falling in leaf j, gᵢ and hᵢ are the first-order and second-order gradients of the loss with respect to the current prediction for the i-th sample, and λ is the regularization coefficient.
When constructing the decision tree, XGBoost also adopts optimization techniques such as pre-sorted and histogram-based split finding to improve training speed and accuracy. Ultimately, XGBoost combines multiple decision trees to form a powerful ensemble model for complex classification and regression problems.
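As a tiny worked sketch of the leaf-weight formula above: for squared-error loss the gradients are gᵢ = ŷᵢ - yᵢ and hᵢ = 1, so the optimal leaf weight is essentially a regularized mean of the residuals in that leaf. The numbers below are made up purely for illustration:

```python
# Compute the optimal leaf weight w_j = -G_j / (H_j + lambda) for
# squared-error loss, where g_i = prediction - y_i and h_i = 1.
import numpy as np

y_true = np.array([3.0, 2.5, 4.0])   # targets of the samples falling in leaf j
y_pred = np.array([2.0, 2.0, 2.0])   # current ensemble prediction for those samples
lam = 1.0                            # L2 regularization coefficient (lambda)

g = y_pred - y_true                  # first-order gradients
h = np.ones_like(y_true)             # second-order gradients (all 1 for squared error)

w_j = -g.sum() / (h.sum() + lam)     # optimal leaf weight: 3.5 / 4 = 0.875
print("optimal leaf weight:", w_j)
```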

Done!

Now, if you go and try it yourself, you will learn even more about it. Thanks for reading! If you like my article, don't forget to leave a clap or share it with your friends. Your appreciation is my motivation.
— Jeffrey
