Enhancing Your Understanding of Gradient Boosting

Florence Cesa
Trusted Data Science @ Haleon
7 min read · Jul 10, 2023

Machine learning (ML) is revolutionising the way we solve complex problems and make predictions. One popular technique in the field is gradient boosting, which has gained significant attention due to its effectiveness across a wide range of applications. In this blog post, we’ll aim to enhance your understanding of gradient boosting by explaining its core principles and how it works, supported by insights into how it has been leveraged at Haleon.

What is Machine Learning?
Machine learning is the process whereby computers learn the rules to solve a problem from data, without those rules being explicitly programmed. Let’s take email sorting as an example. In a traditional approach, to decide whether an email qualifies as spam, programmers need to write complicated algorithms containing all the rules for email grouping. With a machine learning solution, we’d gather a dataset consisting of thousands of emails and sort them into two labelled groups. The ML algorithm would then go through a process called training, looking at the two groups of emails to create its own rules. After the model is trained, it can be used to sort emails it has never come across before and predict whether an email is or isn’t spam, given its content and sender.

What is Supervised Machine Learning?
Supervised machine learning is the branch of machine learning where the computer learns how to perform a function by looking at labelled training data (as in the spam email example above). We train the supervised learning model by showing it data where the values to be predicted are already known. Our machine learning algorithm uses that data to work out the rules that reproduce the same results. For example, if we show it the numbers 2 and 2 and tell it the answer is 4, then the numbers 3 and 5 with the answer 8, the model learns addition. This is the model training process, and once we have a trained model, we can use it to accurately predict values for previously unseen data. For example, inputting the numbers 7 and 2 will result in a prediction of 9 from the trained model.

Supervised Machine Learning
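
To make this concrete, here is a minimal sketch of the train-then-predict workflow, assuming scikit-learn is installed: a linear regression model recovers the addition rule from a handful of labelled pairs.

# A linear model learns the addition rule from labelled example pairs
from sklearn.linear_model import LinearRegression

# Training data: pairs of numbers (inputs) and their sums (known answers)
X_train = [[2, 2], [3, 5], [1, 4], [6, 2]]
y_train = [4, 8, 5, 8]

model = LinearRegression()
model.fit(X_train, y_train)

# Predict the sum of a previously unseen pair
print(model.predict([[7, 2]]))  # approximately [9.]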

What is Gradient Boosting?
Gradient boosting is a machine learning technique that combines multiple weak predictive models, typically decision trees, to create a stronger and more accurate predictive model. A decision tree is a model where you have branching decision points and you determine a final value by following a path through the tree. Gradient boosting is an ensemble method, which means it leverages the collective wisdom of a group of models to make better predictions than a single model could achieve alone.

Complex Decision Tree
Ensemble Learning Method

To understand gradient boosting, let’s break down the core concepts:

1. Weak Learners: These are simple, relatively weak predictive models that individually may not perform well. A weak learner can be defined as a model that does only slightly better than random guessing. In the context of gradient boosting, decision trees are commonly used as weak learners. An example would be a decision stump, or one-level decision tree (a CART with a maximum depth of 1).
CART with max_depth = 1

2. Ensemble Learning: Gradient boosting employs the concept of ensemble learning, which involves combining multiple weak learners to create a more accurate and robust model. Each weak learner focuses on the errors made by its predecessors and tries to improve upon them.

Ensemble Learning Method: A Visual Explanation

3. Gradient Descent using a Loss Function: A loss function is used to measure the performance of the model. It quantifies the difference between the predicted values and the actual values. Popular loss functions include mean squared error (MSE) for regression problems and log loss (or cross-entropy loss) for classification problems.

Loss: A Visual Explanation

Gradient descent is an optimisation algorithm that iteratively adjusts the model’s parameters to minimise a loss function. In gradient boosting, the loss function is minimised in each iteration to find the best possible model. By taking the gradients (slopes) of the loss function, the algorithm determines the direction and magnitude of parameter adjustments to reduce the errors, as shown in the sketch after the figure below.

Gradient Descent: A Visual Explanation
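
To make the idea concrete, below is a minimal sketch of gradient descent (illustrative, using made-up numbers) minimising a one-parameter squared-error loss:

# Gradient descent minimising L(w) = mean((y - w * x)^2)
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])  # roughly y = 2x

w = 0.0              # initial parameter guess
learning_rate = 0.05

for step in range(100):
    predictions = w * x
    # Gradient (slope) of the loss with respect to w
    gradient = -2 * np.mean((y - predictions) * x)
    # Step in the direction that reduces the loss
    w -= learning_rate * gradient

print(w)  # converges to roughly 2.0, the best-fitting slope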

The Gradient Boosting Process

Gradient Boosted Trees for Regression: Training

There are a number of steps involved in gradient boosting (a minimal from-scratch sketch follows the list):

1. Initialisation: The process begins by initialising the model with a simple weak learner. This learner makes initial predictions, which will have some errors.

2. Error Calculation: The errors are computed by comparing the initial predictions with the actual values. These errors serve as the basis for subsequent model improvements.

residuals = observed values - predicted values

3. Gradient Calculation: The gradient of the loss function is calculated with respect to the errors. The gradient provides information about the direction and magnitude of adjustments needed to minimise the loss.

4. Update Weak Learner: A new weak learner is created to focus on the errors made by the previous learner. This learner aims to reduce the errors by following the gradient.

5. Learning Rate: A learning rate parameter controls how much influence the new learner has on the overall model. It scales the updates made by the weak learner, preventing drastic changes and ensuring gradual improvement.

Learning Rate: A Visual Explanation

6. Ensemble Construction: The new weak learner is added to the ensemble of models, and its predictions are combined with those of the previous learners. The combined predictions become the updated predictions of the ensemble model.

7. Iteration: Steps 2 to 6 are repeated iteratively, with each subsequent weak learner targeting the remaining errors. The process continues until a predefined number of iterations is reached or the model’s performance converges to a satisfactory level.
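
Putting these steps together, below is a minimal from-scratch sketch of gradient boosting for regression with a squared-error loss, using decision stumps from scikit-learn as the weak learners. It is illustrative only, not the exact algorithm of any particular library.

# Minimal gradient boosting for regression (squared-error loss)
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_estimators=100, learning_rate=0.1):
    # Step 1: initialise with a constant prediction (the mean of y)
    base_prediction = np.mean(y)
    predictions = np.full(len(y), base_prediction)
    trees = []
    for _ in range(n_estimators):
        # Steps 2-3: for squared error, the negative gradient of the
        # loss is simply the residual (observed - predicted)
        residuals = y - predictions
        # Step 4: fit a new weak learner (a stump) to the residuals
        tree = DecisionTreeRegressor(max_depth=1)
        tree.fit(X, residuals)
        # Steps 5-6: scale its contribution by the learning rate and
        # add it to the ensemble's running predictions
        predictions = predictions + learning_rate * tree.predict(X)
        trees.append(tree)
        # Step 7: repeat until n_estimators weak learners are trained
    return base_prediction, trees

def gradient_boost_predict(X, base_prediction, trees, learning_rate=0.1):
    predictions = np.full(X.shape[0], base_prediction)
    for tree in trees:
        predictions += learning_rate * tree.predict(X)
    return predictions

Each stump corrects a little of the error left by the ensemble so far; in practice you would rely on an optimised library implementation such as the one below.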

Below is a code snippet for a gradient boosting regressor using scikit-learn, a Python library for machine learning and statistical modelling. (The snippet is shown here on scikit-learn’s diabetes example dataset so that it runs end-to-end; any feature matrix X and target y would do.)

# Import models and utility functions
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as MSE

# Set seed for reproducibility
SEED = 1

# Load an example dataset
X, y = load_diabetes(return_X_y=True)

# Split dataset into 70% train and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    random_state=SEED)

# Instantiate a GradientBoostingRegressor 'gb'
gb = GradientBoostingRegressor(n_estimators=300,
                               max_depth=1,
                               random_state=SEED)

# Fit 'gb' to the training set
gb.fit(X_train, y_train)

# Predict the test set labels
y_pred = gb.predict(X_test)

# Evaluate the test set MSE
mse_test = MSE(y_test, y_pred)

# Evaluate the test set RMSE
rmse_test = mse_test ** (1/2)

# Print the test set RMSE
print('Test set RMSE of gb: {:.3f}'.format(rmse_test))
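
As a follow-up sketch reusing the fitted gb above, scikit-learn’s staged_predict method yields the ensemble’s predictions after each boosting iteration, which lets you watch the iterative improvement from step 7 and pick a sensible number of trees:

import numpy as np

# RMSE on the test set after each boosting iteration
staged_rmse = [MSE(y_test, y_pred_i) ** 0.5
               for y_pred_i in gb.staged_predict(X_test)]
best_n_trees = int(np.argmin(staged_rmse)) + 1
print('Best number of trees: {}'.format(best_n_trees))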

Some real-world applications of gradient boosting (a non-exhaustive list):

Gradient boosting can be applied to solve complex problems across multiple industries. In finance, gradient boosting models can support investment decisions through stock market prediction, portfolio optimisation, and credit scoring, by analysing historical market data and financial indicators.

Another example is customer relationship management, where gradient boosting can optimise customer engagement strategies through analysis of customer data such as demographics, past purchases, and online interactions. The model can be used to predict customer behaviour, segment customers, and personalise marketing campaigns.

Gradient boosting can also be applied to various Natural Language Processing (NLP) tasks, such as sentiment analysis, named entity recognition, text classification, and machine translation. It does so by training on large text corpora and making accurate predictions based on contextual information.

Within healthcare and biomedical research, gradient boosting has various applications, such as disease diagnosis, patient outcome prediction, and drug discovery, by training on medical records, genetic data, and biomedical literature. The algorithm can assist in making more accurate diagnoses and treatment plans.

Gradient Boosting at Haleon

In Haleon’s Data Science team, gradient boosting is used in a risk model that predicts the likelihood of an incident occurring at our internal manufacturing sites. Risk refers to the anticipated quality incidents over the next 6 months. The model outputs risk measures for each of the different risk categories (e.g., artwork and labelling) for each site, to aid design decisions and the provision of potential mitigating actions. The result of the trained risk model indicates whether a site is at risk. The aim of this predictive risk forecast is to help the Quality Risk team identify any risk early and set mitigating action plans to prevent the risk from materialising into incidents.

Gradient boosting is a powerful machine learning technique that combines weak models to create a strong predictive model. By iteratively correcting the mistakes of the previous models, gradient boosting achieves high accuracy and generalisation capabilities. While it has its limitations, proper tuning and optimisation can overcome most challenges. As a versatile tool, gradient boosting continues to make significant contributions to various domains, propelling the field of machine learning forward.
