XGBoost for Time Series Extrapolation: An Approach in Python

Dr. Sandeep Singh Sandha, PhD
Jan 1, 2023

--

XGBoost represents a very powerful class of classical models. However, if a time series has a trend, XGBoost cannot extrapolate it. We generally don't notice this with random splits of the dataset, but when a time-based split is done, the phenomenon becomes very clear.

Forecasting Problem Formulation: We formulate the problem as sequence-to-sequence learning, wherein the model uses a window of past values to predict a set of future values. Further, the model hyperparameters are tuned with Bayesian optimization (Mango) to search for the best possible model in each case.
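To make this concrete, here is a minimal sketch of the windowing and tuning setup. The `make_windows` helper, the synthetic series, and the hyperparameter ranges are illustrative assumptions rather than the exact notebook code; the Mango usage follows the library's documented `Tuner` pattern.

```python
import numpy as np
from mango import Tuner
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

def make_windows(series, n_in=24, n_out=1):
    """Slice a 1-D array into (past n_in values -> next n_out values) pairs.
    n_out=1 keeps the target single-step; multi-step targets would need a
    multi-output wrapper."""
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i : i + n_in])
        y.append(series[i + n_in : i + n_in + n_out])
    return np.array(X), np.array(y).squeeze()

# Assumed synthetic series, purely for demonstration.
series = np.sin(np.arange(300) / 10.0) + 0.05 * np.arange(300)
X, y = make_windows(series)

# Illustrative search space; the notebooks may tune other parameters.
param_space = {"max_depth": range(2, 8), "n_estimators": range(50, 300)}

def objective(params_batch):
    # Mango passes a batch of candidate configurations and expects one
    # score per configuration (here: mean 3-fold cross-validation R^2).
    return [cross_val_score(XGBRegressor(**p), X, y, cv=3).mean()
            for p in params_batch]

tuner = Tuner(param_space, objective)
results = tuner.maximize()
print(results["best_params"], results["best_objective"])
```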

Consider the example of a simple dataset shown below. When doing a random split (30% of the samples selected at random for testing), the model trains so well that we don't realize the real problem. GitHub-notebook-1 to play
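A hedged sketch of this experiment follows; the synthetic trending series and the model settings are assumptions for illustration, not the data from GitHub-notebook-1.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
t = np.arange(500, dtype=float)
series = 0.5 * t + rng.normal(scale=2.0, size=t.size)  # upward trend + noise

# Lag features: predict the next value from the previous 10.
n_in = 10
X = np.stack([series[i : i + n_in] for i in range(len(series) - n_in)])
y = series[n_in:]

# A random 70/30 split mixes past and future rows, hiding the problem.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = XGBRegressor(n_estimators=200, max_depth=4).fit(X_tr, y_tr)
print("random-split R^2:", model.score(X_te, y_te))  # typically looks very good
```

The score looks excellent because every test row is surrounded in time by training rows, so the model never has to predict outside the value range it has seen.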

However, when a time-based split is done, we see the real problem. Here, the last 30% of the data was held out as test samples (not used to train the model), and the predicted future does not follow the rising trend in the dataset. GitHub-notebook-2 to play
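The same setup with a time-based split makes the failure visible; again, the series and settings below are assumed for illustration, not the notebook's exact code.

```python
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
t = np.arange(500, dtype=float)
series = 0.5 * t + rng.normal(scale=2.0, size=t.size)  # same trending series

n_in = 10
X = np.stack([series[i : i + n_in] for i in range(len(series) - n_in)])
y = series[n_in:]

# Hold out the last 30% of rows in time order.
split = int(0.7 * len(X))
model = XGBRegressor(n_estimators=200, max_depth=4).fit(X[:split], y[:split])
preds = model.predict(X[split:])

# Predictions plateau near the largest training target instead of following
# the trend, because tree leaves can only output values learned in training.
print("max train target:", y[:split].max(), "max prediction:", preds.max())
```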

A detailed explanation of this problem, with the XGBoost mathematics, is available for more curious readers, highlighting why it cannot extrapolate. In short, the trees behind XGBoost output constant values at their leaves, so every prediction is a sum of constants learned from the training targets and can never move beyond their range to follow a trend.

How can we solve this problem?

A simple approach that I will discuss here is to remove the trend from the dataset; then we can use the powerful XGBoost again. There are several methods to remove a trend. The one presented here is the difference method, where we approximate the trend using the rolling mean and subtract it. We also log-transform the series first to reduce the variation around the mean. GitHub-notebook-3 to play
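A minimal sketch of this transform, assuming a strictly positive pandas Series and an illustrative window size:

```python
import numpy as np
import pandas as pd

def transform(series: pd.Series, window: int = 12):
    """Log-transform, then subtract the rolling-mean trend."""
    logged = np.log(series)                       # damp variation around the mean
    rolling_mean = logged.rolling(window).mean()  # approximate the trend
    detrended = (logged - rolling_mean).dropna()  # difference out the trend
    return detrended, rolling_mean

# Assumed example: a positive, upward-trending series.
s = pd.Series(np.exp(np.linspace(0.0, 3.0, 200))) + 1.0
detrended, trend = transform(s)
```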

Removing the trend from the transformed series is visualized below:

Now the predictions for the future of the transformed series are clean, as expected. To get back the original series, we add the mean trend back and revert the log operation, which is a simple task.
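The inverse step might look like the following sketch, paired with the transform above; extending the rolling-mean trend into the forecast horizon is assumed to be handled separately (for instance, by extrapolating the rolling mean).

```python
import numpy as np
import pandas as pd

def inverse_transform(detrended: pd.Series, rolling_mean: pd.Series) -> pd.Series:
    """Re-add the rolling-mean trend, then undo the log."""
    logged = detrended + rolling_mean  # pandas aligns the two series on index
    return np.exp(logged).dropna()
```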
