How to Improve the Accuracy of Your Time Series Forecast by Using Bootstrapping
Sometimes you would like more data to be available for your time series forecasting algorithm. In some cases, bootstrapping your time series can provide exactly that. In this post, I provide the Python code for bootstrapping time series and show an example of how bootstrapping can improve your prediction accuracy.
I assume some basic knowledge of:
- Statistics
- Time series analysis
- Exponential Smoothing
I provide additional resources in the text as refreshers. The notebook can be found here.
Bootstrapping
Bootstrapping is a well-known statistical technique for sampling your data: elements are drawn randomly from the data with replacement and concatenated into a new data set. It has several applications, such as quantifying the uncertainty (i.e., confidence intervals) associated with a particular moment/estimator, but it can also be used to provide additional data for forecasts. I won't explain bootstrapping in further detail here and instead refer you to StatsQuest, where you can find a good visual explanation.
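As a quick refresher, the plain (i.i.d.) bootstrap fits in a few lines of NumPy; the data values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.array([4.2, 5.1, 3.8, 6.0, 4.9])  # toy data for illustration

# One bootstrap sample: draw with replacement, same length as the original
sample = rng.choice(data, size=len(data), replace=True)

# Classic use case: a 95% bootstrap confidence interval for the mean
means = [rng.choice(data, size=len(data), replace=True).mean()
         for _ in range(1000)]
print(np.percentile(means, [2.5, 97.5]))
```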
I’m currently working on a forecasting task where I want to apply bootstrapping to simulate more data for my forecasting approach. I can’t share my exact approach, but I’ll explain it using monthly alcohol sales data and an ETS model. It is based on the approach of Bergmeir et al. [1]. A good theoretical explanation of the method can be found here and here.
There is already a great post explaining how to bootstrap time series with Python and the package tsmoothie. In that package, however, the data is decomposed before bootstrapping using procedures that do not meet my requirements. Since, to the best of my knowledge, there is no other good package, I created a small script that can be used to bootstrap any time series with the desired preprocessing/decomposition approach.
When we bootstrap time series, we need to account for the autocorrelation between lagged values of the series. We cannot simply draw data points at random from the dataset, as this would destroy those dependencies. Künsch [2] developed the so-called moving block bootstrap (MBB) method to solve this problem. In this method, the data are not drawn element by element, but rather block by block, with equally sized blocks. For 10 years of monthly data (= 120 data points), for example, we repeatedly draw a block of consecutive data points from the original series until the desired length of the new bootstrapped series is reached. For annual data, a block size of 8 is common; for monthly data, a block size of 24, i.e., 2 full years, is common. For weekday data (Monday-Friday), I personally use a block size of 20, which corresponds to 4 consecutive weeks.
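To make the idea concrete, here is a minimal MBB sketch; the refined version used later in the post additionally draws extra blocks and discards a random offset:

```python
import numpy as np

def moving_block_bootstrap(x, block_size, rng):
    """Draw overlapping blocks with replacement and concatenate them
    until the length of the original series is reached."""
    n = len(x)
    # All n - block_size + 1 overlapping blocks of the original series
    blocks = np.array([x[i:i + block_size] for i in range(n - block_size + 1)])
    n_draws = int(np.ceil(n / block_size))
    sample = blocks[rng.integers(0, len(blocks), size=n_draws)].ravel()
    return sample[:n]

rng = np.random.default_rng(0)
series = rng.normal(size=120)  # stand-in for 10 years of monthly data
bootstrapped = moving_block_bootstrap(series, block_size=24, rng=rng)
```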
Bootstrapping the original time series alone, however, does not produce the samples we need. Instead, we bootstrap the residuals of the time series and add them back to the remaining components to obtain time series with patterns similar to the original. For this, we use the seasonal and trend decomposition using Loess (STL) proposed by Cleveland et al. [3]. Read this if you need an explanation.
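With statsmodels, the decomposition is a one-liner; `series` stands in for the monthly sales data (a pd.Series with a DatetimeIndex) used later in the post:

```python
import numpy as np
from statsmodels.tsa.seasonal import STL

# Log-transform first so the decomposition is roughly additive
stl = STL(np.log(series), period=12, robust=True).fit()

# stl.resid is what we bootstrap; stl.trend + stl.seasonal is added back later
trend, seasonal, resid = stl.trend, stl.seasonal, stl.resid
```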
Summary of the procedure
My approach can be summarized as follows:
- Short data exploration
- Create a baseline model by applying an ETS(A,A,A) to the original data
- Apply the STL to the original time series to get seasonal, trend and residuals components of the time series
- Use the residuals to build a population matrix from which we draw randomly 20 samples / time series
- Aggregate each residuals series with trend and seasonal component to create a new time series set
- Compute 20 different forecasts, average them, and compare the result against our baseline model
Let’s go!
Practical application
First, let’s start with the data. I’m using monthly data of alcohol sales that I got from Kaggle.
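Loading the data might look like this; the file and column names are placeholders for whatever the downloaded Kaggle CSV actually contains:

```python
import pandas as pd

# Hypothetical file/column names -- adjust to the downloaded Kaggle CSV
df = pd.read_csv("alcohol_sales.csv", parse_dates=["date"], index_col="date")
series = df["sales"].asfreq("MS")  # monthly data at month-start frequency
series.plot(title="Monthly alcohol sales")
```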
The figure above illustrates the data. We observe an increasing trend and increasing variance, so the series is clearly non-stationary. ETS models can handle this. If you need a refresher on the ETS model, here you go.
Figure 2 illustrates the annual seasonality. We see relatively weak sales in January and July and relatively strong sales around May-June and December. Hence, we use a seasonal period of 12 for the ETS model.
The logarithm is used to stabilize the (increasing) variance of the data. Remember to apply the transformation based on the training data only, not the entire data set, as doing otherwise results in data leakage and therefore misleading prediction accuracy.
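A minimal sketch of the split and transform, holding out the last year as the test set:

```python
import numpy as np

train, test = series[:-12], series[-12:]  # last 12 months held out
train_log = np.log(train)                 # transform the training data only
```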
Baseline model
Our goal is to predict the alcohol sales for each month of the last year of the data set. We use statsmodels to implement the ETS model and select its parameters by minimizing the AIC on the training data.
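A sketch of the baseline with statsmodels' ETSModel, fit on the log-transformed training data and back-transformed for evaluation:

```python
import numpy as np
from statsmodels.tsa.exponential_smoothing.ets import ETSModel

# ETS(A,A,A): additive error, trend, and seasonality on the log scale
model = ETSModel(train_log, error="add", trend="add",
                 seasonal="add", seasonal_periods=12)
fit = model.fit()
print(f"AIC: {fit.aic:.1f}")

# Forecast the held-out year and invert the log transform
pred = np.exp(fit.forecast(steps=12))

mape = (np.abs((test - pred) / test)).mean() * 100
rmse = np.sqrt(((test - pred) ** 2).mean())
print(f"MAPE: {mape:.2f}%, RMSE: {rmse:.2f}")
```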
The model makes accurate predictions (MAPE: 3.01%, RMSE: 476.58).
Bootstrapping
The bootstrapping procedure is summarized as follows (a code sketch follows the list):
- We apply STL to the original data and use the residuals to create the population matrix consisting of all possible blocks. For a series of length n (=312) and a block size of l (=24), there are n-l+1 (=289) possible overlapping blocks.
- From this matrix, we randomly draw the desired number of blocks and join them together. To ensure that any value from the original series can appear anywhere in the bootstrapped series, we draw n/l + 2 (=15) blocks, where n/l is an integer division. We then discard a random number of values, between zero and l-1 (=23), from the beginning of the series and as many values as necessary from the end to reach the required length of 312. This ensures that the bootstrapped series does not necessarily begin or end at a block boundary.
- We add the trend and seasonal components back to each bootstrapped residual series to obtain the desired number of bootstrapped series.
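Putting the three steps together, a sketch of the procedure (reusing `train_log` from above; the helper name is my own):

```python
import numpy as np
from statsmodels.tsa.seasonal import STL

def bootstrap_series(resid, trend, seasonal, block_size, num_series, rng):
    """Moving block bootstrap of the STL residuals, with trend and
    seasonality added back on (sketch of the procedure described above)."""
    n = len(resid)
    # Population matrix of all n - block_size + 1 overlapping blocks
    blocks = np.array([resid[i:i + block_size]
                       for i in range(n - block_size + 1)])
    out = []
    for _ in range(num_series):
        # Draw n // block_size + 2 blocks so a random offset can be discarded
        k = n // block_size + 2
        sample = blocks[rng.integers(0, len(blocks), size=k)].ravel()
        offset = rng.integers(0, block_size)  # discard 0..block_size-1 values
        out.append(trend + seasonal + sample[offset:offset + n])
    return out

rng = np.random.default_rng(1)
stl = STL(train_log, period=12, robust=True).fit()
boot = bootstrap_series(stl.resid.values, stl.trend.values,
                        stl.seasonal.values, block_size=24,
                        num_series=20, rng=rng)
```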
Figure 4 illustrates the results. The MAPE improves by about 7%, from 3.01% to 2.80%, and the RMSE by about 11.02%. Table 1 summarizes the results.
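The numbers above come from fitting an ETS(A,A,A) to each of the 20 bootstrapped (log-scale) series and averaging the back-transformed forecasts, roughly like this:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.exponential_smoothing.ets import ETSModel

forecasts = []
for b in boot:
    s = pd.Series(b, index=train_log.index)
    res = ETSModel(s, error="add", trend="add",
                   seasonal="add", seasonal_periods=12).fit()
    forecasts.append(np.exp(res.forecast(steps=12)))

# Average the 20 forecasts and evaluate against the held-out year
bagged = pd.concat(forecasts, axis=1).mean(axis=1)
mape = (np.abs((test - bagged) / test)).mean() * 100
print(f"Bagged MAPE: {mape:.2f}%")
```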
In summary, it is possible to improve predictions by bootstrapping the residuals of a time series, making a forecast for each bootstrapped series, and averaging the forecasts.
Follow me if you would like to receive more interesting posts on forecasting methodology or operations research topics :)
References:
[1] Bergmeir C., Hyndman, R. J., Benítez J. M. (2016). Bagging exponential smoothing methods using STL decomposition and Box–Cox transformation. International Journal of Forecasting, 32(2), 303–312.
[2] Künsch, H. R. (1989). The Jackknife and the Bootstrap for General Stationary Observations. The Annals of Statistics, 17(3), 1217–1241.
[3] Cleveland, R. B., Cleveland, W. S., McRae, J. E., & Terpenning, I. J. (1990). STL: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics, 6(1), 3–33.