Time Series Forecasting using TBATS Model

Nadeem
Analytics Vidhya
Published in
9 min readNov 21, 2021

Multi seasonalities Time series forecasting model with python and R codes

Introduction

Time-series forecasting refers to the use of a model to predict future values based on previously observed values. Many researchers are familiar with time-series forecasting yet they struggle with specific types of time-series data. One such type of data is data with seasonality. There can be many types of seasonalities present (e.g., time of day, daily, weekly, monthly, yearly).

TBATS is a forecasting method to model time series data. The main aim of this is to forecast time series with complex seasonal patterns using exponential smoothing

TBATS: Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend and Seasonal components.

Overview

In order to start forecasting, we need to first install tbats package. The following steps should be implemented to create the model:

  • Partition the data into two parts(say, train_data and test_data). Train_data is used to train the model and fit the model to data. The trained model is evaluated using test_data.
  • Provide information about season lengths to the model (e.g., if hourly data is present, the model can be plotted weekly for all 24*7 hrs in a week).
  • Fit the model to train_data using train_data to the model.
  • Forecast the model ahead by a certain period of time for which you want to predict.

HOW TBATS CHOOSE FINAL MODEL

TBATS will consider various alternatives and fit quite a few models. It will consider models:

  • with Box-Cox transformation and without it Box-Cox transformation.
  • with considering Trend and without Trend.
  • with Trend Damping and without Trend Damping.
  • with ARIMA(p,q) and without ARMA(p,q) process used to model residuals.
  • non-seasonal model.
  • various amounts of harmonics used to model seasonal effects

The final model will be chosen using Akaike information criterion (AIC).

In particular, auto ARIMA is used to decide if residuals need modeling and what p and q values are suitable.

Existing Implementation

So far the only implementation has been available in R language, in the forecast package. Data Scientists using Python either had to resign from testing those models or were forced to use an R wrapper such as rpy2 to run them.

New Implementation

For a new implementation of TBATS in Python, available at GitHub. In the rest of the article, we will provide the example usage and compare the performance of this implementation with the other methods.

Installation

From PyPI:

pip install tbats

Import via:

from tbats import BATS, TBATS

Minimal working example:

from tbats import TBATS
import numpy as np
# required on windows for multi-processing,
# see https://docs.python.org/2/library/multiprocessing.html#windows
if __name__ == '__main__':
np.random.seed(2342)
t = np.array(range(0, 160))
y = 5 * np.sin(t * 2 * np.pi / 7) + 2 * np.cos(t * 2 * np.pi / 30.5) + \
((t / 20) ** 1.5 + np.random.normal(size=160) * t / 50) + 10

# Create estimator
estimator = TBATS(seasonal_periods=[14, 30.5])

# Fit model
fitted_model = estimator.fit(y)

# Forecast 14 steps ahead
y_forecasted = fitted_model.forecast(steps=14)

# Summarize fitted model
print(fitted_model.summary())

Reading model details

# Time series analysis
print(fitted_model.y_hat) # in sample prediction
print(fitted_model.resid) # in sample residuals
print(fitted_model.aic)
# Reading model parameters
print(fitted_model.params.alpha)
print(fitted_model.params.beta)
print(fitted_model.params.x0)
print(fitted_model.params.components.use_box_cox)
print(fitted_model.params.components.seasonal_harmonics)

Troubleshooting

BATS and TBATS try the multitude of models under the hood and may appear slow when fitting too long time series. In order to speed it up, you can start with constrained model search space. It is recommended to run it without Box-Cox transformation and ARMA errors modeling that are the slowest model elements:

# Create estimator
estimator = TBATS(
seasonal_periods=[14, 30.5],
use_arma_errors=False, # shall try only models without ARMA
use_box_cox=False # will not use Box-Cox
)
fitted_model = estimator.fit(y)

In some environment configurations parallel computation of models freezes. The reason for this is unclear yet. If the process appears to be stuck you can try running it on a single core:

estimator = TBATS(
seasonal_periods=[14, 30.5],
n_jobs=1
)
fitted_model = estimator.fit(y)

R forecast package comparison tests. Those DO NOT RUN with the default test command, you need the R forecast package installed:

python setup.py test_r

Comparison to R implementation

Python implementation is meant to be as much as possible equivalent to R implementation in the forecast package.

Some Examples and Time Series Dataset

In order to test forecasting methods, we need some time-series data. Let us use time series from Kaggle Store Item Demand Forecasting Challenge. It is a playground challenge and the set is most likely artificial (see comments in kernels and discussions). The data there contains daily sales of 50 items in 10 stores over a period of 5 years (500 different time series in total). For our purpose, we need only one time series so I will arbitrarily take sales of Item 1 at Store 1.

import pandas as pddf = pd.read_csv('kaggle_sales.csv')
df = df[(df['store'] == 1) & (df['item'] == 1)] # item 1 in store 1
df = df.set_index('date')
y = df['sales']y_to_train = y.iloc[:(len(y)-365)]
y_to_test = y.iloc[(len(y)-365):] # last year for testing

Fig 1: Daily sales of Item 1 at Store 1

Sales data contains daily observations. It exhibits weekly and yearly seasonal patterns. It means we are dealing with time series containing multiple seasonal effects. One of those seasonalities is long and contains 365 (366 for leap year) observations. This is something TBATS was designed for.

TBATS Model

In order to start forecasting, we need to install tbats package and fit the model. The only thing that we have to manually provide to the model are season lengths:

from tbats import TBATS, BATS# Fit the model
estimator = TBATS(seasonal_periods=(7, 365.25))
model = estimator.fit(y_to_train)# Forecast 365 days ahead
y_forecast = model.forecast(steps=365)

You may have noticed that the yearly season length is not an integer. It equals 365.25 to account for leap years, a feature TBATS is able to handle.

TBATS seems to have done a pretty good job in modeling both seasonal effects:

Fig 2: 3 last years of sales data. TBATS is modeling yearly seasonal effects.

Fig 3: 12 weeks of data. The weekly seasonal effect is also being modeled by TBATS.

If we take a look under the hood and review model parameters we will discover that 3 seasonal harmonics are used to model the weekly pattern and 11 harmonics are used to model the yearly pattern. TBATS has chosen to use Box-Cox transformation with lambda of 0.234955. The trend is not being modeled and ARMA is not used to model residuals as p, q is 0.

Use Box-Cox: True
Use trend: False
Use damped trend: False
Seasonal periods: [ 7. 365.25]
Seasonal harmonics [ 3 11]
ARMA errors (p, q): (0, 0)
Box-Cox Lambda 0.234955
Smoothing (Alpha): 0.015789

SARIMA Model With Weekly Seasonality

Let us compare TBATS to another method that is widely used and broadly known: SARIMA. SARIMA has proven to provide state-of-the-art solutions to time series forecasting. Unfortunately, it has two major drawbacks: (1) one can model only a single seasonal effect, (2) season length should not be too long.

Nevertheless, let's build the SARIMA model using auto_arima from pmdarima package. We shall ignore yearly seasonality and focus on modeling weekly seasonal patterns:

from pmdarima import auto_arima
arima_model = auto_arima(y_to_train, seasonal=True, m=7)
y_arima_forecast = arima_model.predict(n_periods=365)

Auto arima has chosen SARIMA(0, 1, 1)x(1, 0, 1, 7) model. As the expected yearly pattern is not modeled (see Fig 4).

Fig 4: SARIMA models only weekly patterns. Compare to Fig 2.

SARIMAX with Fourier Terms

One can apply a trick [4] to utilize exogenous variables in SARIMAX to model additional seasonalities with Fourier terms.

We will keep modeling the weekly pattern with the seasonal part of SARIMA. For the yearly seasonal pattern, we will use the above-mentioned trick. I have compared multiple choices for the number of Fourier terms and 2 provides the most accurate forecasts. Therefore we shall use 2 Fourier terms as exogenous variables.

# prepare Fourier terms
exog = pd.DataFrame({'date': y.index})
exog = exog.set_index(pd.PeriodIndex(exog['date'], freq='D'))
exog['sin365'] = np.sin(2 * np.pi * exog.index.dayofyear / 365.25)
exog['cos365'] = np.cos(2 * np.pi * exog.index.dayofyear / 365.25)
exog['sin365_2'] = np.sin(4 * np.pi * exog.index.dayofyear / 365.25)
exog['cos365_2'] = np.cos(4 * np.pi * exog.index.dayofyear / 365.25)
exog = exog.drop(columns=['date'])
exog_to_train = exog.iloc[:(len(y)-365)]
exog_to_test = exog.iloc[(len(y)-365):]# Fit model
arima_exog_model = auto_arima(y=y_to_train, exogenous=exog_to_train, seasonal=True, m=7)# Forecast
y_arima_exog_forecast = arima_exog_model.predict(n_periods=365, exogenous=exog_to_test)

With the help of Fourier terms, SARIMAX is able to model both seasonal patterns (Fig 5).

Fig 5: SARIMAX with Fourier terms models both weekly and yearly patterns

Model Comparison

So far we have been only looking at plots. Let us compare the models’ performance using the 365-days-ahead forecast. We will use Mean Absolute Error as our metric:

  • TBATS: 3.8577
  • SARIMA: 7.2249
  • SARIMAX with 2 Fourier Terms: 3.9045

As expected SARIMA provides a poor model as it is unable to model yearly seasonality. TBATS and SARIMAX with Fourier Terms provide much better models.

Advantages

Many time series exhibit complex and multiple seasonal patterns (e.g., hourly data that contains a daily pattern, weekly pattern, and an annual pattern). The most popular models (e.g. ARIMA and exponential smoothing) can only account for one seasonality.

TBATS model has the capability to deal with complex seasonalities (e.g., non-integer seasonality, non-nested seasonality, and large-period seasonality) with no seasonality constraints, making it possible to create detailed, long-term forecasts.

Drawbacks

Unfortunately BATS and TBATS capabilities do not come for free. The method is very generic. Under the hood, it builds and evaluates many model candidates. This results in the slowness of the computation. It may be crucial when one needs to train models for lots of parallel time series.

Unlike SARIMAX, BATS and TBATS do not allow for exogenous variables to be added to the model to improve forecasts. According to Rob J Hyndmann it is not trivial to include them and not fall into forecastability issues.

TBATS model (Exponential smoothing state-space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components)

Fits a TBATS model applied to y, as described in De Livera, Hyndman & Snyder (2011). Parallel processing is used by default to speed up the computations.

tbats(
y,
use.box.cox = NULL,
use.trend = NULL,
use.damped.trend = NULL,
seasonal.periods = NULL,
use.arma.errors = TRUE,
use.parallel = length(y) > 1000,
num.cores = 2,
bc.lower = 0,
bc.upper = 1,
biasadj = FALSE,
model = NULL,
...
)

Conclusion

TBATS makes it easy for users to handle data with multiple seasonal patterns. This model is preferable when the seasonality changes over time.

References

  1. De Livera, A.M., Hyndman, R.J., & Snyder, R. D. (2011), Forecasting time series with complex seasonal patterns using exponential smoothing, Journal of the American Statistical Association, 106(496), 1513–1527. Working Paper version is available at https://robjhyndman.com/papers/ComplexSeasonality.pdf
  2. Python implementation of BATS and TBATS: https://github.com/intive-DataScience/tbats
  3. TBATS in R forecast package: https://www.rdocumentation.org/packages/forecast/versions/8.4/topics/tbats
  4. Forecasting Time Series Data with Multiple Seasonal Periods (Fourier Terms): https://content.pivotal.io/blog/forecasting-time-series-data-with-multiple-seasonal-periods

--

--