Grzegorz Skorupa
Jan 14 · 6 min read

There are two interesting time series forecasting methods called BATS and TBATS [1] that are capable of modeling time series with multiple seasonalities.

The names are acronyms for key features of the models: Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend and Seasonal components.

TBATS model takes it roots in exponential smoothing methods and can be described by the following equations:

Each seasonality is modeled by a trigonometric representation based on Fourier series. One major advantage of this approach is that it requires only 2 seed states regardless of the length of period. Another advantage is the ability to model seasonal effects of non-integer lengths. For example, given a series of daily observations, one can model leap years with a season of length 365.25.

BATS differs from TBATS only in the way it models seasonal effects. In BATS we have a more traditional approach where each seasonality is modeled by:

This implies that BATS can only model integer period lengths. Approach taken in BATS requires m_i seed states for season i, if this season is long the model may become intractable.

How Does TBATS Choose The Final Model

Under the hood TBATS will consider various alternatives and fit quite a few models. It will consider models:

  • with Box-Cox transformation and without it.
  • with and without Trend
  • with and without Trend Damping
  • with and without ARMA(p,q) process used to model residuals
  • non-seasonal model
  • various amounts of harmonics used to model seasonal effects

The final model will be chosen using Akaike information criterion (AIC).

In particular auto ARMA is used to decide if residuals need modeling and what p and q values are suitable.

For more details we invite you to the original paper [1].

Existing Implementation

So far the only implementation has been available in R language, in forecast package. Data Scientists using Python either had to resign from testing those models or were forced to use R wrapper such as rpy2 to run them.

New Implementation

We have created a new implementation of TBATS in Python, available at GitHub. In the rest of the article we will provide the example usage and compare the performance of this implementation with the other methods.

Time Series Dataset

In order to test forecasting methods we need some time series data. Let us use time series from Kaggle Store Item Demand Forecasting Challenge. It is a playground challenge and the set is most likely artificial (see comments in kernels and discussions). The data there contains daily sales of 50 items in 10 stores from a period of 5 years (500 different time series in total). For our purpose we need only one time series so I will arbitrarily take sales of Item 1 at Store 1.

import pandas as pddf = pd.read_csv('kaggle_sales.csv')
df = df[(df['store'] == 1) & (df['item'] == 1)] # item 1 in store 1
df = df.set_index('date')
y = df['sales']
y_to_train = y.iloc[:(len(y)-365)]
y_to_test = y.iloc[(len(y)-365):] # last year for testing
Fig 1: Daily sales of Item 1 at Store 1

Sales data contains daily observations. It exhibits weekly and yearly seasonal patterns. It means we are dealing with time series containing multiple seasonal effects. One of those seasonalities is long and contains 365 (366 for leap year) observations. This is something TBATS was designed for.

TBATS Model

In order to start forecasting we need to install tbats package and fit the model. The only thing that we have to manually provide to the model are season lengths:

from tbats import TBATS, BATS# Fit the model
estimator = TBATS(seasonal_periods=(7, 365.25))
model = estimator.fit(y_to_train)
# Forecast 365 days ahead
y_forecast = model.forecast(steps=365)

You may have noticed that yearly season length is not an integer. It equals 365.25 to account for leap years, a feature TBATS is able to handle.

TBATS seems to have done a pretty good job in modeling both of seasonal effects:

Fig 2: 3 last years of sales data. TBATS is modeling yearly seasonal effect.
Fig 3: 12 weeks of data. Weekly seasonal effect is also being modeled by TBATS.

If we take a look under the hood and review model parameters we will discover that 3 seasonal harmonics are used to model the weekly pattern and 11 harmonics are used to model the yearly pattern. TBATS has chosen to use Box-Cox transformation with lambda of 0.234955. Trend is not being modeled and ARMA is not used to model residuals as p, q are 0.

Use Box-Cox: True
Use trend: False
Use damped trend: False
Seasonal periods: [ 7. 365.25]
Seasonal harmonics [ 3 11]
ARMA errors (p, q): (0, 0)
Box-Cox Lambda 0.234955
Smoothing (Alpha): 0.015789

SARIMA Model With Weekly Seasonality

Let us compare TBATS to another method that is widely used and broadly known: SARIMA. SARIMA has proven to provide state of the art solutions to time series forecasting. Unfortunately it has two major drawbacks: (1) one can model only a single seasonal effect, (2) season length should not be too long.

Nevertheless lets build SARIMA model using auto_arima from pmdarima package. We shall ignore yearly seasonality and focus on modeling weekly seasonal pattern:

from pmdarima import auto_arima
arima_model = auto_arima(y_to_train, seasonal=True, m=7)
y_arima_forecast = arima_model.predict(n_periods=365)

Auto arima has chosen SARIMA(0, 1, 1)x(1, 0, 1, 7) model. As expected yearly pattern is not modeled (see Fig 4).

Fig 4: SARIMA models only weekly pattern. Compare to Fig 2.

SARIMAX with Fourier Terms

One can apply a trick [4] to utilize exogenous variables in SARIMAX to model additional seasonalities with Fourier terms.

We will keep modeling the weekly pattern with seasonal part of SARIMA. For the yearly seasonal pattern we will use the above-mentioned trick. I have compared multiple choices for the number of Fourier terms and 2 provides the most accurate forecasts. Therefore we shall use 2 Fourier terms as exogenous variables.

# prepare Fourier terms
exog = pd.DataFrame({'date': y.index})
exog = exog.set_index(pd.PeriodIndex(exog['date'], freq='D'))
exog['sin365'] = np.sin(2 * np.pi * exog.index.dayofyear / 365.25)
exog['cos365'] = np.cos(2 * np.pi * exog.index.dayofyear / 365.25)
exog['sin365_2'] = np.sin(4 * np.pi * exog.index.dayofyear / 365.25)
exog['cos365_2'] = np.cos(4 * np.pi * exog.index.dayofyear / 365.25)
exog = exog.drop(columns=['date'])
exog_to_train = exog.iloc[:(len(y)-365)]
exog_to_test = exog.iloc[(len(y)-365):]
# Fit model
arima_exog_model = auto_arima(y=y_to_train, exogenous=exog_to_train, seasonal=True, m=7)
# Forecast
y_arima_exog_forecast = arima_exog_model.predict(n_periods=365, exogenous=exog_to_test)

With the help of Fourier terms SARIMAX is able to model both seasonal patterns (Fig 5).

Fig 5: SARIMAX with Fourier terms models both weekly and yearly patterns

Model Comparison

So far we have been only looking at plots. Let us compare the models’ performance using the 365-days-ahead forecast. We will use Mean Absolute Error as our metric:

  • TBATS: 3.8577
  • SARIMA: 7.2249
  • SARIMAX with 2 Fourier Terms: 3.9045

As expected SARIMA provides a poor model as it is unable to model yearly seasonality. TBATS and SARIMAX with Fourier Terms provide much better models.

Drawbacks

Unfortunately BATS and TBATS capabilities do not come for free. The method is very generic. Under the hood it builds and evaluates many model candidates. This results in slowness of the computation. It may be crucial when one needs to train models for lots of parallel time series.

Unlike SARIMAX, BATS and TBATS do not allow for exogenous variables to be added to the model to improve forecasts. According to Rob J Hyndmann it is not trivial to include them and not fall into forecastability issues.

References

  1. De Livera, A.M., Hyndman, R.J., & Snyder, R. D. (2011), Forecasting time series with complex seasonal patterns using exponential smoothing, Journal of the American Statistical Association, 106(496), 1513–1527.
    Working Paper version is available at https://robjhyndman.com/papers/ComplexSeasonality.pdf
  2. Python implementation of BATS and TBATS: https://github.com/intive-DataScience/tbats
  3. TBATS in R forecast package: https://www.rdocumentation.org/packages/forecast/versions/8.4/topics/tbats
  4. Forecasting Time Series Data with Multiple Seasonal Periods (Fourier Terms): https://content.pivotal.io/blog/forecasting-time-series-data-with-multiple-seasonal-periods

intive Developers

At intive we’re building great digital products for our customers. Day by day. We want to share with you our way of doing things, the challenges we face, the tricks and shortcuts we discover. A little peek behind the scenes — welcome to our intive_dev blog!

Thanks to Ilya Nikokoshev

Grzegorz Skorupa

Written by

intive Developers

At intive we’re building great digital products for our customers. Day by day. We want to share with you our way of doing things, the challenges we face, the tricks and shortcuts we discover. A little peek behind the scenes — welcome to our intive_dev blog!

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade