Santosh_kumar
7 min read · Aug 16, 2023


Practical nuances of Time Series Forecasting — Part I — ETS and Auto ARIMA

In this series of blogs, I will explore different topics in Time Series Forecasting from a practitioner's standpoint & use a combination of theory & practical examples to drive the points home. Please note that I don't intend to cover these algorithms end to end; rather, I will focus on a few sub-topics in each algorithm that are often missed or misunderstood & emphasize extracting the maximum benefit from each algorithm from an implementation perspective. So, let's begin!

Deep Learning (DL) vs Statistical Methods:

In the domain of Time Series forecasting, many Deep Learning (DL) algorithms have made their mark in recent times. Notable among them are Long Short-Term Memory (aka LSTM, an RNN-based model), DeepAR+ (an Amazon-developed algorithm that combines DL with probabilistic forecasting), N-BEATS (a deep neural architecture based on fully connected layers), etc.

However, in many practical applications today, simple statistical models are still relevant & suit the problems enterprises face, especially in supply chain analytics. A few reasons why I say this:

  1. Lack of external data: Apart from demand data, most companies still don't have data on the external factors that impact their demand, such as promotions, marketing campaigns, weather, or other domain-specific drivers. Once the problem becomes univariate, statistical methods have a good chance of outperforming DL methods.
  2. Faster run times & less computational power: Statistical models are much faster than DL models, which need long run times to produce forecasts & expensive GPUs to do the calculations.

In fact, in the most recent M5 competition (considered the gold standard among time series forecasting competitions), although all the top solutions used a combination of ML & DL models, it is interesting to note that 92.5% of the submitted solutions could not beat the ETS-based benchmarks (note that this competition had external data such as promotions, prices & special events).

So, even today, a good understanding of statistical methods is imperative to generate good forecasts. Looking ahead five years, DL methods may well dominate time series forecasting, but getting there requires navigating the challenges of the present.

So, let’s begin with the first algorithm of the topic — ETS

ETS (Error, Trend, Seasonality) Models: The exponential smoothing family of models combines the Error, Trend & Seasonality components of time series data in multiple possible ways to generate a forecast. In seasonal multiplicative methods, the trended forecast is multiplied by the seasonality factor to produce the final forecast, whereas in additive methods, the seasonality factor is added to the trended forecast. Common ways of combining trend & seasonality are as follows:

Some of the above combinations are already known by other common names: (N,N) is Simple Exponential Smoothing, (A,N) is Holt's Linear method, (A,A) is the Additive Holt-Winters method, etc.
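As a minimal toy illustration (the numbers & helper functions below are my own, not from any library), here is how the seasonal component enters the final forecast in the two families, given a smoothed level, trend & seasonal factor:

```python
def additive_forecast(level, trend, seasonal, h):
    # Additive seasonality: the seasonal component is ADDED to the trended forecast
    return level + h * trend + seasonal

def multiplicative_forecast(level, trend, seasonal, h):
    # Multiplicative seasonality: the trended forecast is MULTIPLIED by the factor
    return (level + h * trend) * seasonal

# Toy state: level 100, trend +2 per period, one step ahead (h = 1)
print(additive_forecast(100, 2, 15, 1))          # adds 15 units -> 117
print(multiplicative_forecast(100, 2, 1.15, 1))  # scales by 15% -> 117.3
```

Note how the multiplicative factor is a ratio (1.15 = "15% above a typical period"), so its effect grows as the level of the series grows, while the additive component stays a fixed number of units.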

Now, you may ask why multiplicative trend methods are not mentioned. Multiplicative trend methods are extremely unstable & produce poor forecasts. Hence they are excluded from the ETS implementation packages in R & Python.

Also, an important note about the damping factor: it is advisable to damp your forecast, especially over long forecast horizons. This helps avoid overly optimistic/pessimistic forecasts. You should note that only the trend is damped, as damping seasonality doesn't make any logical sense.
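To see what damping does, here is a small sketch (toy numbers of my own): with a damping parameter φ < 1, the trend contribution at horizon h is φ + φ² + … + φʰ instead of plain h, so the forecast flattens out instead of growing without bound:

```python
def damped_trend_forecast(level, trend, phi, h):
    # Damped trend: phi + phi^2 + ... + phi^h replaces the plain horizon h
    damp = sum(phi ** i for i in range(1, h + 1))
    return level + damp * trend

# Level 100, trend +5 per period, phi = 0.9
print(damped_trend_forecast(100, 5, 0.9, 1))   # 104.5, barely differs from undamped
print(damped_trend_forecast(100, 5, 0.9, 40))  # levels off near 100 + 5 * 0.9/0.1 = 145
```

With φ = 1 the forecast at h = 40 would be 300; damping caps the long-run trend contribution at φ/(1−φ) times the per-period trend, which is exactly the protection against runaway long-horizon forecasts described above.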

For each of the above 9 models, we can use either additive or multiplicative errors, resulting in a total of 18 models. These 18 models are tested & selected among when we implement ETS in R/Python.

One important concept that needs to be understood in ETS is when to use additive models & when to use multiplicative models.

Additive models work best when the seasonal swings are roughly constant in size, while multiplicative models work best when the seasonal swings grow or shrink in proportion to the level of the series. A simple figurative demonstration is as follows:

One important point to note is that multiplicative methods work only when the data is strictly positive. So, if your demand data has zero values, multiplicative models don't work. One excellent workaround is replacing the zeros with a very small number like 0.1 or 0.01. This ensures the ETS algorithm can be used effectively & the best possible forecast obtained. Below is an example of how this workaround can help in practice.
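The tweak itself is a one-liner; a minimal sketch in Python (epsilon value is a judgment call, not a library default):

```python
def replace_zeros(series, eps=0.01):
    # Multiplicative ETS needs strictly positive data,
    # so swap exact zeros for a tiny positive value
    return [eps if x == 0 else x for x in series]

demand = [120, 0, 85, 0, 140]
print(replace_zeros(demand))  # [120, 0.01, 85, 0.01, 140]
```

The epsilon should be small relative to typical demand so it doesn't distort the fitted seasonal factors; 0.01 against demand in the hundreds is effectively "no sales" to the model.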

Let’s say we have a wholesaler who sells umbrellas & he has maintained historical sales data in quarterly format. He wants to forecast his future demand based on past demand.

The demand pattern looks like this:

Now, the forecast can be generated using the following R code, with both additive & multiplicative methods. As there are a few zeros in the dataset, if we don't apply the tweak above, the best ETS forecast we can get is from ETS(Additive Error, Additive Damped Trend & Additive Seasonality).

# R implementation of ETS additive (AAA) & multiplicative (MAM) models
# on the umbrella quarterly sales data
library(forecast)
library(lubridate)

dataactual <- read.csv("D:/ETS_AAA_MAM_DEMO.csv")
dataactual$Qty[is.na(dataactual$Qty)] <- 0

dataactual$date <- ymd(dataactual$Date)
dataactual <- dataactual[order(dataactual$date), ]

# Train on the first 22 observations
dataactual_test <- dataactual[1:22, ]

# y keeps the raw zeros; y1 gets the zero-replacement tweak,
# since multiplicative models need strictly positive data
y  <- ts(dataactual_test$Qty, frequency = 4)
y1 <- ts(dataactual_test$Qty, frequency = 4)
y1[y1 == 0] <- 0.1

# Additive error, damped additive trend, additive seasonality
mod.ETS_AAA.op <- ets(y, model = "AAA", damped = TRUE)
summary(mod.ETS_AAA.op)
fcst.ETS_AAA.op <- forecast(mod.ETS_AAA.op, 24, level = c(80))
fcst.ETS_AAA.op.mean <- round(fcst.ETS_AAA.op$mean, 2)

# Multiplicative error, damped additive trend, multiplicative seasonality
mod.ETS_MAM.op <- ets(y1, model = "MAM", damped = TRUE)
summary(mod.ETS_MAM.op)
fcst.ETS_MAM.op <- forecast(mod.ETS_MAM.op, 24, level = c(80))
fcst.ETS_MAM.op.mean <- round(fcst.ETS_MAM.op$mean, 2)

# If we don't replace the zeros & apply a multiplicative model,
# ets() clearly throws an error on the non-positive data
mod.ets.test <- ets(y, model = "MAM", damped = TRUE)

Once we replace the zeros with small values, the results for both the additive & multiplicative models look like this:

Clearly, the multiplicative model outperforms the additive model, both in capturing the trend & in predicting demand in the off-season quarters.

So, that's a wrap on ETS in this series. Let's now move to the ARIMA family & how it fundamentally differs from ETS.

ARIMA (Auto-Regressive Integrated Moving Average):

Just like ETS, the ARIMA family has multiple models like AR, MA, ARIMA, SARIMA, SARIMAX, etc. ARIMA & ETS take different approaches to forecasting: while ETS models the trend & seasonality components in the data, ARIMA uses the autocorrelations in the data to produce forecasts.
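"Autocorrelation" here simply means how strongly the series correlates with a lagged copy of itself. A minimal sketch of the standard sample autocorrelation (pure Python, toy data of my own):

```python
def autocorrelation(series, lag):
    # Sample autocorrelation: lagged autocovariance divided by the variance
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return cov / var

# A quarterly-style repeating pattern correlates strongly with itself 4 steps back
y = [10, 20, 30, 40] * 6
print(round(autocorrelation(y, 4), 2))  # high, close to 1
print(round(autocorrelation(y, 1), 2))  # much weaker
```

A large spike at lag 4 in a quarterly series is exactly the kind of structure a seasonal ARIMA model picks up on, without ever naming a "seasonality component" the way ETS does.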

The ARIMA family needs the data to be stationary (constant mean (no trend), variance & autocovariance) in order to make predictions. To make the variance in the data constant, we need to use transformations such as the log transformation or Box-Cox transformations.

Image illustrating the impact of transformations
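To see concretely why the log transform stabilizes variance, here is a small sketch (synthetic series of my own) where the seasonal swings grow with the level of the series, i.e. the multiplicative case:

```python
import math

# Multiplicative pattern: a 10%-per-period growth in the level,
# with each period swinging between 80% and 120% of that level
y = [100 * 1.1 ** t * f for t in range(3) for f in (0.8, 1.2)]

# Peak-to-trough gap widens on the raw scale...
raw_gaps = [y[2 * i + 1] - y[2 * i] for i in range(3)]
# ...but is constant after a log transform: log(1.2) - log(0.8) every time
log_gaps = [math.log(y[2 * i + 1]) - math.log(y[2 * i]) for i in range(3)]

print([round(g, 1) for g in raw_gaps])   # widening gaps
print([round(g, 3) for g in log_gaps])   # identical gaps
```

After the log transform, the swings are constant in size, which is exactly the "constant variance" that ARIMA's stationarity requirement asks for.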

To perform these kinds of transformations, again we need strictly positive data. In the Python implementation of Auto ARIMA, we need to apply these transformations manually before fitting, whereas in R we can set the "lambda" hyperparameter to auto. Again, the trick of replacing zero values with a very small value does the job here.

Many people assume that whatever forecasts can be achieved through ARIMA can also be achieved through ETS, or vice versa. However, this is not correct. Some key differences between ETS & Auto ARIMA are as follows:

  1. ARIMA models can handle cyclicity (provided p ≥ 2), whereas ETS can't handle cyclicity.
  2. We can include external factors like promotions in ARIMA using SARIMAX, whereas ETS can only do univariate analysis.
  3. However, multiplicative ETS models don't have any ARIMA counterparts, i.e. ETS handles a wider variety of models in this particular aspect.
  4. ETS can work even with very few data points, but ARIMA models may require a minimum number of data points to forecast well (however, there is no fixed minimum; it depends on the dataset. The figure of 30 minimum data points that circulates on the internet is incorrect).
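Point 1 is worth a quick demonstration. An AR(2) process with complex characteristic roots oscillates on its own, producing cycles with no seasonal term at all; a minimal sketch with coefficients of my own choosing (within the stationarity region):

```python
# AR(2) with complex roots: y_t = 1.5*y_{t-1} - 0.9*y_{t-2},
# kicked off by a single shock, no noise, no seasonal component
y = [0.0, 1.0]
for _ in range(30):
    y.append(1.5 * y[-1] - 0.9 * y[-2])

# The series repeatedly swings above & below zero: a damped cycle
signs = ''.join('+' if v >= 0 else '-' for v in y)
print(signs)
```

The runs of `+` and `-` alternate with an irregular period of roughly 9-10 steps, which is exactly the aperiodic, non-seasonal cyclicity (e.g. business cycles) that ETS, with its fixed seasonal period, cannot represent.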

There is a beautiful diagram in the book Forecasting: Principles and Practice (Rob J Hyndman & George Athanasopoulos) that shows what is common between the two model classes & what is not.

So, hopefully you now have a better understanding of these time series forecasting models. Please let me know your feedback in the comments & stay tuned for Part II! 😀

References:

  1. Umbrella quarterly sales dataset: https://drive.google.com/file/d/1tNn7TpDUQLGICZ7t6LMppaitNLeOmFsj/view?pli=1
  2. Forecasting: Principles and Practice — Rob J Hyndman & George Athanasopoulos: https://otexts.com/fpp3/
