Time Series Forecast Comparison: RNN vs Neural vs ARIMA vs TBATS

Awanish Kumar
5 min read · Aug 26, 2019

· Author Background: Awanish Kumar, a practicing Data Scientist with 13+ years of experience.

· Connect: awanish.kumar53@gmail.com

· Topic: Time Series Forecasting — ARIMA vs Neural vs TBATS vs RNN

· Data: Daily sales data

· Result: Forecast of the next 7 days' sales numbers

A time series model depends purely on past behavior, trend and some noise, and it can be used to predict future behavior.

Purpose:

Nowadays deep learning RNNs are discussed everywhere, and researchers are trying to test them on all sorts of forecasting/predictive problems. The aim here is to experiment with how the RNN methodology performs in time series forecasting compared to other methodologies like ARIMA, Neural and TBATS.

The idea was that if RNN gives a better accuracy percentage than the existing forecast techniques, I would incorporate RNN into existing projects. But the results surprised me, as I will discuss in detail.

A Brief Overview of ARIMA vs Neural vs TBATS vs RNN

Recurrent Neural Networks (LSTM): an LSTM can retain state from one iteration to the next by using its own output as input for the next step. In programming terms this is like running a fixed program with certain inputs and some internal variables.

It has three gates, each with independent weights and biases, so the network learns how much of the past output to keep, how much of the current input to keep, and how much of the internal state to send out to the output.

For example:

Step 1: Y0 = X0

Step 2: Y1 = X1 + Y0

Step 3: Y2 = X2 + Y1

Step t: Yt = Xt + Yt-1

The forecast value at time t uses the current input X at time t together with the value carried over from time t-1.
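The article notes further down that the RNN forecast was built with Python and Keras, but that code is not shown. The sketch below illustrates, roughly, how such an LSTM forecaster could be wired up; the window length, layer size, training settings and file/column names are assumptions for illustration, not the author's actual configuration.

# Minimal, illustrative LSTM forecaster in Keras (not the author's code).
# Window length, units, epochs and the file/column names are assumptions.
import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

df = pd.read_csv("sales_data.csv")            # assumed file with a 'sales' column
series = df["sales"].astype("float32").values

# Scale to [0, 1] to help the LSTM train
s_min, s_max = series.min(), series.max()
scaled = (series - s_min) / (s_max - s_min)

# Build supervised samples: last `window` values -> next value
window = 7
X, y = [], []
for i in range(len(scaled) - window):
    X.append(scaled[i:i + window])
    y.append(scaled[i + window])
X = np.array(X).reshape(-1, window, 1)        # (samples, timesteps, features)
y = np.array(y)

model = Sequential([
    LSTM(32, input_shape=(window, 1)),
    Dense(1),
])
model.compile(loss="mse", optimizer="adam")
model.fit(X, y, epochs=50, batch_size=16, verbose=0)

# Roll the forecast forward 7 days, feeding each prediction back in
last = scaled[-window:].tolist()
preds = []
for _ in range(7):
    x = np.array(last[-window:]).reshape(1, window, 1)
    p = float(model.predict(x, verbose=0)[0, 0])
    preds.append(p)
    last.append(p)
forecast = np.array(preds) * (s_max - s_min) + s_min   # back to original scale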

Neural Time Series Forecast (nnetar):

A feed-forward neural network is fitted with lagged values of the time series as vector inputs. It uses a single hidden layer with a chosen number of nodes (the size argument), and seasonal lags can also be used as inputs.

If xreg is provided, its columns are also used as inputs. A total of repeats networks are fitted, each with random starting weights, and these are averaged when computing forecasts.

It is a nonlinear autoregressive model, so it is not possible to derive prediction intervals analytically; they are obtained by simulation instead.
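nnetar itself is an R function (it appears in the reference code later). Purely as an illustration of the idea described above, here is a hypothetical Python/Keras sketch of lagged inputs feeding a single-hidden-layer network, with several randomly initialised copies averaged; the lag count, hidden size and number of repeats are assumptions, and nnetar's internal details (such as input scaling) are omitted for brevity.

# Rough sketch of the nnetar idea: lagged inputs -> one hidden layer,
# several randomly initialised networks averaged. Not the nnetar implementation.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def make_lagged(series, n_lags):
    """Turn a 1-D series into (lagged inputs, next value) pairs."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

def fit_repeats(series, n_lags=7, size=4, repeats=10, epochs=100):
    """Fit `repeats` small networks, each with random starting weights."""
    X, y = make_lagged(np.asarray(series, dtype="float32"), n_lags)
    models = []
    for _ in range(repeats):
        m = Sequential([Dense(size, activation="sigmoid", input_shape=(n_lags,)),
                        Dense(1)])
        m.compile(loss="mse", optimizer="adam")
        m.fit(X, y, epochs=epochs, verbose=0)
        models.append(m)
    return models

def forecast_one_step(models, last_lags):
    """Average the point forecasts of the fitted networks."""
    x = np.asarray(last_lags, dtype="float32").reshape(1, -1)
    return float(np.mean([m.predict(x, verbose=0)[0, 0] for m in models]))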

ARIMA (AutoRegressive Integrated Moving Average):

A class of statistical models for analyzing and forecasting time series data. ARIMA has three aspects:

AR: Autoregression. It models the relationship between an observation and some number of lagged observations.

I: Integrated. Differencing to make the time series stationary, i.e. subtracting the observation at the previous time step from the current observation (a short differencing sketch follows this list).

MA: Moving Average. It models the relationship between an observation and the residual errors from a moving average model applied to lagged observations.
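As a small illustration of the "I" component, first differencing (d = 1) can be written as follows; the file and column names are assumed for illustration.

# First differencing, the "I" in ARIMA with d = 1: y'_t = y_t - y_(t-1)
import pandas as pd

sales = pd.read_csv("sales_data.csv")["sales"]   # assumed column name
diffed = sales.diff().dropna()                   # differenced series, closer to stationary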

TBATS (Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend and Seasonal components):

It is capable of modeling time series with multiple seasonalities (e.g. weekly and yearly seasonal patterns).

The TBATS model has its roots in exponential smoothing methods.

Tools used:

RNN: Python, SciPy and the Keras deep learning library

ARIMA/Neural/TBATS: R 3.5.2, with the forecast and seasonal packages

Time Series input dataset — sample data: (Transaction Date, Sales Number)

Why was this research conducted?

The trigger for this research is that I am already using well-tested forecasting models like ARIMA, ARIMAX, Neural and TBATS on multiple projects across multiple domains. Can we replace them with, or add, the new forecasting methodology that is the buzz of the hour in these existing projects?

The idea was that if the RNN time series forecast gives better accuracy than the existing methodologies, we can replace the existing code. Our project already has a way to calculate accuracy, and the same formula/accuracy template sheet is used for RNN as well.

Research methodology:

During my research I used data from multiple domains, such as daily ATM cash demand, daily sales and daily power generation. In this paper I use daily sales data to showcase the process and results.

The input data is checked for NULL values and exceptionally high or low values. NULL values are replaced with the mean value for that day, and exceptionally high or low values are capped at the 95th and 5th percentile values.
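The article does not show code for this cleaning step. A minimal pandas sketch of the same idea is given below, assuming the column names tran_date and sales and interpreting "mean of that day" as the mean for that day of the week.

# Sketch of the cleaning step: impute NULLs, cap extremes at the 5th/95th
# percentiles. Column names and the imputation rule are assumptions.
import pandas as pd

df = pd.read_csv("sales_data.csv", parse_dates=["tran_date"], dayfirst=True)

# Replace NULL sales with the mean sales of that day of the week (assumed rule)
df["dow"] = df["tran_date"].dt.dayofweek
df["sales"] = df.groupby("dow")["sales"].transform(lambda s: s.fillna(s.mean()))

# Cap exceptionally high/low values at the 95th and 5th percentiles
lo, hi = df["sales"].quantile([0.05, 0.95])
df["sales"] = df["sales"].clip(lower=lo, upper=hi)
df = df.drop(columns="dow")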

Check trend, seasonality and noise, then follow the steps below:

Step 1 — Check stationarity and, if needed, stationarize the series.

Step 2 — Create a validation sample (the last 7 days of data).

Step 3 — Build the model

Step 4 — Validate model: Compare the predicted values to the actuals in the validation sample.

Pic: Line chart of the input data

The forecast models are then run in R/Python (any tool can be used).

Model validation: the last 7 days of data are used to check the accuracy percentage. The actual and forecasted values for those 7 days are compared to see what accuracy each model gives. A day-wise accuracy % is calculated; the higher the accuracy %, the better the model's performance.

Validation formula:

· Accuracy % = 1 - ABS(Actual sales number - Forecasted sales number) / Actual sales number

· Calculated day-wise.

· In effect, I am calculating the deviation % of each model's output with respect to the actual numbers.

· Accuracy % = 1 - Deviation % (a short worked example follows)
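As a worked example with made-up numbers, an actual of 100 and a forecast of 90 give a deviation of 10% and therefore 90% accuracy for that day:

# Day-wise accuracy %: 1 - |actual - forecast| / actual (numbers are made up)
actual, forecast = 100.0, 90.0
deviation = abs(actual - forecast) / actual    # 0.10 -> 10% deviation
accuracy = 1 - deviation                       # 0.90 -> 90% accuracy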

Accuracy Table and charts:

Accuracy Benchmark:

Conclusion:

1. The accuracy achieved by the existing models (ARIMA, Neural and TBATS) is much better than the RNN LSTM forecast.

2. It is a bit tricky to implement RNN LSTM compared to the other, traditional models.

3. The trade-off is negative: the effort required to implement RNN does not match the benefit we get.

4. It is a wrong assumption that deep learning (RNN LSTM) will give better results for every problem or business requirement.

Forecast Model and Code:

Brief R reference code for ARIMA, Neural and TBATS.

packages <- c("xts", "Hmisc", "lubridate", "seasonal", "forecast", "TTR", "bsts", "dplyr", "compositions")

# Install any missing packages, then load them all
new_pkgs <- packages[!packages %in% installed.packages()[, "Package"]]
if (length(new_pkgs) > 0) install.packages(new_pkgs)
invisible(lapply(packages, library, character.only = TRUE))

setwd("C:\\book\\white paper — RNN time series")

# Read the daily sales data and parse the transaction dates
df <- read.csv("sales_data.csv")
df$tran_date <- dmy(df$tran_date)

# Hold out the last 7 days as the validation sample
days_t <- 7
train <- head(df, nrow(df) - days_t)
test <- tail(df, days_t)
train_ts <- ts(train$sales)

## Model-1: ARIMA
arima_1 <- auto.arima(train_ts, ic = "aicc", stepwise = FALSE, approximation = FALSE)
arima1 <- as.data.frame(forecast(arima_1, h = days_t))

## Model-2: TBATS
tbats_1 <- tbats(train_ts, ic = "aicc", seasonal.periods = c(7, 365.25), use.parallel = FALSE)
tbats1 <- as.data.frame(forecast(tbats_1, h = days_t))

## Model-3: Neural (nnetar)
neural_1 <- nnetar(train$sales, decay = 0.5, maxit = 150, repeats = 10)
neural1 <- as.data.frame(forecast(neural_1, h = days_t))

## Preparation of the test comparison sheet
test_result <- cbind(test, arima1$`Point Forecast`, neural1, tbats1$`Point Forecast`)
colnames(test_result) <- c("tran_date", "sales", "arima", "neural", "tbats")

# Day-wise accuracy % per model: 1 - ABS(actual - forecast) / actual
aa <- test_result
for (i in 3:ncol(test_result)) {
  temp <- round(1 - abs(test_result[, 2] - test_result[, i]) / test_result[, 2], digits = 2)
  aa <- cbind(aa, temp)
  colnames(aa)[ncol(aa)] <- paste0(colnames(test_result)[i], "_accuracy")
}

Appendix:

https://robjhyndman.com/hyndsight/nnetar-prediction-intervals/

https://rdrr.io/cran/forecast/man/nnetar.html

https://kourentzes.com/forecasting/2017/02/10/forecasting-time-series-with-neural-networks-in-r/

https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/

https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
