Time Series Forecasting — ARIMA vs Prophet

Krish Hariharan
Published in Analytics Vidhya

6 min read · Jan 14, 2020

What is a time series?

A time series is a sequence of data points indexed (or listed or graphed) in time order. The data are therefore organized by relatively deterministic timestamps and, compared to random sample data, may contain additional information that we can extract.
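In pandas, one common way to hold a monthly time series is a `pandas.Series` whose index is a `DatetimeIndex`. The five values below are placeholders, not the dataset used later.

```python
import pandas as pd

# Hypothetical placeholder values: five monthly observations.
values = [120.0, 118.5, 131.2, 140.8, 152.3]

# "MS" = month-start frequency, the same frequency shown in the forecast output later.
index = pd.date_range(start="1994-01-01", periods=len(values), freq="MS")
series = pd.Series(values, index=index, name="production")

# Time order is what makes this a time series: the index is sorted and regularly spaced.
print(series.index.is_monotonic_increasing)  # True
print(series.index.freqstr)                  # MS
```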

Time series forecasting is widely used in data analytics. Here are just a few of the forecasts that might be useful:

· The number of servers that an online service will need next year.

· The demand for a grocery product at a supermarket on a given day.

· Tomorrow's closing price of a traded financial asset.

As another example, we can forecast a team's performance and use the forecast as a baseline: first to set goals for the team, and then to measure the team's actual performance against the baseline.

In this article we forecast a time series. We'll build two different models in Python and inspect their results. The models are ARIMA (Autoregressive Integrated Moving Average) and Facebook Prophet.

ARIMA (Autoregressive Integrated Moving Average)

ARIMA is a model used to forecast future values of a time series. It is a form of regression analysis, built from three components:

· AR (Autoregression): a model in which a changing variable regresses on its own lagged (prior) values.

· I (Integrated): differencing of the raw observations so that the time series becomes stationary.

· MA (Moving average): the dependency between an observation and the residual errors of a moving average model applied to lagged observations.

The standard notation is ARIMA(p, d, q), where integer values substitute for the parameters to indicate the type of ARIMA model used.

· p: the number of lag observations in the model; also known as the lag order.

· d: the number of times that the raw observations are differenced; also known as the degree of differencing.

· q: the size of the moving average window; also known as the order of the moving average.

The forecasting equation is constructed as follows. First, let $y$ denote the $d$-th difference of $Y$, which means:

If $d = 0$: $y_t = Y_t$

If $d = 1$: $y_t = Y_t - Y_{t-1}$

If $d = 2$: $y_t = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}) = Y_t - 2Y_{t-1} + Y_{t-2}$

Note that the second difference of Y (the d = 2 case) is not the difference from two periods ago. Rather, it is the first difference of the first difference, the discrete analog of a second derivative. It measures the local acceleration of the series rather than its local trend.
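A small numpy check makes the distinction concrete. The series below is a hypothetical quadratic (constant "acceleration"), so its second difference is constant. A lag-2 difference is a different quantity.

```python
import numpy as np

# Hypothetical short series Y_0..Y_5 (illustrative values only).
Y = np.array([10.0, 12.0, 15.0, 19.0, 24.0, 30.0])

# d = 1: y_t = Y_t - Y_{t-1}
first_diff = np.diff(Y, n=1)

# d = 2: difference of the first difference, y_t = Y_t - 2*Y_{t-1} + Y_{t-2}
second_diff = np.diff(Y, n=2)
explicit = Y[2:] - 2 * Y[1:-1] + Y[:-2]

# Not the same thing: the difference from two periods ago, Y_t - Y_{t-2}
lag2_diff = Y[2:] - Y[:-2]

print(first_diff.tolist())   # [2.0, 3.0, 4.0, 5.0, 6.0]
print(second_diff.tolist())  # [1.0, 1.0, 1.0, 1.0]
print(lag2_diff.tolist())    # [5.0, 7.0, 9.0, 11.0]
```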

In terms of y, the general forecasting equation is:

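$$\hat{y}_t = \mu + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} - \theta_1 e_{t-1} - \dots - \theta_q e_{t-q},$$

where:

$\mu$ → constant

$\phi_1 y_{t-1} + \dots + \phi_p y_{t-p}$ → AR terms (lagged values of y)

$-\theta_1 e_{t-1} - \dots - \theta_q e_{t-q}$ → MA terms (lagged errors)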
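The equation translates almost term by term into code. The function below is a sketch that computes one forecast for the already-differenced series y. It assumes the coefficients are known. The name `arima_one_step` and the example numbers are hypothetical, and a real library estimates the coefficients for you.

```python
def arima_one_step(y_hist, e_hist, mu, phi, theta):
    """One-step forecast y-hat_t for the (already differenced) series y.

    y_hist: past values of y, oldest first, so y_hist[-1] is y_{t-1}.
    e_hist: past forecast errors e, oldest first, so e_hist[-1] is e_{t-1}.
    phi:    AR coefficients [phi_1, ..., phi_p].
    theta:  MA coefficients [theta_1, ..., theta_q], minus-sign convention.
    """
    if len(y_hist) < len(phi) or len(e_hist) < len(theta):
        raise ValueError("not enough history for the requested orders")
    # phi_1*y_{t-1} + ... + phi_p*y_{t-p}
    ar = sum(phi[i] * y_hist[-1 - i] for i in range(len(phi)))
    # theta_1*e_{t-1} + ... + theta_q*e_{t-q}
    ma = sum(theta[j] * e_hist[-1 - j] for j in range(len(theta)))
    return mu + ar - ma

# Hypothetical ARIMA(2, d, 1) coefficients and history, for illustration only.
forecast = arima_one_step(y_hist=[2.0, 3.0], e_hist=[0.5], mu=1.0, phi=[0.5, 0.2], theta=[0.4])
print(round(forecast, 6))  # 2.7
```

Libraries differ in their sign convention. statsmodels, for example, writes its MA polynomial with plus signs (1 + θ1 L + ...), so its reported MA coefficients correspond to −θ in this equation. When d > 0, the forecast for y still has to be undifferenced to obtain a forecast for Y.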

Prophet

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well. It is designed to cope with:

· Seasonal effects caused by human behavior: weekly, monthly and yearly cycles, dips and peaks on public holidays.

· Changes in trend due to new products and market events.

· Outliers.

At its core, the Prophet library uses an additive regression model y(t) made up of the following components:

$$y(t) = g(t) + s(t) + h(t) + \epsilon_t,$$

where:

· Trend g(t): models non-periodic changes.

· Seasonality s(t): represents periodic changes.

· Holidays component h(t): contributes information about holidays and events.

· Error term $\epsilon_t$: idiosyncratic changes that the model does not capture.
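The following toy sketch uses numpy only, not Prophet itself, to show how the additive pieces combine. Several assumptions simplify it. A plain linear trend stands in for Prophet's piecewise-linear (or logistic) trend. Yearly seasonality is a short Fourier series, which is the form Prophet uses for s(t). The holiday dates and all coefficients are made up.

```python
import numpy as np

def fourier_seasonality(t, period, coeffs):
    """s(t) as a Fourier series: sum_n a_n*cos(2*pi*n*t/P) + b_n*sin(2*pi*n*t/P).

    coeffs: list of (a_n, b_n) pairs for n = 1..N.
    """
    t = np.asarray(t, dtype=float)
    s = np.zeros_like(t)
    for n, (a, b) in enumerate(coeffs, start=1):
        s += a * np.cos(2 * np.pi * n * t / period) + b * np.sin(2 * np.pi * n * t / period)
    return s

# Hypothetical components on a daily time axis t (in days).
t = np.arange(730)                                              # two years
g = 100.0 + 0.05 * t                                            # trend g(t): linear, for simplicity
s = fourier_seasonality(t, 365.25, [(10.0, 5.0), (2.0, -1.0)])  # yearly seasonality s(t)
holidays = np.isin(t % 365, [358, 359])                         # hypothetical holiday days
h = np.where(holidays, 15.0, 0.0)                               # holiday effect h(t)
rng = np.random.default_rng(0)
eps = rng.normal(0.0, 1.0, size=t.shape)                        # error term

# Additive model: y(t) = g(t) + s(t) + h(t) + eps_t
y = g + s + h + eps
```

Because the model is additive, each component can be inspected or plotted on its own, and their sum is the fitted curve.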

Dataset

We will predict the monthly production of beer in Australia.

First, we load our dataset and plot the data.
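The loader below is a minimal pandas sketch, not the author's code. The default column names `Month` and `Monthly beer production` are assumptions about the CSV layout, and the inline rows are dummy placeholders so that the snippet runs on its own.

```python
import io
import pandas as pd

def load_monthly_series(source, date_col="Month", value_col="Monthly beer production"):
    """Read a two-column monthly CSV into a Series indexed by month-start dates.

    The default column names are an assumption about the CSV layout.
    """
    df = pd.read_csv(source, parse_dates=[date_col], index_col=date_col)
    series = df[value_col].sort_index()
    # Set an explicit month-start ("MS") frequency; any missing months become NaN.
    return series.asfreq("MS")

# Dummy rows standing in for the real file (placeholder values, not the actual data).
dummy_csv = io.StringIO(
    "Month,Monthly beer production\n"
    "1956-01-01,100.0\n"
    "1956-02-01,90.0\n"
    "1956-03-01,95.0\n"
)
beer = load_monthly_series(dummy_csv)
# beer.plot() would draw the basic data plot (requires matplotlib).
```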

Basic Data Plot

The plot shows clear seasonality in the data, which is why we use SARIMA (Seasonal ARIMA) instead of plain ARIMA.

Seasonal ARIMA (SARIMA)

Seasonal ARIMA is an extension of ARIMA that explicitly supports univariate time series data with a seasonal component. It adds three new hyperparameters to specify the autoregression (AR), differencing (I) and moving average (MA) for the seasonal component of the series, as well as an additional parameter for the period of the seasonality.
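The full seasonal model can be written compactly with the backshift operator $B$ (where $B Y_t = Y_{t-1}$). The form below keeps the same minus-sign convention as the ARIMA equation above, and the seasonal orders $P$, $D$, $Q$ and the period $m$ are listed just below:

$$\Phi_P(B^m)\,\phi_p(B)\,(1 - B)^d\,(1 - B^m)^D\,Y_t = \mu + \Theta_Q(B^m)\,\theta_q(B)\,e_t$$

The polynomials are defined as follows:

$$\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p, \qquad \theta_q(B) = 1 - \theta_1 B - \dots - \theta_q B^q,$$

$$\Phi_P(B^m) = 1 - \Phi_1 B^m - \dots - \Phi_P B^{Pm}, \qquad \Theta_Q(B^m) = 1 - \Theta_1 B^m - \dots - \Theta_Q B^{Qm}.$$

Setting $P = D = Q = 0$ recovers the ARIMA model, whose one-step forecast is the equation above. The current error $e_t$ is the part the forecast cannot predict.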

There are four seasonal elements that are not part of ARIMA and must be configured:

· P: seasonal autoregressive order.

· D: seasonal difference order.

· Q: seasonal moving average order.

· m: the number of time steps in a single seasonal period (12 for monthly data with a yearly cycle).

The full model is written SARIMA(p, d, q)(P, D, Q)m.
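The seasonal difference counted by D subtracts the value one full season earlier, $y_t = Y_t - Y_{t-m}$. The numpy sketch below applies it to a hypothetical monthly series made of a repeating 12-month pattern plus a linear trend. The helper name `seasonal_difference` is illustrative.

```python
import numpy as np

def seasonal_difference(x, m, D=1):
    """Apply the seasonal difference y_t = x_t - x_{t-m}, D times."""
    x = np.asarray(x, dtype=float)
    for _ in range(D):
        x = x[m:] - x[:-m]
    return x

# Hypothetical monthly series: a repeating 12-month pattern plus a trend of 0.5 per month.
pattern = np.array([5, 3, 4, 6, 8, 9, 10, 9, 7, 6, 8, 12], dtype=float)
months = np.arange(36)
x = np.tile(pattern, 3) + 0.5 * months

# With m = 12 and D = 1 the seasonal pattern cancels, leaving 12 * 0.5 = 6.0 at every step.
once = seasonal_difference(x, m=12)
print(once.tolist()[:3])  # [6.0, 6.0, 6.0]
```

Each seasonal difference shortens the series by m points, which is one reason D is rarely set above 1.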

Seasonal Decompose Plots

Seasonal Plot
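Decomposition plots like these split the series into trend, seasonal and residual parts. The sketch below implements the classical additive method with pandas alone. In simplified form it mirrors the default additive behaviour of statsmodels' `seasonal_decompose`, which uses a centred 2 x m moving average for the trend and per-position averages for the seasonal part, but it is not necessarily the exact tool behind these plots. The function name is hypothetical, and the sketch assumes an even period.

```python
import numpy as np
import pandas as pd

def classical_additive_decompose(x, period=12):
    """Classical additive decomposition: x = trend + seasonal + resid (even period assumed)."""
    x = pd.Series(x, dtype=float)
    # Centred 2 x m moving average: weights 1/(2m), 1/m, ..., 1/m, 1/(2m) over m + 1 points.
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = x.rolling(window=period + 1, center=True).apply(lambda w: np.dot(w, weights), raw=True)
    detrended = x - trend
    # Average the detrended values at each position in the cycle (NaNs at the ends are skipped).
    position = np.arange(len(x)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means -= seasonal_means.mean()  # seasonal component sums to zero over one cycle
    seasonal = pd.Series(seasonal_means.to_numpy()[position], index=x.index)
    resid = x - trend - seasonal
    return trend, seasonal, resid
```

The trend is undefined for the first and last m/2 points because the centred window does not fit there. Those points show up as NaN in the trend and residual panels.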

ARIMA Forecast

After splitting the data into training and test sets, we use the auto_arima() function (from the pmdarima package) to find the best p, d, q, P, D, Q values:

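As we can see, the best model chosen by auto_arima() is SARIMAX(2, 1, 1)x(4, 0, 3, 12). That is, p = 2, d = 1, q = 1 and P = 4, D = 0, Q = 3 with a seasonal period of m = 12 months. The X in SARIMAX refers to support for exogenous regressors.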
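Time series must be split chronologically rather than randomly, so that the model never sees the future it is asked to forecast. The function below is a minimal sketch that assumes the test set is the final 12 months, which matches the 12 forecast dates (1994-09 to 1995-08) shown below. The name `train_test_split_last` is hypothetical.

```python
import pandas as pd

def train_test_split_last(series, test_size=12):
    """Chronological split: all but the last `test_size` points form the training set."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series.iloc[:-test_size], series.iloc[-test_size:]
```

The order search (for example pmdarima's `auto_arima(train, seasonal=True, m=12)`) should see only the training part. The test part is kept aside for evaluation.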

Prediction

1994-09-01    134.018548
1994-10-01    157.615416
1994-11-01    181.934389
1994-12-01    183.656573
1995-01-01    144.670429
1995-02-01    136.950141
1995-03-01    151.194319
1995-04-01    133.265705
1995-05-01    138.106430
1995-06-01    120.552373
1995-07-01    128.309618
1995-08-01    138.919283
Freq: MS, Name: ARIMA Predictions, dtype: float64

Prediction Comparison

The blue line represents Monthly Production Data and the orange line represents ARIMA Predictions.

Model Evaluation

MSE Error: 64.99116627373826
RMSE Error: 8.061709885237638
Mean: 136.39537815126045
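The two error metrics are related by a square root. RMSE is in the same units as the data (monthly production), not a percentage. The plain-Python sketch below defines both metrics and relates the reported RMSE to the reported mean. The helper names are illustrative.

```python
import math

def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(actual, predicted))

# The reported figures are consistent: sqrt(MSE) gives the RMSE, and dividing by the
# reported mean expresses the error relative to typical production levels.
reported_mse = 64.99116627373826
reported_mean = 136.39537815126045
print(round(math.sqrt(reported_mse), 4))                  # 8.0617
print(round(math.sqrt(reported_mse) / reported_mean, 3))  # 0.059
```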

Prophet Forecast

Prophet's prediction is built from the trend, seasonality and other additive terms in the model.
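Prophet expects its input as a DataFrame with a `ds` column (datestamps) and a `y` column (values). The helper below is a hypothetical sketch that converts a pandas Series into that shape. The Prophet calls appear only as comments because they need the prophet package installed.

```python
import pandas as pd

def to_prophet_frame(series):
    """Convert a Series with a DatetimeIndex into the two-column 'ds'/'y' frame Prophet expects."""
    return pd.DataFrame({"ds": series.index, "y": series.to_numpy()})

# With the prophet package installed (published as fbprophet in older releases),
# fitting and forecasting 12 months ahead looks roughly like:
#   from prophet import Prophet
#   model = Prophet()
#   model.fit(to_prophet_frame(train))
#   future = model.make_future_dataframe(periods=12, freq="MS")
#   forecast = model.predict(future)   # forecast["yhat"] holds the predictions
```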

Prediction

Prediction Comparison

The blue line represents Monthly Production Data and the orange line represents Prophet Predictions.

Model Evaluation

MSE Error: 131.650946999156
RMSE Error: 11.473924655459264
Mean: 136.39537815126045

ARIMA vs Prophet Comparison

The blue line represents Monthly Production Data, orange dashed-dotted line represents ARIMA Predictions and the green dotted line represents Prophet Predictions.

Evaluation Comparison

Test Data vs Predictions

Summary

The objective of this article was to give a basic understanding of time series forecasting models such as ARIMA, Seasonal ARIMA and Prophet. In this experiment, the SARIMAX model forecast the test period more accurately than Prophet. Its RMSE was about 8.06, roughly 5.9% of the reported mean of 136.4. Prophet's RMSE was about 11.47, roughly 8.4% of the mean.

These tests are only quick, basic predictions. Both models could be improved with tuning and with knowledge of the data and the business.

Thanks!

GitHub Repository: https://github.com/krishvictor77/Time-Series-Forecasting-ARIMA-vs-Prophet

References:

· https://www.digitalocean.com/community/tutorials/a-guide-to-time-series-forecasting-with-prophet-in-python-3

· https://www.kaggle.com/kashnitsky/topic-9-part-1-time-series-analysis-in-python#Econometric-approach
