A gentle introduction to Time Series

ARIMA and Prophet starter kit

Published in

Quantyca

8 min readJun 3, 2020

Time series analysis and forecasting are extremely useful and important tasks in statistical and machine learning environments. Despite this, they are often neglected and taken for granted , but they really deserve much more attention, since they can help, for instance, Economic Forecasting, Sales Forecasting, Budgetary Analysis and Stock Market Analysis.

Photo by Malvestida Magazine on Unsplash

Time series analysis requires experience and specialized skills to handle and get the right forecasting approach: sometimes, experts with knowledge in data analysis are not as much familiar with forecasting tasks. Time series have their own properties and their analysis should not be mistaken for a classical statistical inference. Moreover, forecasting model’s parameters are not always easily interpretable by analysts as they require a certain degree of statistical background and experience.

Goal of the article

This post aims to discuss the difference between a classical statistical time series model, ARIMA and a “newer” model, Prophet. The purpose of the article isn’t to define or judge which one is better, because they are both well-established and valid methods and their performance depends on the context and the target: it’s the analyst’s job to choose the best one. However, there’ll be underlighted the straightness points and the particularities of both and they will be tested with time series dataset “AirPassengers”.

Photo by Glenn Carstens-Peters on Unsplash

Basic concepts

Before getting to the heart of the matter, let’s introduce some basic concepts about time series and both methods.

📌 Time Series Definition

A Time Series is a sequence of observations, collected at regular intervals. Time Series can be a discrete stochastic process or a continuous stochastic process. When it comes to the first one, observations are taken in equispaced points in time, in continuous-time, instead, the observations are non-equispaced.

📌 Stationary Definition

Stationarity is a form of time-homogeneity of the data generating process and if time series is invariant with respect to time we can have:

strict stationarity when the entire probability distribution doesn’t vary with time shift

weak stationarity when the first two moments do not hinge on time:

The mean is constant over time. The variance is finite and does not change on time. The covariance is only a function of the temporal distance between the two random variables.

📌 PACF and ACF

Autocorrelation function and Partial Autocorrelation function are necessary to identify the right ARIMA model’s parameters. Both are measurements of the linear memory of a stationary process. They compute the correlation of an observation with lag values. The difference between the two methods is PACF considers the direct effects between the lags Xt-₂— Xt (red arrows). We ought to take into account a range of rules to interpret adequately the autocorrelation plots, they’ll be presented further.

Arima

The ARMA and ARIMA models were introduced in the 30s- 40s , they played a fundamental role in signal analysis during Second World War, but they started to be employed in economic analysis and in time series analysis in 1970 with Box and Jenkins’s book and even more with the arrival and the spread of personal computers, thanks to their high performance computing.

ARIMA stands for Auto-Regressive Integrated Moving Average, its features are:

AR → AutoRegressive model
The variable of interest is a linear regression of past values of the variable itself, which implies that the future relies on the past. The linear regression model is comprised of p lagged observations of the variable of interest plus a white noise, which captures everything that isn’t explained by the regression.
I→Integrated
It refers to differing methods, computing differences among consecutive observations, to obtain a stationary process from a non-stationary process. It’s defined by the parameter d, which outlines the number of times that observations are differenced.
MA →Moving Average model
It’s a regression -like model which makes recourse to past forecast errors to predict the variable of interest, plus a white noise. The moving average has order q, it defines the size of the moving average window.

Thus, the parameters to set are:

p, autoregression order
d, difference order
q, moving average order

To deal with time series with seasonal components, SARIMA (Seasonal Autoregressive Integrated Moving Average) is used. It’s an extension of the ARIMA model and adds seasonal terms into ARIMA. In this case the parameters to sets aren’t only p,d,q but also:

P, seasonal autoregressive order
D, seasonal difference order
Q, seasonal moving average order
m, seasonal period (annual period 12, quarter 4…)

But which is the criteria to choose the parameters?

Box and Jenkins pointed out a procedure to single out the best ARIMA model to fit a specific time series. Here below an example of the procedure summarized by a flowchart.

*Pelagatti M. (2016). Time Series Modelling with Unobserved Components. Taylor & Francis Group, Boca Raton.*

ARIMA models can approximate pretty well stationary processes, but time series aren’t always stationary. They present trend and seasonality, variance which depends on time and other non -stationarity problems. As a consequence, it’s necessary to apply some transformations as:

differences of 1st or 2nd order to remove trend
a further difference to remove seasonal effects
log transformation in case of non-stationary variance.

Once, a time series is weak stationary and it isn’t a white noise, using ACF and PACF, the best ARIMA model can be identified. In principle, AR(p) is identified when ACF tends to zero at a rate that is exponential, instead PACF is not null only for the first p lags (Fig. 1). On the other hand, MA(q) is identified when ACF is not null only for the first q lags; instead, PACF tends to zero at a rate that is exponential (Fig. 2).

Prophet

Prophet is developed by Facebook’s Core Data Science team and it’s an open-source tool for business forecasting. Prophet model is based on additive model (trend+seasonality+holidays), in detail:

📌 g(t) stands for trend. In prophet are implemented two types of trends:

Nonlinear, Saturating Growth
It’s a logistic equation useful to handle non-linear growth with saturation: the growth rate depends on the carrying capacity.
Piecewise Logistic Growth, which hasn’t growth saturation

The parameters to set are:

growth, linear or logistic trend
changepoints, list of dates determinate changing on growth rate (defined automatically by the model)
n_changepoints, list of changepoints defined by the analyst
changepoint_prior_scale, it changes the flexibility of the trend: increasing it will make the trend more flexible.

📌s(t) stands for seasonality. The time series can have multi-period seasonality, those are modelled by the Fourier series.

The parameter to define seasonality are:

yearly_seasonality
weekly_seasonality
daily_seasonality
seasonality_prior_scale, it changes the strength of seasonality model

📌h(t) stands for holidays and events. Prophets provides a list of past and future holidays and events. The model considers windows around those days to capture the effects.

The parameters to set are:

holidays, dataframe containing holidays and events date
holiday_prior_scale, It changes the strength of holiday model

AirPassengers Time Series

Below, the two methods will be tested with Airpassenger Time Series. Let’s load library and AirPassengers Time Series in R.

library(prophet)
library(forecast)data(AirPassengers)
plot(AirPassengers, ylab="Passengers", type="o", pch =20)

Arima

The AirPassengers is splitted into train and test set (the last two years).

df_train<- window(AirPassengers, end = c(1958, 12))
df_test <- window(AirPassengers, start = c(1959, 01))

To find the right ARIMA model, we follow the Box and Jenkins's procedure: the variance isn’t constant, because it increases and changes on time, thus log transformation is required. Furthermore, this time series isn’t stationary in mean, considering seasonality, thus the seasonality difference is necessary. Then, ACF and PACF are plotted.

ggtsdisplay(diff(log(AirPassengers), 12))

ACF and PACF suggest an auto regressive model of order 2, and an MA model of order 1. Thus ARIMA(2,0,0)(0,1,1) model is selected and is trained with the training set. Two parameters are defined: include.constant and lambda. The first one adds into the model the intercept. The other one, instead, defines log transformation.

arima_1 <- Arima(df_train, c(2, 0, 0), c(0, 1, 1), include.constant = TRUE, lambda = 0)

And ACF and PACF of model’s residuals are plotted.

ggtsdisplay(arima_m$residuals)

As we can see there isn’t a significant auto correlation among lags. The model can forecast the last two years.

arima_f <- forecast(arima_1, 24)
forecast(arima_1, 24) %>% autoplot()

The model is evaluated with RMSE, MAE and MAPE.

err = df_test - arima_f$mean
mape <- mean(abs(err) / ( arima_f$mean+ err)) * 100
rmse <- sqrt(mean(err^2, na.rm = TRUE)) 
mae <- mean(abs(err), na.rm = TRUE) 
cbind(mape, rmse, mae)

Prophet

As well as for ARIMA model, the dataset is splitted into train and test set. The dataset is defined by two columns: ds column represents the dates, instead, y column represents the values.

df$ds = as.Date(df$ds)
df_train = subset(df, ds < "1959-01-01")
df_test = subset(df, ds >= "1959-01-01")

As has been analyzed before, the seasonality isn’t constant in time, but it increases with trend. The additive models aren’t the best one to handle those time series. But with Prophet we can pass from additive seasonality to multiplicative seasonality through the parameter seasonality_mode.

m <- prophet(df_train,seasonality.mode = 'multiplicative')

With make_future_dataframe, we define the forecasting period and the frequency (monthly, weekly, annual).

future <- make_future_dataframe(m, 24, freq = 'm', include_history = F)
forecast <- predict(m, future)
plot(m, forecast)

As has been done before, the model is evaluated with RMSE, MAE and MAPE.

pred = forecast$yhat
err = df_test$y - forecast$yhat
mape <- mean(abs(err) / (pred+ err)) * 100
rmse <- sqrt(mean(err^2, na.rm = TRUE)) 
mae <- mean(abs(err), na.rm = TRUE) 
cbind(mape, rmse, mae)

Conclusion

The difference between the two methods, both theoretically and practically, has been presented. It seems clear that there is a substantial difference between the interpretability and the grade of knowledge required by the two methods: ARIMA method can be more problematic than Prophet method among non-expert in time series analysis or with no statistical background. Moreover, for experts in ARIMA models, it isn’t always straightforward to identify the right parameters. Furthermore, it is crystal clear that prophet has more tools to handle multi-period seasonality compared to SARIMA, making it useful for time series with different seasonality, such as midweek, quarter, annual seasonality.

References

Pelagatti M. (2016). Time Series Modelling with Unobserved Components. Taylor & Francis Group, Boca Raton.

Thank you for reading my post! I hope you’ve enjoyed it. For further contents visit my company website, or follow our linkedin page! ;)

A gentle introduction to Time Series

ARIMA and Prophet starter kit

Goal of the article

Basic concepts

Arima

Prophet

AirPassengers Time Series

Arima

Prophet

Conclusion

References

Written by Roberta Pollastro