A Thorough Introduction To ARIMA Models

Lleyton Ariton
Published in Analytics Vidhya
7 min read · Jan 20, 2021

ARIMA models and their variants are some of the most established models for time series forecasting. This article is a reasonably thorough introduction to ARIMA/ARMA modelling, as well as the math behind how these models work.

ARIMA MODELLING

The ARIMA (Auto Regressive Integrated Moving Average) model is a very common time series forecasting model. It is a more sophisticated extension of the simpler ARMA (Auto Regressive Moving Average) model, which in itself is just a merger of two even simpler components:

  • AR (Auto Regressive): models attempt to predict future values based on past values of the series. AR models require the time series to be stationary.
  • MA (Moving Average): models attempt to predict future values based on past forecast errors. This is not to be confused with the moving average smoothing technique, which smooths a series rather than forecasting it.

AR MODEL

We can begin to define an AR model as a linear regression of the current value of the series against its immediately preceding value:

X_t = c + \phi_1 X_{t-1} + \varepsilon_t

The AR model, however, takes an order, p, which dictates how many prior time steps to use in the regression.

An AR(p) model can be expressed as:

X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \varepsilon_t

This can finally be represented as:

X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t

where c is a constant, \varepsilon_t is a white-noise error term, and the coefficients \phi_1, \dots, \phi_p are to be estimated.
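As a concrete illustration, the AR(p) recursion above can be simulated directly in NumPy. The coefficients here are hypothetical, chosen only so the process is stationary, not estimated from data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical AR(2) coefficients (phi_1, phi_2), chosen so the process is stationary
c, phi = 0.5, np.array([0.6, -0.2])

# Simulate the series: X_t = c + phi_1 * X_{t-1} + phi_2 * X_{t-2} + eps_t
n = 500
x = np.zeros(n)
for t in range(2, n):
    x[t] = c + phi[0] * x[t - 1] + phi[1] * x[t - 2] + rng.normal()

# One-step-ahead forecast from the two most recent observations
forecast = c + phi[0] * x[-1] + phi[1] * x[-2]
```

Note that forecasting here is nothing more than plugging the latest observations back into the same linear equation.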

MA MODEL

We can begin to define an MA model as a linear regression of the current value of the series against the previous white-noise error term:

X_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1}

Thus, in contrast to an AR model, an MA model is a linear regression of the current value of the series against previously observed white-noise error terms.

Similarly to AR models, MA models also take an order term, q, which dictates how many prior errors will be considered.

An MA(q) model can be expressed as:

X_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}

This can finally be written as:

X_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}

where \mu is the mean of the series, \varepsilon_t is a white-noise error term, and the coefficients \theta_1, \dots, \theta_q are to be estimated.
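The MA(q) definition can likewise be simulated directly. This sketch uses a hypothetical MA(1) with θ₁ = 0.7 and checks its sample variance against the theoretical value 1 + θ₁² (for unit-variance noise):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical MA(1): X_t = mu + eps_t + theta_1 * eps_{t-1}
mu, theta = 0.0, 0.7
eps = rng.normal(size=5000)   # the white-noise error terms

x = mu + eps.copy()
x[1:] += theta * eps[:-1]     # add the lag-1 error contribution

# Theoretical variance of an MA(1) with unit-variance noise: 1 + theta^2
sample_var = x.var()
```

Unlike the AR case, each observation depends only on the current and previous noise terms, so the series has no memory beyond lag q.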

ARMA MODEL

An ARMA model is simply a merger of the two previously described AR and MA models. Recalling their definitions, we can therefore express an ARMA(p, q) model as:

X_t = c + \varepsilon_t + \sum_{i=1}^{p} \phi_i X_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}

where p and q are the orders of the AR and MA components, respectively.
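Putting the two components together, an ARMA(1, 1) process can be simulated straight from the definition; the coefficients below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical ARMA(1,1): X_t = phi * X_{t-1} + eps_t + theta * eps_{t-1}
phi, theta = 0.5, 0.4
n = 1000
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t] + theta * eps[t - 1]

# Lag-1 sample autocorrelation -- positive for these parameter values
xc = x - x.mean()
r1 = np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)
```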

ARIMA MODEL

The ARIMA (Auto Regressive Integrated Moving Average) model is an extension of the ARMA model, with the addition of an integration component.

ARMA models must work on stationary time series. A stationary time series is one whose statistical properties, such as mean and variance, do not change over time. Unfortunately, the majority of real-world time series are not stationary, and thus they must often be transformed in order to make them stationary. This transformation process is referred to as integration.


The transformation employed is called differencing, where we take the d-th difference of the series, repeating until the series is stationary. For example, a non-stationary series with a quadratic trend becomes stationary after taking the second difference of the series.

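Differencing itself is a one-line operation in NumPy. This sketch uses a hypothetical quadratic-trend series that becomes (approximately) stationary after second differencing:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical non-stationary series: quadratic trend plus noise
t = np.arange(200)
y = 0.05 * t**2 + rng.normal(size=200)

d1 = np.diff(y)        # first difference: removes a linear trend
d2 = np.diff(y, n=2)   # second difference: removes a quadratic trend
```

Each round of differencing shortens the series by one observation, which is why d is kept as small as possible in practice.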

The ARIMA model can therefore finally be expressed with the notation ARIMA(p, d, q), where p is the order of the AR component, d is the degree of differencing, and q is the order of the MA component.

ARIMA PARAMETER ESTIMATION

Although the ARMA and ARIMA models are relatively simple, proper parameter estimation is required to ensure they function properly. We need to be able to estimate the \phi and \theta parameters, as well as find optimal orders p and q for the model.

ORDER SELECTION WITH ACF & PACF

Arguably the most common method for identifying the proper orders of an ARMA/ARIMA model is using the ACF (Auto Correlation Function) and the PACF (Partial Auto Correlation Function).

  • ACF: The Autocorrelation Function computes the autocorrelations for a given time series. Autocorrelation is the correlation between observations of a time series separated by k time steps.
  • PACF: The Partial Autocorrelation Function computes the correlation at lag k while controlling for the correlations at all intermediate lags.

Both the ACF and PACF can be plotted for any number of lags, which makes it easy to visualize how strongly each lag correlates with a given observation.
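To make the ACF concrete, here is a minimal NumPy sketch of the sample autocorrelation (a simple textbook estimator, not the exact routine any particular library uses), applied to a simulated AR(1) series:

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom
                     for k in range(nlags + 1)])

# For an AR(1) series with phi = 0.8, the ACF should decay roughly as 0.8**k
rng = np.random.default_rng(4)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.8 * x[t - 1] + rng.normal()

r = sample_acf(x, 5)
```

The geometric decay of r across the lags is exactly the kind of pattern one reads off an ACF plot when choosing model orders.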

Below is an example of ACF and PACF plots for a time series:


The above plots show the autocorrelation strengths for given lags in the time series, with the lag on the x-axis and the respective autocorrelation on the y-axis. The shaded regions are significance bounds at the 5% level.

By looking at the spikes, we can see which previous lagged points in the time series are most correlated with a present observation. Recalling that AR models use previous observations to predict a future value, we should therefore look at the lags that are shown to correlate most strongly with the current observation. It is important to note that the differencing order d is not chosen this way.

There are additionally a few common rules of thumb we can employ:

  • If the PACF plot displays a sharp cut off and/or the lag-1 autocorrelation is positive, then consider adding an AR term to the model. The lag at which the PACF cuts off is the indicated number of AR terms.
  • If the ACF plot displays a sharp cut off and/or the lag-1 autocorrelation is negative, consider adding an MA term to the model. The lag at which the ACF cuts off is the indicated number of MA terms.

As is shown in the example above, there is a very sharp and powerful positive spike at lag-1, indicating that (for this series) an observation at a given time step is highly correlated to the time step directly before it. There is a sharp cut off, and essentially no other lags are correlated in any significant manner. Therefore, we can infer that we should at least use an AR(1) model.

As for the ACF, in this example it too has a sharp powerful positive spike at lag-1 with no other significantly correlated lags. Therefore we will give the MA model an order of 1: MA(1).

Notice the very first large spike on both plots. This is never taken into account, because it is the autocorrelation at lag-0, which is no lag at all: the correlation of an observation with itself, which is always exactly 1.

In summary, we can use the plots above, with a few rules, to attempt to identify the proper orders for the AR and MA processes.

ORDER SELECTION WITH INFORMATION CRITERIA

Using Information Criteria is another method for identifying optimal orders for an ARMA/ARIMA model on a given time series. Typically, this involves Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC).

The AIC can be expressed as:

\mathrm{AIC} = 2k - 2\ln(\hat{L})

where k is the number of estimated parameters in the model and \hat{L} is the maximized value of the model's likelihood function. The proper orders can be found by minimizing the AIC.

The BIC can be expressed as:

\mathrm{BIC} = k\ln(n) - 2\ln(\hat{L})

where n is the number of observations. Similarly to the AIC, the proper orders can also be found by minimizing the BIC.

The AIC and BIC alone are not typically used to find the model parameters themselves. It is much more common to use these criteria to select the best model out of an already existing collection of candidate models.
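As a sketch of how these criteria are used in practice, the formulas above can be computed directly; the log-likelihood values and candidate models below are made up purely for illustration:

```python
import numpy as np

def aic(log_lik, k):
    # AIC = 2k - 2 ln(L-hat)
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    # BIC = k ln(n) - 2 ln(L-hat)
    return k * np.log(n) - 2 * log_lik

# Two hypothetical candidate models: the second fits slightly better
# (higher log-likelihood) but uses more parameters, so the penalty
# terms decide which one wins.
scores = {
    "ARMA(1,1)": (aic(-512.3, k=3), bic(-512.3, k=3, n=500)),
    "ARMA(2,2)": (aic(-511.8, k=5), bic(-511.8, k=5, n=500)),
}
best_by_aic = min(scores, key=lambda m: scores[m][0])
```

Because the BIC's k ln(n) penalty grows with the sample size, it tends to favour smaller models than the AIC does.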

MODEL PARAMETER ESTIMATION

The most important aspect of an ARMA/ARIMA model is no doubt the estimation of the \phi and \theta coefficients. An in-depth review of the algorithms involved, most commonly maximum likelihood estimation, gets relatively complex and is beyond the scope of this article.
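As one simple illustration (not the full maximum-likelihood machinery that real ARIMA software uses), the \phi coefficients of a pure AR model can be estimated by ordinary least squares on lagged values:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulate an AR(2) with known (hypothetical) coefficients...
phi_true = np.array([0.6, -0.3])
n = 5000
x = np.zeros(n)
for t in range(2, n):
    x[t] = phi_true[0] * x[t - 1] + phi_true[1] * x[t - 2] + rng.normal()

# ...then recover them by regressing x[t] on its two lagged values.
X = np.column_stack([x[1:-1], x[:-2]])  # columns: X_{t-1}, X_{t-2}
y = x[2:]
phi_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With enough data, the least-squares estimates land close to the true coefficients; estimating the \theta (MA) coefficients is harder because the past errors are not directly observed.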

