ARIMA

Advanced Time Series Methods: AutoRegressive Integrated Moving Average.

Nadeem
Analytics Vidhya
5 min read · Aug 9, 2021


ARIMA stands for AutoRegressive Integrated Moving Average.

ARIMA — Important Concepts

ACF-PACF and STATIONARITY

Auto-Correlation Function(ACF)

ACF: the correlation between the original series and the same series lagged by ‘h’ time periods.

A correlation coefficient is normally computed between two different variables. Since we have only one variable here, we compute the autocorrelation: the correlation of the series with a lagged copy of itself.

Let’s understand how the data points in this series are related to their immediately preceding data points.
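To make this concrete, here is a minimal sketch using the acf function from statsmodels on a small synthetic series; the series itself and the number of lags are purely illustrative.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import acf

# Illustrative series: a slowly drifting signal plus noise, so that
# neighbouring points are related to the points just before them
np.random.seed(0)
sales = pd.Series(np.cumsum(np.random.normal(size=120)) + np.random.normal(size=120))

# Autocorrelation of the series with itself lagged by h = 0..10 periods
for h, r in enumerate(acf(sales, nlags=10)):
    print(f"lag {h}: {r:+.3f}")
```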

There are seasonal and non-seasonal ARIMA models that can be used for forecasting.

Non-Seasonal ARIMA Model:

This method has three parameters to account for. P = the number of periods to lag (for example, if P = 3 we use the three previous periods of our time series in the autoregressive portion of the calculation). P helps adjust the line that is being fitted to forecast the series.

Purely autoregressive models resemble a linear regression where the predictor variables are the P previous periods of the series.

D = the degree of differencing. In an ARIMA model we transform a time series into a stationary one (a series without trend or seasonality) using differencing. D refers to the number of differencing transformations required for the time series to become stationary.

A time series is stationary when its mean and variance are constant over time. A stationary series is easier to predict.

Differencing is a method of transforming a non-stationary time series into a stationary one. This is an important step in preparing data to be used in an ARIMA model.

The first differencing value is the difference between the current time period and the previous time period. If the differenced values still do not have a constant mean and variance, we take the second difference using the values of the first differencing. We repeat this until we get a stationary series.

The best way to determine whether or not the series is sufficiently differenced is to plot the differenced series and check whether it has a constant mean and variance.
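As a rough sketch of this workflow, the snippet below uses pandas’ diff() together with the Augmented Dickey-Fuller test from statsmodels as a numerical complement to the visual check; the series is synthetic and the 0.05 threshold is just a common convention.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Illustrative non-stationary series: a random walk with drift
np.random.seed(0)
series = pd.Series(np.cumsum(np.random.normal(loc=0.2, size=200)))

def is_stationary(s, alpha=0.05):
    """Augmented Dickey-Fuller test: a small p-value suggests stationarity."""
    return adfuller(s.dropna())[1] < alpha

# First difference: current period minus the previous period
diff1 = series.diff().dropna()
print("original stationary:   ", is_stationary(series))
print("first diff stationary: ", is_stationary(diff1))

# If the first difference were still non-stationary, difference once more
if not is_stationary(diff1):
    diff2 = diff1.diff().dropna()
    print("second diff stationary:", is_stationary(diff2))
```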

Q = the lag of the error component (the moving-average order), where the error component is the part of the time series not explained by trend or seasonality.

Autocorrelation Function Plot (ACF):

Autocorrelation refers to how correlated a time series is with its past values, whereas the ACF plot is used to see that correlation at each point, up to and including the chosen lag. In an ACF plot, the number of lags is shown on the x-axis and the correlation coefficient on the y-axis.

The Autocorrelation function plot will let you know how the given series is correlated with itself.

Normally in an ARIMA model we make use of either the AR term or the MA term; only on rare occasions do we use both (ARMA). We use the ACF plot to decide which of these terms to use for our time series.

If there is a positive autocorrelation at lag 1, then we use the AR model.

If there is a negative autocorrelation at lag 1, then we use the MA model.

After the ACF plot, we move to the Partial Autocorrelation Function (PACF) plot. A partial autocorrelation is a summary of the relationship between an observation in a time series and observations at prior time steps, with the relationships of intervening observations removed.

The partial autocorrelation at lag K is the correlation that results after removing the effects of any correlations due to the terms at shorter lags.

If the PACF plot cuts off sharply at lag n, then use an AR(n) model; if the drop in the PACF is more gradual, then we use the MA term.
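As an illustration, the sketch below simulates an AR(2)-like series and draws both plots with statsmodels; for such a series the ACF should decay gradually while the PACF should cut off after roughly two lags (the coefficients are arbitrary).

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Simulate an AR(2)-like process: each point depends on the two previous points
np.random.seed(1)
n = 300
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] + 0.2 * x[t - 2] + np.random.normal()

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(x, lags=20, ax=axes[0])    # gradual decay suggests an AR component
plot_pacf(x, lags=20, ax=axes[1])   # sharp cut-off around lag 2 suggests AR(2)
plt.tight_layout()
plt.show()
```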

Autoregressive component: a purely AR model forecasts using only a combination of past values, much like a linear regression, where the number of AR terms corresponds to the number of previous periods taken into consideration for the forecast.

Use AR terms in the model when:

  • The ACF plot shows autocorrelation decaying towards zero
  • The PACF plot cuts off quickly towards zero
  • The ACF of the stationary series is positive at lag 1

Moving average component: random jumps in the time series plot whose effect is felt over two or more consecutive periods. These jumps represent the error terms in our ARIMA model, and they are what the MA component lags over. A purely MA model smooths out these sudden jumps, much like the exponential smoothing method.

Use MA terms in the model when:

  • The series is negatively autocorrelated at lag 1
  • The ACF drops sharply after a few lags
  • The PACF decreases more gradually

Integrated component: this component comes into action when the time series is not stationary. The number of times we have to difference the series to make it stationary is the parameter (the I term) for the integrated component.

We can represent our model as ARIMA(AR, I, MA), conventionally written as ARIMA(p, d, q).
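As a minimal sketch, the snippet below fits a non-seasonal ARIMA with statsmodels; the order (1, 1, 1) is purely illustrative and would in practice come from the ACF/PACF analysis and the differencing checks described above.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Illustrative series with a trend, so one level of differencing (d = 1) is plausible
np.random.seed(2)
y = pd.Series(np.cumsum(np.random.normal(loc=0.5, size=150)))

# ARIMA(p, d, q): p = AR order, d = differencing order, q = MA order
model = ARIMA(y, order=(1, 1, 1))       # order chosen for illustration only
result = model.fit()

print(result.summary())                  # fitted AR and MA coefficients, AIC, etc.
print(result.forecast(steps=5))          # forecast the next 5 periods
```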

Seasonal ARIMA (SARIMA) models:

As the name suggests, this model is used when the time series exhibits seasonality. It is similar to the ARIMA model; we just have to add a few parameters to account for the seasons.

We can write SARIMA as:

ARIMA(p,d,q)(P,D,Q)m.

  • p — the number of autoregressive terms
  • d — the degree of differencing
  • q — the number of moving average terms
  • m — the number of periods in each season
  • (P, D, Q) — the (p, d, q) terms for the seasonal part of the time series

Seasonal differencing takes the seasons into account: it differences the current value against its value in the previous season.

  • In a purely seasonal AR model, the ACF decays slowly while the PACF cuts off to zero
  • AR models are used when the seasonal autocorrelation is positive
  • In a purely seasonal MA model, the ACF cuts off to zero while the PACF decays slowly
  • MA models are used when the seasonal autocorrelation is negative
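Putting the seasonal pieces together, here is a hedged sketch using statsmodels’ SARIMAX, which implements ARIMA(p,d,q)(P,D,Q)m; the orders and the season length m = 12 (monthly data) are illustrative placeholders.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Illustrative monthly series: trend plus a repeating yearly (m = 12) pattern
np.random.seed(3)
t = np.arange(144)
y = pd.Series(10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 12)
              + np.random.normal(scale=0.5, size=144))

# order=(p, d, q), seasonal_order=(P, D, Q, m)
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)

print(result.forecast(steps=12))   # forecast one full season ahead
```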

Time Series Model Building Using ARIMA.

Final Steps:

  • Step 1 — Check stationarity: if the time series has a trend or seasonality component, it must be made stationary before we can use ARIMA to forecast.
  • Step 2 — Difference: if the time series is not stationary, it needs to be stationarized through differencing. Take the first difference, then check for stationarity. Take as many differences as it takes, and check for seasonal differencing as well.
  • Step 3 — Filter out a validation sample: this will be used to validate how accurate our model is. Use a train/test split to achieve this.
  • Step 4 — Select AR and MA terms: use the ACF and PACF to decide whether to include an AR term, an MA term, or both (ARMA).
  • Step 5 — Build the model: build the model and set the number of periods to forecast to N (depending on your needs).
  • Step 6 — Validate the model: compare the predicted values to the actuals in the validation sample (a minimal end-to-end sketch of these steps follows below).
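Here is a minimal end-to-end sketch of the steps above on synthetic data; the ADF test stands in for the stationarity check, the last 20 points form the validation sample, and the (1, d, 1) order is a placeholder for the ACF/PACF-based choice.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA

# Illustrative trending series, so it should need at least one difference
np.random.seed(4)
y = pd.Series(np.cumsum(np.random.normal(loc=0.3, size=200)))

# Steps 1-2: check stationarity and count the differences needed (this becomes d)
d, tmp = 0, y.copy()
while d < 2 and adfuller(tmp.dropna())[1] > 0.05:
    tmp, d = tmp.diff(), d + 1
print("differencing order d =", d)

# Step 3: filter out a validation sample (the last 20 observations)
train, test = y[:-20], y[-20:]

# Steps 4-5: select AR/MA terms from the ACF/PACF (placeholder order here) and fit
result = ARIMA(train, order=(1, d, 1)).fit()

# Step 6: validate - compare forecasts against the held-out actuals
forecast = result.forecast(steps=len(test))
rmse = np.sqrt(np.mean((forecast.values - test.values) ** 2))
print("validation RMSE:", round(rmse, 3))
```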
