Understanding ARIMA Forecasting

Etqad Khan
Jan 26 · 3 min read

When it comes to forecasting, ARIMA is quite often the first-choice algorithm. Let us try to understand, in brief, what it is all about.

A simple intuition about the Auto-Regressive Integrated Moving Average (ARIMA) model is that it uses only the past values of a time series to forecast its future values. Specifically, it uses the series' own lags and its lagged forecast errors. A point to note, however, is that plain ARIMA assumes the series is non-seasonal; a series with a seasonal pattern calls for a seasonal extension such as SARIMA.

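Before we start with ARIMA, note that it is a linear model whose predictors are the lags of the series itself. Such a model works best when its predictors are not strongly correlated with each other, which is one reason the series is made stationary first.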

ARIMA is composed of three parts:

AR : Auto-Regression; Here, the model uses the dependent relationship between an observation and some number of lagged observations.

I : Integrated; The number of differences required to make the time series stationary.

MA : Moving Average; Here, the model uses the dependency between an observation and the residual errors of a moving average model applied to lagged observations.

The three important terms in ARIMA are:

p stands for the AR term

q stands for the MA term

d stands for the I term

So, we will start by making the series stationary. The most common way to do this is differencing: subtract the previous value from each value of the series. Doing this once is differencing of order 1. For more complex series, the differencing may have to be repeated, giving higher orders.

So, the value of d is the minimum number of differences needed to make the series stationary. For an already stationary series, d = 0.
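As a rough illustration of differencing, here is a minimal sketch in plain Python. The helper name `difference` and the toy series are assumptions made for this example; they are not part of any library.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (hypothetical helper)."""
    values = list(series)
    for _ in range(d):
        # Subtract the previous value from each value.
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


# Toy series with a curved trend (the square numbers).
squares = [1, 4, 9, 16, 25]

first = difference(squares, d=1)   # still trending upwards
second = difference(squares, d=2)  # constant, so the trend is gone

print(first)
print(second)
```

For this toy series the second difference is constant, so d = 2 would be enough to remove the trend. Each round of differencing shortens the series by one value.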

p is the order of the Auto-Regressive term. It corresponds to the number of lags of the series to be used as predictors.

q is the order of the Moving Average term. It refers to the number of lagged forecast errors used to forecast the values, that is, the size of the moving average window.

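Let us understand p and q mathematically.

In a pure Auto-Regressive (AR) model, $Y_t$ depends only on its own lags:

$$Y_t = \alpha + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \epsilon_t$$

Here, $Y_{t-1}$ is lag 1 of the series, $\beta_1$ is the coefficient of that lag term, $\alpha$ is the intercept term and $\epsilon_t$ is the error term.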
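To make the intercept and lag coefficient concrete, here is a small sketch that recovers them for an AR model with a single lag using ordinary least squares. The function name `fit_ar1` and the noise-free toy series are assumptions for illustration; real libraries usually estimate these parameters differently (for example by maximum likelihood).

```python
def fit_ar1(series):
    """Least-squares estimate of intercept and lag-1 coefficient (hypothetical helper)."""
    x = series[:-1]  # lagged values, Y(t-1)
    y = series[1:]   # current values, Y(t)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    beta = cov / var                # coefficient of the lag term
    alpha = mean_y - beta * mean_x  # intercept term
    return alpha, beta


# Noise-free toy series generated with intercept 2.0 and coefficient 0.5.
series = [0.0]
for _ in range(10):
    series.append(2.0 + 0.5 * series[-1])

alpha, beta = fit_ar1(series)
print(alpha, beta)
```

Because the toy series has no noise, the estimates match the generating values up to floating-point rounding. With real data the error term makes the fit approximate.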

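Similarly, a pure Moving Average (MA) model is one where $Y_t$ depends only on the lagged forecast errors:

$$Y_t = \alpha + \epsilon_t + \phi_1 \epsilon_{t-1} + \phi_2 \epsilon_{t-2} + \dots + \phi_q \epsilon_{t-q}$$

Here $\phi_1, \dots, \phi_q$ are the coefficients of the lagged errors. The error terms come from the Auto-Regressive models, so $\epsilon_t$ and $\epsilon_{t-1}$ are the errors from the equations for $Y_t$ and $Y_{t-1}$ (each derived similarly to the first equation).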
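The moving average idea can be sketched by building a series directly from a list of error terms. The helper `ma_series`, the coefficient values and the error values are all assumptions chosen for illustration, and errors before the start of the sample are treated as zero.

```python
def ma_series(alpha, phis, errors):
    """Build values as intercept + current error + weighted lagged errors (hypothetical helper)."""
    values = []
    for t, current_error in enumerate(errors):
        lagged = 0.0
        for lag, phi in enumerate(phis, start=1):
            if t - lag >= 0:  # errors before the sample are treated as zero
                lagged += phi * errors[t - lag]
        values.append(alpha + current_error + lagged)
    return values


# Toy MA model with one lagged error: intercept 10.0, coefficient 0.5.
values = ma_series(10.0, [0.5], [1.0, -1.0, 2.0])
print(values)
```

Each value is influenced by the previous error only through its coefficient, so a shock affects the series for q steps and then drops out.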

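Combining the autoregressive terms and the error terms gives the ARIMA model, written here for the series after it has been differenced d times:

$$Y_t = \alpha + \beta_1 Y_{t-1} + \dots + \beta_p Y_{t-p} + \epsilon_t + \phi_1 \epsilon_{t-1} + \dots + \phi_q \epsilon_{t-q}$$

where $\beta_1, \dots, \beta_p$ are the coefficients of the lagged values and $\phi_1, \dots, \phi_q$ are the coefficients of the lagged errors. To put it into words,

Predicted Yt = Intercept + Lagged Values + Lagged Errors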
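The word equation above can be sketched as a one-step forecast. The function `arma_one_step` and every number below are made up for illustration; the inputs are assumed to be the already differenced series and its past forecast errors, most recent last.

```python
def arma_one_step(alpha, betas, phis, history, errors):
    """Intercept + weighted lagged values + weighted lagged errors (hypothetical helper)."""
    # reversed() pairs the first coefficient with the most recent value or error.
    lagged_values = sum(b * y for b, y in zip(betas, reversed(history)))
    lagged_errors = sum(p * e for p, e in zip(phis, reversed(errors)))
    # The not-yet-observed current error has an expected value of zero, so it is left out.
    return alpha + lagged_values + lagged_errors


forecast = arma_one_step(
    alpha=1.0,
    betas=[0.5, 0.25],   # two lagged values, so p = 2
    phis=[0.5],          # one lagged error, so q = 1
    history=[4.0, 8.0],  # past values, most recent last
    errors=[2.0],        # past forecast errors, most recent last
)
print(forecast)  # 1.0 + (0.5*8.0 + 0.25*4.0) + 0.5*2.0 = 7.0
```

If the series was differenced once (d = 1), the forecast on the original scale would be this value added to the last observed level.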

This sets up the basic idea behind ARIMA. I will be documenting the code too.

Analytics Vidhya
