A Practical Guide to ARIMA with auto.arima Function in R

Zaki Nurkholis
13 min read · Nov 14, 2023


The auto.arima function provides a quick way to model time series data that is believed to follow an ARMA (Autoregressive Moving Average)-class process. It supports not only ARMA-based models, but also ARIMA (Autoregressive Integrated Moving Average), SARIMA (Seasonal Autoregressive Integrated Moving Average), and models with exogenous predictors such as ARIMA-X.

This guide applies to univariate time series (i.e. time series with only one observed variable as the response). Future articles may also discuss SARIMA and SARIMA-X models.

Basic Introduction

An ARMA (p, q) model has two parameters which define the order of each component. It is up to us to determine which values of p and q suit our data, which we will discuss later.

The p term provides a way to connect the current observation with the p past observations (also known as lagged observations). This term is the AR part of the model and can be written as

y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t

For example, let’s say we have an AR(1) model. To find the current observation yₜ, we multiply yesterday’s observation yₜ₋₁ by a coefficient ϕ₁. The coefficient is determined such that it gives the best possible fit to our data (i.e. the one that most closely matches our data).

In most cases, our AR model isn’t going to be perfect. It may predict a value of 23 where, in fact, the true value is 22.4. The difference, 0.6, is ‘accounted for’ by the error term εₜ, also known as the observation error. We expect the errors to be noise (i.e. random) and close to zero.
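To make this concrete, here is a minimal sketch that simulates an AR(1) process by hand, directly following the recipe above (the coefficient value 0.7 is an arbitrary choice for illustration):

# Simulate an AR(1) process: each value is phi * (previous value) + noise
set.seed(42)
n = 100
phi = 0.7  # arbitrary illustrative coefficient
eps = rnorm(n)  # white-noise error terms
y = numeric(n)
for (t in 2:n) {
  y[t] = phi * y[t - 1] + eps[t]
}
plot(y, type = 'l', main = 'Simulated AR(1)')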

The q term provides a way to take into account the errors made in the past. This term is the MA part of the model, written below.

y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}

The MA equation of an ARMA model. The current observation is a linear combination of the current and previous error terms.

This is similar to the AR part we discussed, with the θ’s as coefficients. Notice that we still keep the current observation error εₜ.

Adding the AR and MA parts yields the ARMA (p, q) model:

y_t = \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}

Some authors rewrite the MA terms with negative coefficients, but the result is equivalent. Some also add a constant (or drift) term c to the model, which essentially acts as the intercept in ordinary linear regression.
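As a quick illustration, the built-in arima.sim function can simulate an ARMA process; the coefficients below are arbitrary values, chosen only to keep the process stationary and invertible:

# Simulate an ARMA(2, 2) process with arbitrary illustrative coefficients
set.seed(42)
y_arma = arima.sim(model = list(ar = c(0.5, -0.3), ma = c(0.4, 0.2)), n = 200)
plot(y_arma, main = 'Simulated ARMA(2, 2)')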

A key advantage of working with ARMA-based models is that they are easily interpretable. We can write down the equation of our model and get a better understanding of our data. This is in contrast to many machine learning models that are considered ‘black-box’ and are often very difficult to interpret properly.

As a note, the description of the error term above is a simplification and not strictly correct. However, I used it to provide general intuition as to why the term is written explicitly in the model.

Stationarity

One key assumption in ARMA modelling is the stationarity of the data. We are interested in checking for ‘wide-sense’ stationarity. In this case, the data should be stationary in both mean and variance.

Comparison of a stationary and a non-stationary process. The black line is the mean; the red line is the (estimated) standard deviation.

Visually, a series is stationary in mean if it lingers around the same value with no sign of a trend. If the data slowly move up or down (or both), it is wise to investigate further and run a stationarity check.

Heteroskedasticity is usually found in data that are non-stationary in variance. Visually, the ‘amplitude’ of the series changes over time. This condition leads us to other models such as ARCH-GARCH, which model the changes in variance explicitly.

The concept of stationarity is much more complex than explained here, as it requires a deep understanding of joint probability distributions and stochastic processes. Data that seem stationary may not actually be stationary, and furthermore, stationarity tests may also be misleading. However, it is still a very useful (and practical) tool.

From ARMA to ARIMA

ARIMA simply extends ARMA to handle non-stationary data. This is done through the third parameter d, which controls differencing and is completely optional. Differencing is defined as the difference between the current observation and the previous one. Depending on the data, we may need to take more than one difference until stationarity is satisfied.

There are many other ways to deal with non-stationarity. Transforming the series with a log or square root may help with exponentially growing data. Taking the log difference (often called the log-return) can also help. Decomposing the series into several components and modelling only the residual is another popular way to go.
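As a sketch of these transformations (using the built-in AirPassengers dataset, which grows roughly exponentially, purely for illustration):

# Illustrative transformations for a non-stationary series
y_log = log(AirPassengers)  # log transform tames exponential growth
y_lret = diff(log(AirPassengers))  # log difference ('log-return')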

The ARIMA model is a bit harder to write, as it requires us to take the order d into account.

(1 - \phi_1 B - \dots - \phi_p B^p)(1 - B)^d y_t = (1 + \theta_1 B + \dots + \theta_q B^q)\varepsilon_t

The ARIMA (p, d, q) model in backshift notation.

We introduce the backshift operator B, which we can treat like a constant, but with special properties.

B y_t = y_{t-1}, \qquad B^k y_t = y_{t-k}

The backshift operator: applying B shifts the series one step back in time.

The Box-Jenkins Approach

Box & Jenkins developed several steps for ARIMA modelling:

  • Identification
    We identify initial orders of p, d, and q that are considered adequate for our data. This is done by looking at the autocorrelation and partial autocorrelation plots.
  • Estimation
    We estimate the model parameters (or coefficients) by one of several methods. The estimated parameters should be the best ones for our data. In rare cases, model coefficients cannot be estimated due to limitations of the approach, often caused by non-converging iterations; we should avoid using these ‘incomplete’ models.
  • Diagnostic Check and Verification
    We check whether our model is capable of explaining our data and provides a good ‘fit’. This is usually done by inspecting the behaviour of the residuals (i.e. the differences between predicted and observed values).
    If we are not satisfied with the current model, we may fit a different ARIMA model (i.e. choose a different p or q, often lower values). Lower p and q orders result in a simpler model, which is always appreciated.
    In this article, we will keep diagnostic checking simple (see the section below). If possible, readers should pay extra attention to diagnostic checks to ensure a proper ARIMA model. Methods such as the Ljung-Box test on squared residuals and other portmanteau tests may be helpful.
  • Forecasting
    We use the fitted model to forecast future values.

The second step is handled automatically by the auto.arima function.

Theoretically, assuming all model assumptions are respected, ARIMA should provide the optimal forecast. In practice, however (as noted by Stellwagen & Tashman, 2013), other models often perform better than ARIMA.

The Stationarity Test (Augmented Dickey-Fuller Test)

The test we use is the Augmented Dickey-Fuller test (commonly abbreviated ADF test). The null hypothesis states that the series is non-stationary or, more specifically, that a unit root exists. The test statistic is a negative number, with more negative values indicating stronger evidence in favour of stationarity.

The Autocorrelation and Partial Autocorrelation Functions (ACF-PACF)

Autocorrelation shows the degree of connection between an observation at time t and the one at time t-k. Like correlation, its value ranges from -1 to 1. A value of 1 (-1) indicates a strong positive (negative) connection. A value close to 0 indicates a weak connection between the two observations.

The partial autocorrelation is similar, but it removes the influence of the intermediate lags t-1, t-2, …, t-k+1.

The ACF and PACF plots (also known as correlograms) provide a great deal of information about our data. In practice, we are interested in whether the ACF-PACF shows a distinct, well-known pattern. Many such patterns can be used to distinguish a particular process, such as AR, MA, or ARMA.

An example of an AR(3) process. The ACF decays slowly over time, whereas the PACF cuts off after the 3rd lag.

The example above shows an AR(3) process. The ACF decays very slowly over time, whereas the PACF cuts off (i.e. drops to insignificance) after the 3rd lag. The order p is determined by the last ‘significant’ lag of the PACF.
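A correlogram like this can be reproduced with a simulated AR(3) series; the coefficients below are arbitrary illustrative values:

# ACF and PACF of a simulated AR(3) process
set.seed(42)
y_ar3 = arima.sim(model = list(ar = c(0.4, 0.3, 0.2)), n = 500)
acf(y_ar3)  # decays slowly over the lags
pacf(y_ar3)  # cuts off after the 3rd lag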

An example of an MA(1) process. Opposite to the AR process, the PACF decays over time, whereas the ACF cuts off after the 1st lag.

In contrast to the AR(p) process, the MA(q) process is indicated by a decaying PACF and an ACF that cuts off. The order q is determined by the last ‘significant’ lag of the ACF.

An example of an ARMA(2,2) process. Both the ACF and PACF cut off.

An ARMA(p,q) process may be identified by looking at the last ‘significant’ lag of the PACF (for p) and of the ACF (for q). In the example above, the ACF cuts off after the 2nd lag, suggesting q = 2. The same goes for the PACF, suggesting p = 2.

How auto.arima Works

auto.arima is a wrapper. It searches through candidate ARIMA models and returns the one with the lowest AIC (by default it actually uses the corrected AICc, and a stepwise search rather than an exhaustive one). AIC (Akaike Information Criterion) is a popular metric for measuring model fit (i.e. how good the model is at describing our data) while penalizing complexity; lower values correspond to a better model.

The ARIMA fitting process usually doesn’t take very long, as it is heavily optimized and theoretically well understood. Still, it is good practice to keep our model as simple as possible to ensure better generality and avoid overfitting. Overfitting (or overspecification) happens when our model is good at predicting existing data but does a horrible job at predicting the future. It may also lead to strange prediction intervals, which is not welcome. That said, deliberate overfitting can be used to test for adequacy, a topic discussed later.
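For reference, the default stepwise search can be switched off so that auto.arima examines every candidate up to the maximum orders, and the trace argument prints each model considered. A minimal sketch on a toy simulated series:

# Exhaustive (non-stepwise) search over candidate models
library(forecast)
set.seed(42)
y = arima.sim(model = list(ar = 0.6), n = 100)  # toy series for illustration
fit = auto.arima(y, stepwise = FALSE, approximation = FALSE,
                 trace = TRUE)  # trace prints each candidate and its AICc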

We shall not discuss how to actually fit an ARIMA model. Most software relies on least squares, maximum likelihood, or state-space estimation, all of which are complicated.

It’s ARIMA Time!

We will use the WWWusage dataset, provided in the datasets package. It records the number of users connected to the internet through a server every minute, over 100 minutes.

# Plotting the dataset
plot(WWWusage, type = 'l', ylab = 'Num. of Users', xlab = 'Minute', lwd = 2,
main = 'Internet Usage per Minute')
WWWusage dataset plotted

We observe a frantic pattern that goes up and down irregularly. However, there does not seem to be any seasonal pattern, which would have suggested a SARIMA model. The data also suggest non-stationarity in the mean, which should be treated with differencing.

# ADF (stationarity) test; adf.test comes from the tseries package
library(tseries)
adf.test(WWWusage)
Augmented Dickey-Fuller Test on WWWusage data

Unsurprisingly, the ADF test returns a p-value of 0.31. Recall that the null hypothesis is non-stationarity; hence, we do not have sufficient evidence for a stationary series.

1st and 2nd Order Differencing

Differencing is done with the diff function provided in base R. The argument we are interested in is differences, which determines the order of differencing.

# Compute 1st and 2nd order difference
y_diff = diff(WWWusage, differences = 1)
y_diff_2 = diff(WWWusage, differences = 2)

# Run ADF test
adf.test(y_diff)
adf.test(y_diff_2)
Augmented Dickey-Fuller Test on differenced WWWusage data

The 1st order differencing does not seem to help with stationarity; doing it a 2nd time, however, does! The p-value of 0.01 indeed rejects non-stationarity. We have determined our ARIMA model to be of the form (p, 2, q).

If you are curious, here is the plot of the differenced series. The 1st differenced data still seems to have that up-and-down trend, which is eliminated by the 2nd order differencing. Additionally, the mean of the differenced series seems close to zero, which means a constant (drift) term is not needed.

Differenced WWWusage data plotted

Identifying the Initial ARIMA Order

We have found d = 2; now we observe the ACF and PACF plots of the differenced series to determine the p and q orders.
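The plots can be produced directly from the twice-differenced series we computed earlier:

# ACF and PACF of the 2nd order differenced data
acf(y_diff_2, main = 'ACF of Differenced WWWusage')
pacf(y_diff_2, main = 'PACF of Differenced WWWusage')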

The ACF and PACF of WWWusage differenced data.

The ACF and PACF are consistent with an ARMA process! The ACF cuts off after the 2nd lag, suggesting q = 2. The PACF cuts off after the 2nd lag too, suggesting p = 2.

Our starting model is ARIMA (2, 2, 2).

Model Fitting with auto.arima

Here, p = 2 and q = 2 are the maximum orders: auto.arima will also search over all simpler models (while keeping d = 2), including ARIMA (2,2,1), ARIMA (2,2,0), ARIMA (1,2,2), and so on. (In the code below we allow max.q = 3 to give the search a little headroom.)

There are five main arguments that are of interest:

  • y : the series to be modeled, preferably the original (undifferenced) series
  • d : the difference order
  • max.p : the maximum order of p allowed
  • max.q : the maximum order of q allowed
  • seasonal : include seasonal component, in this case, we set it to FALSE

# Run auto.arima with a fixed d
library(fpp2)  # loads the forecast package, which provides auto.arima
arima_m = auto.arima(WWWusage, d = 2, max.p = 2, max.q = 3, seasonal = FALSE)
summary(arima_m)
Summary of the ‘best’ ARIMA model (with lowest AIC)

The summary table shows ARIMA (2,2,0) to be the best in terms of AIC. The model includes neither MA terms nor an intercept.

Here we see several important results, mainly the estimated coefficients and their respective standard errors (a smaller error means a more precise estimate). Our ARIMA model can be written as

(1 - \hat{\phi}_1 B - \hat{\phi}_2 B^2)(1 - B)^2 y_t = \varepsilon_t

The fitted ARIMA (2,2,0) model, with the coefficient estimates taken from the summary table.

The sigma² value is the estimated variance of the error term. Alongside it we have the log-likelihood, which is used to compute the AIC, the corrected AIC (AICc), and the Bayesian Information Criterion (BIC). There are also several prediction error measures that may be of interest to some readers. We may also wish to split the data into train and test sets and evaluate the model directly, a topic we shall not discuss here.

ARIMA models with d > 1 typically don’t require any form of constant (or drift) term, since the differenced data are already centered around 0. By default, the auto.arima and arima functions in R automatically decide whether adding a constant is necessary.
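If we want to steer this decision ourselves, auto.arima exposes the allowdrift and allowmean arguments (both TRUE by default); a minimal sketch:

# Disallow a drift term during the search
fit_nodrift = auto.arima(WWWusage, d = 1, allowdrift = FALSE)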

Forcibly Add Constant (Drift) Term

In certain cases where we absolutely need to include a constant, we use a different method. The Arima (not arima) function provided in the forecast package allows us to explicitly add a constant term. Note, however, that for d > 1 the constant will be dropped regardless. The following is an example of fitting ARIMA (2,1,2) with a constant.

model = Arima(WWWusage, order = c(2,1,2), include.constant = TRUE)
ARIMA (2,1,2) with Drift

We observe that the AIC is higher than that of ARIMA (2,2,0), a slightly worse fit, possibly because at d = 1 our data are still not stationary.

(Simple) Diagnostic Check

Diagnostic checking is always worthwhile in practice, as it is very easy to do and ensures that we did not violate major assumptions. We shall look at three key checks: normality, autocorrelation, and homoskedasticity of the residuals.

Residual Normality

We will begin by taking a look at the studentized residuals vs. time plot. The checkresiduals function from the forecast package may be used.

# General check
checkresiduals(arima_m)
Studentized residuals, ACF, and histogram

The top plot shows the studentized residuals of the model. We must be wary of any trend, repeating cycle, or change in amplitude, as these may indicate that the model is inadequate. In our case, the residuals are spread around zero with no specific pattern, indicating an adequate fit.

The normality of the residuals is also of interest, since many inference procedures assume normality. To check it, we may look at the histogram shown above: if its shape roughly follows the bell curve, the residuals are likely to be normally distributed.
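A normal Q-Q plot is another quick visual check; points hugging the reference line suggest normally distributed residuals:

# Q-Q plot of the residuals against the normal distribution
qqnorm(arima_m$residuals)
qqline(arima_m$residuals)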

# Normality of residual with Jarque-Bera
jarque.bera.test(arima_m$residuals)
The Jarque-Bera test for residuals

We may also conduct a Jarque-Bera test on the residuals. The null hypothesis is normally distributed residuals. The p-value of 0.9626 gives insufficient evidence to reject the null hypothesis; the residuals are therefore consistent with normality.

As a note, residuals are often assumed to be white noise (not necessarily Normal) with zero mean.

Residual Autocorrelation

Residuals are supposed to be independent of one another. That is, they should be uncorrelated: the value at time t should not influence the value at time t+k.

In practice, some degree of autocorrelation may persist, especially at lower lags, even when the model is adequate. However, the autocorrelation should be low and insignificant. This is easily checked by running the Ljung-Box test, whose null hypothesis is that the residual autocorrelations are jointly zero.

# Ljung-Box for Residual Autocorrelation
tsdiag(arima_m)
Ljung-Box test for residual autocorrelation

From the ACF of the residuals, we observe that there doesn’t seem to be any significant autocorrelation. This is supported by the Ljung-Box test, which fails to reject the null hypothesis. The residuals are therefore uncorrelated.
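The Ljung-Box test can also be run directly with Box.test; the lag choice of 10 below is arbitrary, and fitdf accounts for the two estimated AR coefficients:

# Ljung-Box test on the residuals (lag choice is illustrative)
Box.test(arima_m$residuals, lag = 10, type = 'Ljung-Box', fitdf = 2)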

Residual Heteroskedasticity

The variance of the residuals should remain constant over time. We may visually assess this by looking at the ACF and PACF of the squared residuals. If there are significant lags (especially early on), this may suggest moving to a GARCH model.

# Heteroskedasticity check by looking at ACF-PACF of squared residuals
acf(arima_m$residuals^2)
pacf(arima_m$residuals^2)
Squared residuals ACF and PACF plots.

The results above show that the squared residuals do not possess any meaningful autocorrelation. We may safely conclude that the residuals are homoskedastic.

Forecast

Forecasting is very easy to do. Simply provide the ARIMA model and the forecast horizon (i.e. the number of steps ahead to forecast) to the forecast function.

# Forecasting
my_forecast = forecast(arima_m, h = 10)
autoplot(my_forecast)
ARIMA (2,2,0) forecast of the WWWusage data, 10 steps ahead

We observe a continuation of the negative trend for the next 10 periods. The dark blue area shows the 80% prediction interval, whereas the light blue area shows the 95% prediction interval.

To obtain the predicted values, we may simply run my_forecast$mean.
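The prediction intervals are stored alongside the point forecasts in the forecast object:

# Components of the forecast object
my_forecast$mean  # point forecasts
my_forecast$lower  # lower bounds of the 80% and 95% intervals
my_forecast$upper  # upper bounds of the 80% and 95% intervals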

Epilogue

ARIMA models tend to produce flat forecasts because the forecast is usually mean-based (i.e. it predicts only the mean). While this may not seem interesting (there isn’t any pattern to look out for), the forecast also comes with prediction intervals, which grow wider the further we forecast into the future. Thus, the true values have a good chance of staying inside the interval.

A popular way to check for fit adequacy is to fit a slightly more general model. For example, we have ARIMA (2,2,0) here, so we may wish to fit ARIMA (2,2,1) or ARIMA (3,2,0) as a comparison. If the additional parameter is not significant, or if the coefficient estimates are not that different from the previous model, this suggests that the current ARIMA (2,2,0) is adequate.
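A minimal sketch of this overfitting check, comparing slightly more general models against our fitted one:

# Fit slightly more general models for comparison
m_ma = Arima(WWWusage, order = c(2, 2, 1))  # add one MA term
m_ar = Arima(WWWusage, order = c(3, 2, 0))  # add one AR term
summary(m_ma)  # check the extra coefficient and the AIC
summary(m_ar)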

Kindly note that I have skipped many important aspects of ARIMA modelling. Any suggestions or discussions are always appreciated.
