The I in ARIMA modelling and Random Walk time series

The I in ARIMA stands for “integrated”, and it refers to differencing a time series. Differencing is often used to eliminate trends in a time series and make it stationary, and is best illustrated with some examples of trending series.


Recall that for a time series to be (weakly) stationary, its mean value function must be constant and thus time independent, and its autocovariance function must depend only on the lag between two time points rather than on time itself.

Assume a time series x_t has:

  • a non-stationary linear trend component u_t = a + b*t and
  • a zero-mean stationary time series y_t

so that x_t = u_t + y_t. Applying first differencing results in the following:

delta_x = x_t - x_{t-1} = (a + b*t + y_t) - (a + b*(t-1) + y_{t-1}) = b + (y_t - y_{t-1}) = b + delta_y

If y_t is stationary, applying differencing results in delta_y, which is also a stationary time series. Rewriting delta_y as z_t (i.e. z_t = y_t - y_{t-1}) and working out its covariance function between two points h lags apart:

gamma_z(h) = cov(z_{t+h}, z_t) = cov(y_{t+h} - y_{t+h-1}, y_t - y_{t-1}) = 2*gamma_y(h) - gamma_y(h+1) - gamma_y(h-1)

Since y_t is stationary, its covariance function gamma_y is independent of time. Thus the covariance function of z_t (gamma_z) is also independent of time and depends only on the relative separation h between time points.
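As a quick numerical sanity check of the gamma_z expression above, here is a small simulation sketch (not from the original post; the AR(1) parameter phi and the series length are hypothetical):

phi <- 0.6
set.seed(1)
y <- arima.sim(list(ar = phi), n = 1e5)                # long simulated stationary y_t
gamma_y <- function(h) phi^abs(h) / (1 - phi^2)        # theoretical ACVF of an AR(1), unit innovation variance
h <- 1
2 * gamma_y(h) - gamma_y(h + 1) - gamma_y(h - 1)       # theoretical gamma_z(1), about -0.25 here
acf(diff(y), type = "covariance", plot = FALSE)$acf[h + 1]  # empirical estimate, close to the value above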

Thus, in this case, applying differencing to x_t yields delta_x, which is a stationary time series (as illustrated by the simulation sketch after this list), since it comprises:

  • a mean b that does not depend on time, and
  • a differenced stationary time series delta_y that remains stationary.
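To make this concrete, here is a minimal simulation sketch (hypothetical values, not from the original post) of a linear trend plus a stationary AR(1) component; differencing removes the trend and the mean of the differenced series is close to the slope b:

set.seed(2)
n <- 500
a <- 2; b <- 0.05                        # hypothetical intercept and slope of u_t
y <- arima.sim(list(ar = 0.5), n = n)    # zero-mean stationary component y_t
x <- a + b * (1:n) + y                   # x_t = u_t + y_t
dx <- diff(x)                            # delta_x = b + delta_y
mean(dx)                                 # roughly the slope b = 0.05
par(mfrow = c(2, 1))
plot.ts(x, main = "linear trend + stationary series")
plot.ts(dx, main = "first difference (trend removed)")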

This concept of differencing can be extended to higher orders, and the backshift operator gives a compact way to represent the expressions.

Backshift operator for lags 1 and 2: B*x_t = x_{t-1} and B^2*x_t = x_{t-2}
General backshift operator: B^k*x_t = x_{t-k}

Note that the expressions for higher orders of differencing can be written as polynomials in the backshift operator:

delta_x = x_t - x_{t-1} = (1 - B)*x_t
delta^2_x = (1 - B)^2*x_t = x_t - 2*x_{t-1} + x_{t-2}

Differences of order d are represented by the following expression:

delta^d_x = (1 - B)^d*x_t
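As a small check of the second-order expression above, the sketch below (hypothetical data, not from the original post) compares diff() with differences = 2 against expanding (1 - B)^2 by hand:

set.seed(3)
x <- cumsum(cumsum(rnorm(10)))              # a short series with a strong trend
d2_diff <- diff(x, differences = 2)         # (1 - B)^2 x_t via diff()
d2_manual <- x[3:10] - 2 * x[2:9] + x[1:8]  # x_t - 2*x_{t-1} + x_{t-2} by hand
all.equal(d2_diff, d2_manual)               # TRUE: both give the same result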


Delving into an example, let’s apply differencing to the global temperature data gtemp, which is available in R via the astsa package.

library(astsa)                                   # provides the gtemp global temperature series
plot(gtemp)
fit = lm(gtemp ~ time(gtemp), na.action = NULL)  # fit a linear trend in time
abline(fit, lty = 2)                             # overlay the fitted trend as a dotted line

The dotted line is the fitted linear time trend (-11.2 + 0.005749 t).

summary(fit) # reveals the following fit of the linear model

# Residuals:
#      Min       1Q   Median       3Q      Max
# -0.31946 -0.09722  0.00084  0.08245  0.29383
#
# Coefficients:
#               Estimate Std. Error t value Pr(>|t|)
# (Intercept) -1.120e+01  5.689e-01  -19.69   <2e-16 ***
# time(gtemp)  5.749e-03  2.925e-04   19.65   <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.1251 on 128 degrees of freedom
# Multiple R-squared: 0.7511, Adjusted R-squared: 0.7492
# F-statistic: 386.3 on 1 and 128 DF, p-value: < 2.2e-16

The gradient of the linear component is 0.005749, i.e. an increment of roughly 0.0057 in temperature per unit increment of time.
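For completeness, the slope can also be pulled straight out of the fitted model object (a small convenience sketch):

coef(fit)                  # intercept and slope of the fitted linear trend
coef(fit)["time(gtemp)"]   # the gradient, roughly 0.005749 per unit of time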

In R, differencing is performed with the diff() function.

par(mfrow = c(2,1))
plot(resid(fit), main = "detrended")          # residuals after removing the fitted linear trend
plot(diff(gtemp), main = "first difference")  # first-differenced series
par(mfrow = c(3,1))                           # compare the ACFs of the three series
acf(gtemp, 50, main = "gtemp")
acf(resid(fit), 50, main = "detrended")
acf(diff(gtemp), 50, main = "first difference")

Observe the difference in ACFs between the detrended series and the first-difference series. While the detrended series shows long, cyclical autocorrelation, the first-difference series shows minimal autocorrelation. This suggests (though does not prove) that the series behaves like a random walk with drift.


Random walk

A random walk can be expressed by the following:

x_t = x_{t-1} + w_t, where w_t is white noise with mean 0 and variance sigma_w^2

The series is a purely stochastic model whose time dependency rests entirely on the previous time point t-1. Note that a random walk is not stationary: the root of its AR polynomial (1 - z) equals 1, whereas stationarity requires roots strictly greater than 1 in magnitude. Applying first differencing gives delta_x = w_t, a white noise series, which is stationary and has minimal autocorrelation.
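Here is a quick simulated illustration of that point (the parameters are hypothetical, not from the post):

set.seed(4)
w <- rnorm(200)                 # white noise w_t
x <- cumsum(w)                  # random walk: x_t = x_{t-1} + w_t
par(mfrow = c(2, 1))
acf(x, 50, main = "random walk: ACF decays very slowly")
acf(diff(x), 50, main = "first difference: essentially white noise")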

Random walk with drift

Building on that point, a random walk with drift adds a linear, time-dependent component. Assume x_t has a linear time component u_t = b*t and a random walk component y_t:

x_t = u_t + y_t, where u_t = b*t and y_t = y_{t-1} + w_t

Applying first differencing to this postulated model gives:

delta_x = b + (y_t - y_{t-1}) = b + w_t

As shown in the postulated model, the first difference of a random walk with drift results in a time series with (a quick simulated check follows the list):

  • a non-zero, time-independent constant mean (b) and
  • a stationary component (w_t).
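Minimal sketch, with a hypothetical drift value, mirroring the mean and standard-error computation used on gtemp below:

set.seed(5)
b <- 0.2                                 # hypothetical drift per unit of time
w <- rnorm(500)
x <- cumsum(b + w)                       # random walk with drift: x_t = b + x_{t-1} + w_t
mean(diff(x))                            # drift estimate, close to 0.2
sd(diff(x)) / sqrt(length(diff(x)))      # standard error of that estimate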

Going back to the example: assuming the global temperature series is modelled as a random walk with drift, we can estimate the drift by taking the mean of diff(gtemp). This gives an increment of about 0.0066 in temperature per unit of time, although the standard error is relatively large at 0.00966.

mean(diff(gtemp)) # [1] 0.006589147 (Drift)
sd(diff(gtemp))/sqrt(length(diff(gtemp))) # = 0.009658972 (SE)

There is no “correct” answer as to whether one should pick the detrended model or the differenced model; it depends on what one would like to capture.

  • The detrended model allows one to model y_t on its own, using the residuals that remain after removing the variance captured by the linear fit.
  • The differenced model allows for rapid coercion of the time series into stationarity by eliminating the trend.

Differencing is a powerful tool for coercing a time series into stationarity. Without knowing the exact model, scrutinising the ACF plot gives a good clue as to when differencing should be applied.

Note that the ACF plot of the original data shows high retention of autocorrelation across progressive lags, which indicates slow decay of the autocorrelation. This behaviour is often known as long memory and is typically a good diagnostic indicator that differencing may be required.

There is also the concept of fractional differencing, which deals with 0 < d < 0.5, but that is beyond the scope of what I will cover here.


In the next post, we will explore time series with time-dependent variance and how to utilise log transformations to coerce it into stationarity.