Seasonal lags: SARIMA modelling and forecasting

Kenneth Foo
5 min readJan 3, 2018

--

A seasonal autoregressive integrated moving average (SARIMA) model is one step different from an ARIMA model based on the concept of seasonal trends. In many time series data, frequent seasonal effects come into play. Take for example the average temperature measured in a location with four seasons. There will be a seasonal effect on a yearly basis, and the temperature in this particular season will definitely have a strong correlation with the temperature measured last year in the same season.

library(astsa)
plot(tempr) # LA temperature measured from 1970 to 1980
Yearly cycles observed with temperature data

The plot above shows the yearly cyclical rise and fall in temperatures, and there is a strong basis for assuming temperatures will fall to 60 deg F near the end of the year, while temperatures will surge past 80 deg F near the middle of the year.

Another example of such seasonality cycles can be seen in the US Monthly Birth Rates (dataset is titled “birth” under the ASTSA package).

plot(birth) # US Monthly birth rates from 1948 to 1979
abline(v = ts(c(1950,1955,1960,1965,1970,1975)),col = "red") # v is for vertical lines
abline(v = ts(c(1951,1952,1953,1954)), col = "blue", lty = 2) # lty is for line type with 2 representing dotted lines

With the red and blue lines for visual reference, we can easily detect the yearly peaks and troughs within each year, even though there seems to be some “bigger-picture” cyclical trend between 1948 to 1979.

For an expression of a SARIMA model, we can take a look at the following 1st order SARMA (note the missing I) expression:

This implies that the model exhibits autocorrelation at past lags of multiple of 12 (which would be defined as the seasonal period) for both autoregression and moving average components. Thus, this model would be of SARMA (P,Q), where P and Q =1.

Multiplicative SARMA model

Combining both the seasonal and non-seasonal ARMA models would be simply expressed as ARMA(p,q) x (P,Q)s, where P,Q,s represent the components of the seasonal model.

Seasonal operators are shown as related to seasonal parameters P, Q, s

Seasonal Differencing

Some time series may depict ACF that decays slowly at the lags of the multiples of the seasonal period.

The average monthly temperature can be modelled as the following:

S_t represents the seasonal component that varies from one year to the next, and represents a random walk expression. Note that w_t and v_t are uncorrelated white noise processes. Applying a seasonal difference to the time series expression will result in a MA(1) with an ACF peak at lag = 12.

Thus, the multiplicative SARIMA model can be expressed in an expression similar to the ARIMA model as shown in the following:

Seasonal operators related to seasonal parameters P, Q, s

Under the ASTSA package, there is a dataset called prodn that is the time series data for the Monthly Federal Reserve Board Production Index from 1948 to 1978. Using this data, we can try to fit the model based on the SARIMA modelling.

library(astsa)
plot(prodn)
acf2(prodn)

The slow decay in residual correlation shown in the ACF is a sign that differencing may be required.

acf2(diff(prodn), 48)

Even with the first order of differencing, we observe that there is still slow residual decay in the ACF plots at a seasonal lag period of 12. This thus suggest a seasonal difference to be applied.

acf2(diff(diff(prodn),12),48)

After applying the differencing, we can approach the time series ARMA & SARMA modelling based on the ACF and PACF plots.

From the seasonal lag perspective, we can see that the ACF cuts off at the 3rd seasonal lag, while the PACF appears to tail off. This would suggest a SARMA model of (0,3).

Within the first seasonal cycle, it can be seen that PACF appears to be cutting off at lag = 2, while the ACF tails off.

Thus a proposed model can be ARMA (2,0) x (0,3)_12 for the differenced time series.

Using the sarima() function, we can provide the proposed modelling inputs and inspect the model fit diagnostics.

sarima(prodn, 2,1,0, 0,1,3, 12) # sarima takes in arguments in the following order: data, ARIMA inputs (p,d,q), SARIMA inputs (P,D,Q), and seasonal lag s 

Looking at the model diagnostics, we can see that the model does fit fine, although there might still be some outliers in the data with unexplained variance (as shown in the Normal QQ plot, and the standardised residuals at around year 1958).

Using the sarima.for() function, we can provide a forecast of the next few time intervals based on our model.

sarima.for(prodn, 20, 2,1,0, 0,1,3, 12) # forecast prediction for next 20 time points

Along with the predicted data, there is the prediction bounds (+- 1 standard error represented by the darker gray bands and +-2 standard errors boundaries represented by the lighter gray bands). As the time progresses beyond the first predicted point, the uncertainty increases and thus the prediction boundaries increase in amplitude.

--

--