Comparing Holt-Winters exponential smoothing and ARIMA models for time series analysis

Lawrence May
7 min read · Oct 6, 2021


In this post, I will be comparing two very popular techniques for time series forecasting: Holt-Winters exponential smoothing and the ARIMA family of models. I will do this using the USgas dataset, a record of monthly natural gas consumption in the United States between 2000 and 2020, loosely following Rami Krispin’s excellent book on time series. For all the code, please click here.
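Before anything else, here is a minimal setup sketch. I am assuming the USgas series ships with Rami Krispin’s TSstudio package and that the forecast package provides the modelling functions used later; adjust the loading step if your copy of the data comes from elsewhere.

library(TSstudio) # provides the USgas monthly ts object (assumed source)
library(forecast) # forecast(), auto.arima(), accuracy(), checkresiduals()

data(USgas)
plot(USgas, main = "US monthly natural gas consumption",
     ylab = "Billion cubic feet")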

I will be comparing and evaluating both methodologies against the naive model, which simply forecasts each observation to be equal to the previous one (Y_t = Y_{t-1}). I will be using four different performance metrics, implemented in a short sketch after the list:

1.) Mean Squared Error (MSE): MSE takes the difference between the forecasted value Yhat and the actual value Y, squares it for each observation, and then averages over all observations. Squaring ensures that negative and positive errors don’t cancel each other out, and it penalises predictions that are very far off more heavily than ones that are only slightly wrong.

2.) Mean Absolute Error (MAE): MAE also computes the difference between the forecasted and actual values, but instead of squaring it simply takes the absolute value of the difference, then averages over all observations. This metric is preferable if we want all errors to be weighted equally, rather than penalising larger errors more heavily.

3.) Root Mean Squared Error (RMSE): RMSE is simply the square root of the MSE. Taking the square root puts the error back on the same scale as the original data, giving better insight into how the error term compares to the data.

4.) Mean Absolute Percentage Error (MAPE): MAPE is the average ratio between the absolute error and the absolute actual value Y, expressed as a percentage.
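For reference, here is a minimal hand-rolled sketch of the four metrics. In practice the forecast package’s accuracy() function reports all of them, so this is purely illustrative; y is the vector of actual values and yhat the forecasts.

mse  <- function(y, yhat) mean((y - yhat)^2)   # Mean Squared Error
mae  <- function(y, yhat) mean(abs(y - yhat))  # Mean Absolute Error
rmse <- function(y, yhat) sqrt(mse(y, yhat))   # Root Mean Squared Error
mape <- function(y, yhat) mean(abs((y - yhat) / y)) * 100  # MAPE, in percent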

Looking at the data

Looking at the plot above, we can tell that there is quite a strong seasonal component in the data. This makes sense: a lot of heating in the US is powered by natural gas, so demand is naturally higher during the winter months than in the summer. We can also see a slightly increasing trend from about 2010 onwards. Let’s take a look at a decomposition of the time series and see if it confirms these first visual impressions:
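A classical decomposition is one line in base R; TSstudio’s ts_decompose() gives an interactive version of the same plot, which is roughly what is shown above.

plot(decompose(USgas)) # observed, trend, seasonal and remainder components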

Decomposition reveals a trend as well, beginning around 2005 but becoming more pronounced from about 2010 onwards. I’ll split the data into a training and a test set, with the test set being the last 12 months of the data. I’ll start off by fitting the naive benchmark model described above; this will be a good baseline to compare the performance of more complex models against, as shown in the sketch below.
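A sketch of the split and the benchmark fit. ts_split() is the helper from TSstudio; for the benchmark I use naive(), matching the Y_t = Y_{t-1} definition above (meanf(), which forecasts the historical average, is the other common flat-line baseline).

split <- ts_split(USgas, sample.out = 12) # hold out the last 12 months
train <- split$train
test  <- split$test

naive_fc <- naive(train, h = 12) # flat-line benchmark forecast
accuracy(naive_fc, test)         # RMSE, MAE, MAPE against the test set
plot(naive_fc)                   # the benchmark against the history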

As we can see, the naive model simply projects a flat line forward, without any consideration of the seasonality or trend in the data. Let’s fit a Holt-Winters exponential smoothing model instead and see how well it captures the data, and whether we get an improvement over the naive model:

Fitting a Holt-Winters exponential smoothing model

# Fit Holt-Winters on the training set; the level (alpha), trend (beta) and
# seasonal (gamma) smoothing parameters are estimated automatically.
hw_mod <- HoltWinters(train)
hw_mod

We can see that the model smooths the average level of the preceding observations (alpha = 0.37) and includes a strong seasonal component (gamma = 0.44). It does not, however, estimate any trend (beta = 0).

This is a significant improvement over the naive model, with RMSE reducing by more than 75% from 500 to only 115. We can see a similar reduction when looking at MAPE and MAE.
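The numbers above come from a run along these lines: forecast 12 months ahead from the fitted model and score against the held-out year.

hw_fc <- forecast(hw_mod, h = 12) # 12-month-ahead Holt-Winters forecast
accuracy(hw_fc, test)             # RMSE, MAE, MAPE on the test set
plot(hw_fc)                       # forecast with prediction intervals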

A visual look at our predictions confirms what the error metrics tell us: the model captures the seasonal patterns beautifully.

Fitting an ARIMA model

Let’s see how a different class of models, the ARIMA family, performs on this dataset. ARIMA models rely on a combination of two modelling techniques, the AR and MA processes, joined by a differencing step (the “I”, for “integrated”). The AR process explains a series’ future values as a linear combination of its previous values, or lags. It requires the time series to be stationary, meaning it cannot have an increasing or decreasing trend, or a variance that changes over time.
Lots of time series are not stationary, however, be it due to trends, varying seasonal patterns or random events. Fortunately, there is a way to deal with these series as well, using a technique called differencing. This means subtracting the value from one cycle prior (so, for example, in a daily series with yearly seasonality, Y’_t = Y_t - Y_{t-365}). After doing this, we simply analyse the differences between these pairs of time points, and forecasts made on the differences can easily be converted back to the original series.

Due to the strong seasonality in the data, a SARIMA model, which has three additional parameters to capture seasonality, will likely be the most appropriate. To get a better understanding of the serial correlation of the data (how much each observation is correlated with previous observations, or lags), I will plot the ACF and PACF plots.
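A sketch of the correlation plots; Acf() and Pacf() are the forecast package’s wrappers around the base acf()/pacf() functions.

par(mfrow = c(1, 2))
Acf(train, lag.max = 60, main = "ACF of USgas")   # correlation with lags
Pacf(train, lag.max = 60, main = "PACF of USgas") # partial correlation with lags
par(mfrow = c(1, 1))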

The ACF plot indicates that there is a strong positive correlation with previous observations of the same season, and a weaker but still significant negative correlation with observations of the opposite season. None of this comes as a surprise, since we know that natural gas consumption is highly seasonal. We can also see that the correlation decays only slowly over time, indicating that the series is in fact not stationary and that we will need to do some differencing before continuing with our ARIMA model.

After differencing once at the seasonal lag and once more at lag one to remove the remaining trend, we arrive at the above output, which now looks relatively stable. Let’s take another look at the ACF and PACF plots:
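The transformation itself is just two nested calls to diff(): once at the seasonal lag of 12 and once at lag one.

train_diff <- diff(diff(train, lag = 12)) # seasonal, then first difference
plot(train_diff, main = "USgas, seasonally and first differenced")
Acf(train_diff, lag.max = 60)
Pacf(train_diff, lag.max = 60)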

As we can see in the plot, after the transformations the correlation with the lags tails off very quickly. Let’s try fitting an ARIMA model using auto.arima, which automatically determines the best orders for the AR and MA processes, as well as doing the required differencing for us:

# auto.arima() searches over candidate (p, d, q)(P, D, Q) orders and returns
# the model with the best information-criterion score.
USgas_arima_mod <- auto.arima(train)
USgas_arima_mod

Not bad: we reduced RMSE from 115 with Holt-Winters to just 103. Let’s take a look at the plot of the forecast:
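Again, the comparison comes from forecasting the held-out year, scoring it, and plotting:

arima_fc <- forecast(USgas_arima_mod, h = 12) # 12-month SARIMA forecast
accuracy(arima_fc, test)                      # source of the RMSE quoted above
plot(arima_fc)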

This model also captures the seasonal component very well, while also picking up the trend. We can see that the peak for the forecasted time points is just a little higher than in previous years, which is what we would expect looking at previous years’ changes. This is something the Holt-Winters model did not pick up on, and it is reflected in the ARIMA model’s slightly lower RMSE.
To check how well the model fits our data, let’s perform a residuals check:

# Plot residual diagnostics and run the Ljung-Box test (forecast package).
checkresiduals(USgas_arima_mod)

Everything looks good here: the residuals resemble white noise, evenly distributed around a mean of zero. There are no significant autocorrelations in the lags, and the Ljung-Box test does not reject the null hypothesis of no autocorrelation, returning a non-significant p-value of 0.13.

Conclusion

And there we go: we’ve successfully fitted a few different models. Both Holt-Winters and ARIMA do a good job of capturing the data, but I think it’s fair to say that the ARIMA model has come out as the clear winner in this case. Let’s take another look at its predictions for the last 12 months, complete with prediction intervals:
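A final plotting sketch, using the forecast package’s default 80% and 95% prediction intervals:

plot(forecast(USgas_arima_mod, h = 12, level = c(80, 95)))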

Thanks very much for reading!
