ARIMA-GARCH forecasting with Python

Thomas Dierckx
Published in Analytics Vidhya
Sep 9, 2020

ARIMA models are popular forecasting methods with many applications in finance. For example, a linear combination of past returns and residuals can be used to try to predict future returns. Sadly, this family of models runs into problems when the variance of returns changes over time. This phenomenon is common in financial series and is known as conditional heteroskedasticity, or volatility clustering. Luckily, another family of models can capture this irregularity: (G)ARCH models. The two types of models are therefore often combined in practice to improve forecasting performance.
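To get a feel for volatility clustering, here is a small standard-library sketch that simulates a GARCH(1,1) process and compares the lag-1 autocorrelation of the returns with that of the squared returns. The parameter values, the seed, and the helper names are assumptions chosen purely for illustration.

```python
import math
import random


def simulate_garch11(n, omega=0.05, alpha=0.2, beta=0.7, seed=42):
    """Simulate n returns from a zero-mean GARCH(1,1) process (illustrative parameters)."""
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns = []
    for _ in range(n):
        r = math.sqrt(var) * rng.gauss(0.0, 1.0)
        returns.append(r)
        # Today's shock and today's variance feed into tomorrow's variance.
        var = omega + alpha * r * r + beta * var
    return returns


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


returns = simulate_garch11(5000)
acf_returns = lag1_autocorr(returns)
acf_squared = lag1_autocorr([r * r for r in returns])
print(f"lag-1 autocorrelation of returns:         {acf_returns:.3f}")
print(f"lag-1 autocorrelation of squared returns: {acf_squared:.3f}")
```

On a series like this, the returns themselves should show almost no autocorrelation, while their squares should be clearly positively correlated: large moves tend to follow large moves. That is the pattern a plain ARIMA model cannot describe.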

Python has great packages for training ARIMA and GARCH models separately, but none that actually combines both (like R’s nifty package rugarch; damn you, R users). Let’s take a look at how we can combine these models in Python while steering clear of too much theory. There are plenty of other tutorials out there that expand on what we cover in this post, and I’ll conveniently provide you with links when necessary. Let’s delve into it!

ARIMA modelling in Python

Python has two popular packages for modelling ARIMA processes: pmdarima and statsmodels. The great thing about pmdarima is that it finds the optimal ARIMA(p, d, q) parameters for you, whereas statsmodels forces you to find them manually. Both packages provide plenty of options to tinker with, so their documentation is definitely worth a visit. In addition, there are two great tutorials that cover ARIMA in Python in more depth: a statsmodels version and a pmdarima version.

pmdarima vs statsmodels

GARCH modelling in Python

When it comes to modelling conditional variance, arch is the Python package that stands out. A more in-depth tutorial can be found here. Note that there’s no package like pmdarima that automatically finds the optimal parameters p and q based on a given criterion.

Can a GARCH model be used on its own to predict returns? Yes and no. The model in the code snippet above assumes that returns have a constant mean. In real life, however, this won’t completely capture the skewness and leptokurtosis present in the returns. That’s why ARIMA and GARCH models are so often combined: an ARIMA model estimates the conditional mean, after which a GARCH model estimates the conditional variance present in the residuals of the ARIMA estimation.

Combining ARIMA and GARCH

So how do we combine both ARIMA and GARCH models? Let’s have a brief look at the math behind an ARMA-GARCH model:

ARMA-GARCH model
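As a stand-in for the formula, here is the standard textbook form of an ARMA(p, q)-GARCH(r, s) model. The symbols are my own choice and may differ from the notation in the original figure.

```latex
\begin{aligned}
y_t &= \mu_t + \varepsilon_t, \\
\mu_t &= c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}, \\
\varepsilon_t &= \sigma_t z_t, \qquad z_t \sim \text{i.i.d.}(0, 1), \\
\sigma_t^2 &= \omega + \sum_{k=1}^{r} \alpha_k \, \varepsilon_{t-k}^2 + \sum_{l=1}^{s} \beta_l \, \sigma_{t-l}^2 .
\end{aligned}
```

In this notation, \mu_t is the conditional mean supplied by the ARMA part, and \varepsilon_t is the error term whose conditional variance \sigma_t^2 is described by the GARCH part.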

The formula is pretty straightforward. The final prediction combines the output of the ARIMA model (red), which provides the conditional mean, with the output of the GARCH model (green), which describes the error term.

Let’s see how this works in Python! Note that we fit the GARCH model on the residuals of the ARIMA model instead of the returns this time.

That’s all there is to it! However, there’s one caveat in the way we work here (taken from a Stack Overflow post):

You may choose to fit an ARMA model first and then fit a GARCH model on the ARMA residuals, but this is not the preferred way. Your ARMA estimates will generally be inconsistent. (In a special case where there are only AR terms and no MA terms, the estimates will be consistent but inefficient.) This will also contaminate the GARCH estimates. Therefore the preferred way is to estimate both ARMA and GARCH models simultaneously. Statistical software is capable of doing that (see e.g. rugarch package for R).

One way to work around this problem is to train many different ARIMA(p1, d, q1)-GARCH(p2, q2) models and select the best-performing one based on criteria such as AIC or BIC.

Next steps

There’s an abundance of potential next steps to explore. For example, we used ARIMA to model the mean of the returns series, but other mean models exist. The same applies to variance modelling, as there are plenty of other variance models to choose from! Lastly, we only considered the univariate case in this post. Here’s an excellent post on how to apply ARIMA-GARCH to a multivariate case (in R).
