Forecasting Interest Rates With ARMA

An Introduction to Purely Statistical Forecast Models on Python

Syed Hadi
The Startup
6 min read · Sep 17, 2020


Herman Wold laid the foundations for ARMA, giving the world the first purely statistical forecast technique. Source: hetwebsite.net

Wall Street indices predicted nine out of the last five recessions ~ Paul A. Samuelson

The quote above explains why a correct forecast is often just another word for getting lucky, but forecasting interest rates in particular has been a point of interest for researchers and mathematicians for some time now. While many contributors have extended the ARMA model to better forecast interest rates, this post aims to provide an introduction to that sea of research with a very basic but very strong model.

The Auto-Regressive (AR) Moving-Average (MA) model has two things going on simultaneously. The ‘AR’ part of the model regresses the data on its own past values, using the history of the variable to predict its future movements. The ‘MA’ part of the model learns from lagged error terms, or the white noise in the variable's past values.
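Putting the two parts together, a standard textbook formulation of the ARMA(p, q) model (the original post showed this as an image) is:

```latex
X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
```

Here X_t is the value of the series at time t, c is a constant, and ε_t is the white-noise error at time t.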

The first summation accounts for the AR part of the model with ‘p’ lagged values, whereas the second summation accounts for the MA part with ‘q’ lagged values. ε is the error term for the ‘q’ lagged values, φ is the AR coefficient, and θ is the MA coefficient. The AR and MA coefficients are estimated when the model is trained on the data provided to it.

Using the Right Data

As a first step, I’ll be importing my data.

You can download this .csv file from here. With the data loaded, we’ll have to check for its stationarity before plugging it into an ARMA model.
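The original snippet isn't shown here; a minimal sketch with pandas might look like the following. The file and column names (`FFR.csv`, `DATE`, `FFR`) are assumptions, and a tiny inline sample stands in for the downloaded file so the sketch runs on its own:

```python
import io

import pandas as pd

# In the post, the data comes from a downloaded CSV, e.g.:
#   df = pd.read_csv("FFR.csv", parse_dates=["DATE"], index_col="DATE")
# A small inline sample stands in for that file here:
csv = io.StringIO(
    "DATE,FFR\n"
    "2009-01-01,0.15\n"
    "2009-02-01,0.22\n"
    "2009-03-01,0.18\n"
)
df = pd.read_csv(csv, parse_dates=["DATE"], index_col="DATE")
print(df["FFR"].head())
```

Indexing by date keeps the series in time order, which matters for everything that follows.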

What differentiates ARMA from ARIMA is that an ARIMA model is able to handle non-stationary data. In this post, however, we’ll be using an ARMA model and I will therefore leave the debate of handling non-stationary data for another post.

But this doesn’t mean we can feed non-stationary data into an ARMA model and hope for great results, and interest-rate data is almost never stationary. For this reason, I will be using the monthly Federal Funds interest rate data from 2009 to 2020, which happens to exhibit a good degree of stationarity.

How do I know it’s stationary? I don’t!
Until you run a statistical test for stationarity, you cannot establish whether your data is fit to move forward with. A common choice is the Augmented Dickey-Fuller (ADF) test, which gives you two kinds of results:

  1. The p-value:
    p-value ≤ 0.05 means the data is stationary
    p-value > 0.05 means it’s not stationary
  2. The ADF Statistic:
    If the ADF Statistic is less than the 1% critical value, you can reject the null hypothesis (that your data is non-stationary) at the 1% significance level.
    Similarly, the ADF statistic can be compared against critical values at other significance levels.

In Python, StatsModels allows for a quick and easy way to estimate stationarity of data using the ADF test.

Given that the p-value is less than 0.05 and that the ADF Statistic is less than the 1% critical value, we can establish with some certainty that our data is stationary.

Training the ARMA Model

Now that our data is ready to use, let’s see how the ARMA model works.

Here I’ll be using an ARMA(1, 0) model: a model that regresses on 1 lagged value of the Fed Funds Rate and uses 0 lagged error terms for the moving average.

The summary of the model gives some important numbers that help you understand how well your model performed. The summary of this model looks like this:

Lower AIC, BIC and HQIC values indicate a better model when comparing candidates: these criteria reward goodness of fit while penalizing complexity. Log likelihood is an estimate of the fitness of the model; higher log likelihood values are desirable. ‘const’ is the constant ‘c’ added to every observation predicted by the model; in simpler words, it acts like a lower bound when all other terms in the model are non-negative. ‘ar.L1.FFR’ is the AR coefficient φ. We don’t see the MA coefficient θ because we used 0 MA terms in the model. From the summary we can infer that:

c = 0.3657
φ = 0.9846

Now, let’s test the model by forecasting some values.

Sklearn in Python gives us a convenient function to calculate the accuracy of our model.
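That function is `mean_squared_error` from `sklearn.metrics`. A tiny sketch with made-up numbers standing in for the post's actual and predicted rates:

```python
from sklearn.metrics import mean_squared_error

actual = [0.25, 0.25, 0.24, 0.10, 0.05]     # hypothetical observed rates
predicted = [0.24, 0.26, 0.22, 0.15, 0.08]  # hypothetical model output

# Mean of the squared differences between actual and predicted values.
mse = mean_squared_error(actual, predicted)
print(f"MSE: {mse:.4f}")
```

In the post, `actual` would be the held-out FFR observations and `predicted` the model's in-sample or out-of-sample predictions for the same dates.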

An MSE of 0.037 is a fairly low error value, so we can move forward.

We could also use this model to forecast future FFR values.

Appending the results with the actual past values looks like this.

The best we can do to test the accuracy of forecasted future values is to compare them with others' projections. The US Federal Reserve publishes FOMC projections for a number of economic variables in its periodic reports, which makes them a reliable benchmark for our model's results. You can find the report here. The projections show consistently near-zero values for the Fed Funds Rate, just as our model predicted.

Source: federalreserve.gov

The purpose of this post and methodology is to give you a glance at how Python can be used for forecasting variables; any consequential decisions should be preceded by proper research and understanding.
