Forecasting Interest Rates With ARMA
An Introduction to Purely Statistical Forecast Models in Python
Wall Street indices predicted nine out of the last five recessions ~ Paul A. Samuelson
The quote above is a reminder that forecasting something correctly often comes down to getting lucky. Even so, forecasting interest rates in particular has long been a point of interest for researchers and mathematicians. While many contributors have extended the ARMA model to forecast interest rates better, this post aims to provide an introduction to that sea of research with a very basic but very strong model.
The Auto-Regressive (AR) Moving-Average (MA) model has two things going on simultaneously. The ‘AR’ part of the model regresses the data on its own past values, using the history of the variable to predict its future movements. The ‘MA’ part of the model learns from lagged error terms, that is, the white noise in the past values of the variable.
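In standard notation, an ARMA(p, q) model can be written as follows. Here X_t (the value of the series at time t) and the index names i and j are my own labelling; c is a constant term, and the remaining symbols match the description that follows.

```latex
X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
```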
The first summation captures the AR part of the model over ‘p’ lags, whereas the second summation captures the MA part over ‘q’ lags. ε is the error (white-noise) term, and the second summation uses its ‘q’ lagged values. φ is the AR coefficient and θ is the MA coefficient. Both sets of coefficients are estimated when the model is trained on the data provided to it.
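To make the two summations concrete, here is a minimal sketch of a one-step-ahead ARMA prediction in plain Python. The function name `arma_one_step` and all the numbers are hypothetical and chosen only for illustration; the unknown future shock is set to its expected value of zero.

```python
def arma_one_step(c, phi, theta, past_values, past_errors):
    """One-step-ahead ARMA(p, q) prediction with the future shock set to zero.

    phi[i] multiplies the value i+1 steps back and theta[j] multiplies the
    error j+1 steps back. past_values and past_errors are ordered from
    oldest to newest.
    """
    p, q = len(phi), len(theta)
    if len(past_values) < p or len(past_errors) < q:
        raise ValueError("not enough history for the requested lags")
    # AR part: weighted sum of the last p observed values
    ar_part = sum(phi[i] * past_values[-1 - i] for i in range(p))
    # MA part: weighted sum of the last q error terms
    ma_part = sum(theta[j] * past_errors[-1 - j] for j in range(q))
    return c + ar_part + ma_part


# Hypothetical ARMA(1,1) coefficients and history, for illustration only.
prediction = arma_one_step(
    c=0.1,
    phi=[0.5],
    theta=[0.4],
    past_values=[1.0, 2.0],
    past_errors=[0.0, 0.25],
)
print(prediction)
```

Passing an empty list for `theta` reduces this to a pure AR model, which is the case used later in this post.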
Using the Right Data
As a first step, I’ll be importing my data.
# Importing libraries
import pandas as pd
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()  # allows using dates for indexing

# Importing data
df = pd.read_csv(r'path of file', parse_dates=['DATE'], index_col=['DATE'])
FFR = df['FFR'].dropna()
You can download this .csv file from here. With the data loaded, we’ll have to check for its stationarity before plugging it into an ARMA model.
What differentiates ARMA from ARIMA is that an ARIMA model can handle non-stationary data by differencing it first. In this post, however, we’ll be using an ARMA model, so I will leave the topic of handling non-stationary data for another post.
But this doesn’t mean we can feed non-stationary data into an ARMA model and hope for great results, and interest-rate data is almost never stationary. For this reason, I will be using the monthly Federal Funds rate data from 2009 to 2020, which happens to exhibit a good degree of stationarity.
import matplotlib.pyplot as plt

plt.plot(FFR)
plt.title('Fed Funds Rate')
plt.xlabel('Year')
plt.show()
How do I know it’s stationary? I don’t!
Until you use a statistical method to test for stationarity, you cannot be sure that you can move forward with your data. A common choice is the Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the data has a unit root (is non-stationary). The ADF test gives you two kinds of results:
- The p-value:
p-value ≤ 0.05 means you can reject the null hypothesis and treat the data as stationary
p-value > 0.05 means you cannot reject it, so the data may be non-stationary
- The ADF Statistic:
If the ADF Statistic < 1% critical value (that is, more negative), you can reject the hypothesis that your data is non-stationary at the 1% significance level.
Similarly, the ADF statistic can be compared against the critical values for other significance levels.
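The decision rules above can be sketched as a small helper. The name `interpret_adf` and the sample numbers are hypothetical, made up for illustration rather than taken from the FFR data; the inputs mirror the statistic, p-value and critical-value dictionary that an ADF test reports.

```python
def interpret_adf(adf_stat, p_value, critical_values, alpha=0.05):
    """Apply the two ADF decision rules described above.

    critical_values maps a significance label such as '1%' to its critical
    value. The unit-root null is rejected at a level when the statistic is
    below (more negative than) that level's critical value.
    """
    rejected_at = [
        level for level, crit in critical_values.items() if adf_stat < crit
    ]
    return {
        "reject_unit_root": p_value <= alpha,
        "rejected_at_levels": rejected_at,
    }


# Hypothetical test output, for illustration only.
result_summary = interpret_adf(
    adf_stat=-3.6,
    p_value=0.006,
    critical_values={"1%": -3.48, "5%": -2.88, "10%": -2.58},
)
print(result_summary)
```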
In Python, StatsModels allows for a quick and easy way to estimate stationarity of data using the ADF test.
from statsmodels.tsa.stattools import adfuller

result = adfuller(FFR)
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))
Given that the p-value is less than 0.05 and that the ADF Statistic is less than the 1% critical value, we can establish with some certainty that our data is stationary.
Training the ARMA Model
Now that our data is ready to use, let’s see how the ARMA model works.
Here I’ll be using an ARMA(1,0) model: a model that regresses on 1 lagged value of the Fed Funds Rate and uses 0 lagged error terms for the moving average.
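With p = 1 and q = 0, the general ARMA form collapses to a single autoregressive term. In standard notation (X_t for the rate in month t is my own labelling), this is:

```latex
X_t = c + \varphi_1 X_{t-1} + \varepsilon_t
```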
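# Importing the ARMA library
# Note: statsmodels.tsa.arima_model.ARMA is a legacy class that was removed in
# statsmodels 0.13, so this code needs an older release. Newer releases provide
# statsmodels.tsa.arima.model.ARIMA, where ARMA(1,0) is order=(1, 0, 0).
from statsmodels.tsa.arima_model import ARMA

# Training forecast model with FFR data using ARMA(1,0)
model_1 = ARMA(FFR, order=(1,0))
ffr_pred = model_1.fit()
print(ffr_pred.summary())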
The summary of the model reports some important numbers that help you understand how well your model performed. The summary of this model looks like this:
Lower AIC, BIC and HQIC values indicate a better trade-off between goodness of fit and model complexity, which makes them most useful for comparing candidate models. Log Likelihood measures how well the model fits the data; higher values are desirable. ‘const’ is the constant term of the model. Note that the legacy statsmodels ARMA class reports it as the estimated mean μ of the series, the level that forecasts revert to, rather than as the intercept ‘c’ itself; for this model the two are related by c = μ(1 − φ). ‘ar.L1.FFR’ is the AR coefficient φ. We don’t see the MA coefficient θ because we used 0 MA terms in the model. From the summary we can infer that:
μ (const) = 0.3657
φ = 0.9846
c = μ(1 − φ) ≈ 0.0056
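For reference, the textbook definitions of the three information criteria can be sketched as below. The function name and the numbers passed in are illustrative assumptions, not values from the summary, and statsmodels may count the number of parameters slightly differently.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Textbook AIC, BIC and HQIC from a fitted model's log-likelihood.

    All three reward a higher log-likelihood and penalise extra parameters;
    they differ only in how heavily the parameter count is penalised.
    """
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    hqic = 2 * n_params * math.log(math.log(n_obs)) - 2 * log_likelihood
    return {"AIC": aic, "BIC": bic, "HQIC": hqic}


# Hypothetical fit: log-likelihood of -10 with 3 parameters on 100 observations.
criteria = information_criteria(log_likelihood=-10.0, n_params=3, n_obs=100)
print(criteria)
```

Because the log-likelihood enters with a negative sign, a better-fitting model with the same number of parameters always scores lower on all three.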
Now, let’s test the model by forecasting some values.
# Plot the original series and the forecasted series
ffr_pred.plot_predict(start=90, end=134)
plt.legend(fontsize=8)
plt.title('Forecasted FFR vs Actual FFR')
plt.savefig('ffr forecast')
plt.show()
scikit-learn gives us convenient functions to calculate the error of our model’s predictions.
# Calculating accuracy
from sklearn import metrics

acc_1_pred = ffr_pred.predict(start=90, end=134)
acc_1_true = FFR[-len(acc_1_pred):]
print('Mean Squared Error = ', metrics.mean_squared_error(acc_1_true, acc_1_pred))
print('Mean Absolute Error = ', metrics.mean_absolute_error(acc_1_true, acc_1_pred))
An MSE of 0.037 is a fairly low error value to move forward with.
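To show what these two metrics measure, here is a minimal hand-rolled sketch of both. The function names mirror the scikit-learn ones but are defined locally, and the sample values are hypothetical rather than taken from the FFR data.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences; penalises large misses heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences, in the same units as the data."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted values, for illustration only.
actual = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.0]
mse = mean_squared_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print('Mean Squared Error = ', mse)
print('Mean Absolute Error = ', mae)
```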
We could also use this model to forecast future FFR values.
# Forecasting future values for about 24 months (steps 135 through 159, inclusive)
import numpy as np

ffr_pred.plot_predict(start=135, end=159, dynamic=True)
ffr_forecast = np.array(ffr_pred.predict(start=135, end=159))
Appending the forecast to the actual past values looks like this.
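As a rough sketch of what a dynamic AR(1) forecast does under the hood, each forecast is fed back in as the next step's input, so the path moves geometrically toward the long-run mean. The function name `ar1_forecast` and the starting value are hypothetical; the mean and φ are the estimates quoted from the summary, and the sketch assumes the reported const is the process mean.

```python
def ar1_forecast(last_value, mu, phi, steps):
    """Multi-step AR(1) forecast in mean form, with future shocks set to zero.

    Each step applies x_next = mu + phi * (x - mu), reusing the previous
    forecast as input, which is what a dynamic forecast amounts to.
    """
    forecasts = []
    current = last_value
    for _ in range(steps):
        current = mu + phi * (current - mu)
        forecasts.append(current)
    return forecasts


# last_value is a hypothetical final observation; mu and phi come from the summary.
path = ar1_forecast(last_value=0.09, mu=0.3657, phi=0.9846, steps=25)
print(path[0], path[-1])
```

With φ this close to 1, the pull toward the mean is slow, which is why the forecast stays low over a two-year horizon when it starts from a low value.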
The best we can do to test the accuracy of forecasted future values is to compare them with other people's results. The US Federal Reserve gives an FOMC projection for a number of economic variables in an annual report, which should make it a reliable benchmark for our model's results. You can find the report here. It shows consistently near-zero values of the Fed Funds Rate, as our model predicted.
The purpose of this post and its methodology is to give you a glimpse of how Python can be used for forecasting variables. Any consequential decisions should be backed by proper research and understanding.