Introduction to ARMA Models with Financial Data

Daniel Herrera
May 20, 2023 · 12 min read


Overview

There is something about statistical models that really pulls at my curiosity.

The way we can mathematically represent a data process and leverage its properties to forecast into the future seems equal parts wizardry and mathematical genius. Could it be the manner in which we arrive at set assumptions about the data, basing our modeling approach on domain theory? Or am I drawing on a competitive need to find the best performing model to predict the future?

Whatever draws you in, let’s explore the inner workings of Autoregressive Moving Average (ARMA) models. We navigate through loading financial data, detailing model assumptions, discussing our model statement, obtaining our estimated model parameters, yielding predictions, and measuring model performance.

Data Import

Let’s begin by importing the following libraries to get our hands on financial data.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pmdarima as pm
import yfinance as yf

from pmdarima.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA

We can use the yfinance package in Python, which pulls historical data from Yahoo Finance's publicly available endpoints. This walkthrough uses data for the S&P 500 ETF ($SPY). We create a ticker object for $SPY, allowing us to access the object's methods and attributes. Below we define start and end dates in string format for the period of historical data we want to pull, using the history method.

spy = yf.Ticker("SPY")
hist = spy.history(start = "2010-01-04", end = "2020-02-01")
df = pd.DataFrame(hist, columns=['Close'])
df
                                Close
Date
2010-01-04 00:00:00-05:00 88.117889
2010-01-05 00:00:00-05:00 88.351151
2010-01-06 00:00:00-05:00 88.413376
2010-01-07 00:00:00-05:00 88.786560
2010-01-08 00:00:00-05:00 89.082054
... ...
2020-01-27 00:00:00-05:00 307.216400
2020-01-28 00:00:00-05:00 310.435852
2020-01-29 00:00:00-05:00 310.179321
2020-01-30 00:00:00-05:00 311.185974
2020-01-31 00:00:00-05:00 305.535522

Stationarity: an important assumption

To begin analyzing time series data with ARMA models, it is useful to look at a plot of historical stock prices over time (referred to as a “trace plot”) to check for stationarity.

# Add trace plot
plt.figure(figsize=(8,5))
plt.plot(df['Close']);
S&P 500 Trace Plot

Let’s understand the importance of stationarity conceptually and then mathematically. Oftentimes, we would like to make predictions about a future point in time based on historical data. For example, we may wish to predict the return of an asset over the years 2023–2030 based on historical data from 2015–2022. For such forecasts to be valid, the mean, variance, and covariance estimates we make from the historical data (2015–2022) must be similar to those of the future data (2023–2030); in layman's terms, we must have evidence that properties of the past will carry forward into the future.

We say that these estimates depend not on the time point t but on the lag h between any two time points s and t (i.e., |s − t|). For stationarity to be satisfied, these estimates must not differ across different time points t, as long as the lags between the two time points are equivalent. For example, if stationarity is satisfied, we can conclude that the relationship (covariance) between the one-year lag period of 2015 and 2016 is equivalent to the relationship between the future one-year lag periods of 2023 and 2024, 2024 and 2025, and so on.

This important property allows us to make valid statistical inferences about the time series and to leverage classical forecasting models such as ARMA.

Formally, in order for (weak) stationarity to be satisfied, the following must be true for all time points s and t:

  • The mean is constant: E[Xₜ] = μ does not depend on t.
  • The variance is finite and constant: Var(Xₜ) = σ² does not depend on t.
  • The covariance between two observations depends only on the lag between them: Cov(Xₛ, Xₜ) = γ(|s − t|).

In the above trace plot, stationarity is not satisfied: the mean price of the asset is not constant and depends on t, and the variance, which we will informally describe as jaggedness, is higher in some time periods than others.
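
Before testing formally, a quick informal check is to look at rolling statistics of the series: for a stationary series, a rolling mean and standard deviation should hover around fixed levels rather than drift. Here is a minimal sketch of that check; the 252-trading-day (roughly one year) window is an assumption of this example, not part of the original analysis.

# Informal stationarity check: rolling mean and standard deviation of the raw series
window = 252  # ~ one trading year; the window length is an arbitrary choice
rolling_mean = df['Close'].rolling(window).mean()
rolling_std = df['Close'].rolling(window).std()

plt.figure(figsize=(8, 5))
plt.plot(df['Close'], label='Close')
plt.plot(rolling_mean, label='rolling mean')
plt.plot(rolling_std, label='rolling std')
plt.legend()
plt.title('Rolling statistics of $SPY close')
plt.show()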

We can employ methods such as differencing to achieve stationarity. Currently, our data are represented as Xₜ, where each time point has a closing stock price associated with it. To difference the data, we subtract the previous observation from the current one, giving us a new time series Yₜ:

Yₜ = Xₜ − Xₜ₋₁

The plots below indicate that differencing removed the noticeable trend. The mean of the differenced series now appears constant and does not depend on t, although some periods remain visibly more volatile than others.

fig, (ax1, ax2) = plt.subplots(2, sharex=True)
fig.suptitle('$SPY price: Effect of Differencing')
ax1.plot(df['Close'])
ax1.set_ylabel('price')
ax1.set_title('Original')
ax2.plot(df['Close'].diff())
ax2.set_title("Differenced")
ax2.set_ylabel('difference price')

Additionally, we use the more formal Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series has a unit root (i.e., is not stationary). This test confirms our visual interpretation that the differenced data are stationary.

from statsmodels.tsa.stattools import adfuller

def ADF_Cal(x):
    # Augmented Dickey-Fuller test; H0: the series has a unit root (is not stationary)
    result = adfuller(x)
    ADF_stat = result[0]
    p = result[1]
    print("ADF Statistic: %f" % ADF_stat)
    print("p-value: %f" % p)
    print("Critical Values")
    levels = [.01, .05, .1]
    i = 0
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))
        hyp = p < levels[i]
        if ADF_stat < value:
            # Test statistic is more negative than this critical value: reject H0 at this level
            print('Reject H0: {}'.format(hyp))
            break
        i = i + 1
    if i >= 3:
        print("Less than 90% certain that data is stationary")
        print('Reject H0: {}'.format(hyp))

ADF_Cal(df['Close'].diff()[1:])
ADF Statistic: -11.415859
p-value: 0.000000
Critical Values
1%: -3.433
Reject H0: True

ARMA Model Statement

An ARMA(p,q) model has two components: AR(p) and MA(q).

AR(p) refers to the autoregressive component of the data. In the simplest case, this is Xₜ = φXₜ₋₁ + Wₜ: the model assigns a weight φ to the previous observation and adds noise Wₜ, which we generally assume to be normally distributed white noise. If we have several orders of the autoregressive component (p ≥ 1), with coefficients φ₁, φ₂, …, φₚ, the general AR(p) model is:

Xₜ = φ₁Xₜ₋₁ + φ₂Xₜ₋₂ + … + φₚXₜ₋ₚ + Wₜ

which can be rewritten using the AR characteristic polynomial φ(B) = 1 − φ₁B − φ₂B² − … − φₚB^p, where B is the backshift operator (BXₜ = Xₜ₋₁), as

φ(B)Xₜ = Wₜ

Here is an example of an order 2 (p = 2) process, ARMA(2,0):

Xₜ = φ₁Xₜ₋₁ + φ₂Xₜ₋₂ + Wₜ

which can equivalently be rewritten as

(1 − φ₁B − φ₂B²)Xₜ = Wₜ

MA(q) refers to the moving average component of the data. In the simplest case, we have Xₜ = θWₜ₋₁ + Wₜ: the current value is a weighted combination of the current noise term and the previous one, so the series behaves like a short sliding window over past noise. If we have several orders of the moving average component (q ≥ 1), with coefficients θ₁, θ₂, …, θq, the general MA(q) model is:

Xₜ = Wₜ + θ₁Wₜ₋₁ + θ₂Wₜ₋₂ + … + θqWₜ₋q

which can be rewritten, using the MA characteristic polynomial θ(B) = 1 + θ₁B + θ₂B² + … + θqB^q, as:

Xₜ = θ(B)Wₜ

Combining the above, we reach the ARMA(p,q) model:

Xₜ = φ₁Xₜ₋₁ + … + φₚXₜ₋ₚ + Wₜ + θ₁Wₜ₋₁ + … + θqWₜ₋q

We can then rewrite the full ARMA(p,q) model by combining the characteristic polynomial notations for the AR(p) and MA(q) components:

φ(B)Xₜ = θ(B)Wₜ
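
To make the notation concrete, we can simulate a small ARMA process with known coefficients and check that a fitted model recovers them. This is only an illustrative sketch with arbitrary coefficients; note that statsmodels' ArmaProcess expects the characteristic polynomials, so the AR coefficients enter with flipped signs.

from statsmodels.tsa.arima_process import ArmaProcess

# ARMA(2,1) with phi1 = 0.5, phi2 = -0.25 and theta1 = 0.4 (arbitrary example values)
# ArmaProcess takes polynomial coefficients: AR -> [1, -phi1, -phi2], MA -> [1, theta1]
ar_poly = np.array([1, -0.5, 0.25])
ma_poly = np.array([1, 0.4])
np.random.seed(42)
simulated = ArmaProcess(ar_poly, ma_poly).generate_sample(nsample=1000)

# Fitting an ARMA(2,1) -- i.e. ARIMA(2,0,1) -- should recover coefficients close to those above
sim_fit = ARIMA(simulated, order=(2, 0, 1)).fit()
print(sim_fit.params)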

ACF & PACF Plots: Selecting orders of p and q

Autocorrelation function (ACF) and partial autocorrelation function (PACF) plots help us decide on the orders p and q for an appropriate ARMA(p,q) model of the data.

The ACF at lag h is the autocorrelation at lag h: the autocovariance between observations h time steps apart, scaled by the variance of the series. Recall that for a stationary time series the covariance between two time points must depend not on t but only on the lag |s − t|.

The autocovariance between observations Xₜ and Xₜ₋₁ (lag h = 1) is:

γ(1) = Cov(Xₜ, Xₜ₋₁) = E[(Xₜ − μ)(Xₜ₋₁ − μ)]

We expect this value to be the same for every pair of observations that are one lag (h = 1) apart.
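
For a concrete number, we can estimate this quantity on the differenced series. Below is a minimal sketch using statsmodels' acovf and acf helpers (the same quantities that plot_acf visualizes); the nlag/fft arguments assume a reasonably recent statsmodels version.

from statsmodels.tsa.stattools import acf, acovf

diff_ts = df['Close'].diff()[1:]
gamma = acovf(diff_ts, nlag=1, fft=False)   # [gamma(0), gamma(1)]
rho = acf(diff_ts, nlags=1, fft=False)      # [1, rho(1)]
print("autocovariance at lag 1:", gamma[1])
print("autocorrelation at lag 1:", rho[1])  # equals gamma(1) / gamma(0)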

The PACF at lag h is the partial autocorrelation at lag h: the correlation between Xₜ and Xₜ₋ₕ while controlling for the influence of all the observations between them. For example, at lag 3, the partial autocorrelation removes the effect lags 1 and 2 have on the correlation. As a rough visual guide, a sharp cutoff in the PACF after lag p suggests an AR(p) component, while a sharp cutoff in the ACF after lag q suggests an MA(q) component; we can use these cues to choose appropriate orders for our ARMA(p,q) model.

# Plot ACF and PACF
diff_ts = df['Close'].diff()[1:]
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))
plot_acf(diff_ts, ax=ax1, lags = 10)
plot_pacf(diff_ts, ax=ax2, lags = 10)
plt.show()

While the ACF and PACF can be used to determine the orders p and q as described above, Python's pmdarima auto_arima() function can also select them automatically for our ARMA(p,q) model. We implement the process using an 80%/20% training/testing split, with the more recent historical data serving as the test set.

It is important to note for reading model output that the differencing operation we applied to the data makes running an ARMA(p,q) model on the differenced data equivalent to running an ARIMA(p,d,q) model on the original data, where d is the order of differencing (d = 1 in this case).
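
As a quick sanity check of this equivalence (a sketch, not part of the original walkthrough), we can fit ARIMA(1,1,0) to the raw prices and ARIMA(1,0,0) without a constant to the differenced prices, and compare the estimated AR coefficients:

# ARIMA(1,1,0) on the original series ...
fit_on_levels = ARIMA(df['Close'], order=(1, 1, 0)).fit()
# ... versus ARMA(1,0), i.e. ARIMA(1,0,0) with no constant, on the differenced series
fit_on_diffs = ARIMA(df['Close'].diff()[1:], order=(1, 0, 0), trend='n').fit()

# The estimated AR(1) coefficients should be nearly identical
print(fit_on_levels.params['ar.L1'], fit_on_diffs.params['ar.L1'])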

# Split train test
y_train, y_test = train_test_split(df, train_size= 0.8)
# Fit model
model = pm.auto_arima(y_train, start_p=0, d=1, start_q=0,
                      max_p=5, max_d=1, max_q=5, start_P=0,
                      D=0, start_Q=0, max_P=5, max_D=0,
                      max_Q=5, seasonal=False,
                      error_action='warn', trace=True,
                      suppress_warnings=True,
                      random_state=20, n_fits=50)
Performing stepwise search to minimize aic
ARIMA(0,1,0)(0,0,0)[0] intercept : AIC=6573.277, Time=1.52 sec
ARIMA(1,1,0)(0,0,0)[0] intercept : AIC=6573.373, Time=0.08 sec
ARIMA(0,1,1)(0,0,0)[0] intercept : AIC=6573.368, Time=0.10 sec
ARIMA(0,1,0)(0,0,0)[0] : AIC=6580.893, Time=0.03 sec
ARIMA(1,1,1)(0,0,0)[0] intercept : AIC=6574.633, Time=0.37 sec

Best model: ARIMA(0,1,0)(0,0,0)[0] intercept
Total fit time: 2.095 seconds
model.summary()
SARIMAX Results                                
==============================================================================
Dep. Variable: y No. Observations: 2029
Model: SARIMAX(0, 1, 0) Log Likelihood -3284.637
Date: Fri, 19 May 2023 AIC 6573.274
Time: 13:39:48 BIC 6584.503
Sample: 0 HQIC 6577.394
- 2029
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
intercept 0.0843 0.028 3.045 0.002 0.030 0.139
sigma2 1.4939 0.030 49.504 0.000 1.435 1.553
===================================================================================
Ljung-Box (L1) (Q): 1.91 Jarque-Bera (JB): 837.71
Prob(Q): 0.17 Prob(JB): 0.00
Heteroskedasticity (H): 1.50 Skew: -0.44
Prob(H) (two-sided): 0.00 Kurtosis: 6.02
===================================================================================

Estimated Model and Interpretation

The automated process above selected an ARIMA(0,1,0) model (with intercept) for the original data, Xₜ. Because d = 1, this is the same as fitting an ARMA(0,0) model to the differenced series Yₜ = Xₜ − Xₜ₋₁, i.e. Yₜ − μ = Wₜ, which makes Xₜ a random walk (with drift μ) by definition.

In conjunction, our ACF and PACF plots show an absence of lagged patterns, which supports a random walk as the most appropriate model. Our results are in line with previous findings in the financial modeling literature, where parsimonious ARMA(p,q) and random walk models regularly perform as well as or better than more complex approaches (including deep learning models).

The estimated model for this process is:

Yₜ − 0.0843 = Wₜ , where Wₜ ∼ white noise with variance 1.5

Recall that Yₜ is our differenced data: Yₜ = Xₜ − Xₜ₋₁
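
One practical consequence worth spelling out: under a random walk with drift, the h-step-ahead point forecast is simply the last observed price plus h times the estimated drift. A minimal sketch of that calculation, using the (rounded) intercept from the summary above; it should approximately match the model's own predictions in the forecasting section below.

# Random walk with drift: X_hat(T + h) = X_T + h * mu_hat
mu_hat = 0.0843                          # estimated drift (intercept) from the summary
last_price = y_train['Close'].iloc[-1]   # last observed training price, X_T
horizons = np.arange(1, y_test.shape[0] + 1)
manual_forecast = last_price + horizons * mu_hat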

Additional Model for Comparison

We are interested in seeing how well our random walk model performs compared to a more complex model, which also helps build intuition for ARMA models more generally. Our more complex model is an ARMA(1,0) process on the differenced data, or equivalently an ARIMA(1,1,0) on the original data, which had an AIC close to that of the selected model in the stepwise search.

The estimated model for this process, using the coefficients from the fit shown below (note that statsmodels includes no intercept term when d = 1), is:

Yₜ = −0.0258 Yₜ₋₁ + Wₜ , where Wₜ ∼ white noise with variance 1.5 and Yₜ = Xₜ − Xₜ₋₁ (i.e., our differenced dataset)

# Fit ARIMA(1,1,0) model (an AR(1) on the differenced series) on the training data
ar_mod = ARIMA(y_train, order=(1,1,0))
ar_fit = ar_mod.fit()
print(ar_fit.summary())
SARIMAX Results                                
==============================================================================
Dep. Variable: Close No. Observations: 2029
Model: ARIMA(1, 1, 0) Log Likelihood -3288.773
Date: Thu, 18 May 2023 AIC 6581.547
Time: 11:34:48 BIC 6592.777
Sample: 0 HQIC 6585.667
- 2029
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ar.L1 -0.0258 0.015 -1.742 0.082 -0.055 0.003
sigma2 1.5000 0.030 49.627 0.000 1.441 1.559
===================================================================================
Ljung-Box (L1) (Q): 0.05 Jarque-Bera (JB): 866.58
Prob(Q): 0.83 Prob(JB): 0.00
Heteroskedasticity (H): 1.52 Skew: -0.46
Prob(H) (two-sided): 0.00 Kurtosis: 6.07
===================================================================================

Diagnostic Plots

Random Walk Model Diagnostics

We run the diagnostic plots to check that the random walk model is appropriate and a good fit to the data.

plt.rc("figure", figsize=(16,8))
model.plot_diagnostics()
plt.show()

We see in the correlogram that there is no significant correlation, as most lags ≥ 1 fall within the bands. Thus, the residuals have low correlation with lagged versions of themselves. This supports the finding that the time series is a random walk, as previous observations are not correlated with subsequent ones.

The normal Q-Q plot shows some deviation from normality at the tails and the density plot does show a slight deviation from normal. While this is generally concerning, we will see how we can manage and use these differences in subsequent analysis — ARCH and GARCH modeling.

The Ljung-Box test in the model output has a p-value of 0.17, so we fail to reject the null hypothesis that the residuals are independently distributed; this is consistent with (though does not prove) uncorrelated, white noise residuals.
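
If we want that test outside the summary table, statsmodels exposes it directly. Here is a minimal sketch on the random walk model's residuals (pmdarima exposes them via resid()); it assumes a statsmodels version recent enough to return a DataFrame by default.

from statsmodels.stats.diagnostic import acorr_ljungbox

# Ljung-Box test on the residuals of the selected random walk model
resid = model.resid()
print(acorr_ljungbox(resid, lags=[1, 5, 10]))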

ARMA(1,0) Model Diagnostics

We see very similar results to those above from the ARMA(1,0) model's correlogram, normal Q-Q plot, and Ljung-Box test, which is consistent with our model selection.

ar_fit.plot_diagnostics()
plt.show()

Forecasting

Below we plot the historical data along with prediction intervals for the random walk and ARMA(1,0) models:

# Obtain predictions from random walk model 
rw_preds = pd.DataFrame(model.predict(n_periods = y_test.shape[0]))
rw_preds['date'] = y_test.index
rw_preds = rw_preds.set_index('date')
rw_preds.columns = ['predictions']

# Get CI
model_pred_ci = model.predict(y_test.shape[0], return_conf_int=True)
pred, ci = model_pred_ci
lower = list(list(zip(*ci))[0])
upper = list(list(zip(*ci))[1])

# Make plot of predictions
plt.plot(y_train,label="Training")
plt.plot(y_test,label="Testing")

plt.plot(rw_preds['predictions'],label="Predicted")
plt.fill_between(y_test.index, lower, upper, color='blue', alpha=0.3)

plt.legend(loc = 'upper left')
plt.title('Predictions using Random Walk Model')
plt.show()
# Obtain predictions from ARMA(1,0) model on X_t
ar_preds = pd.DataFrame(ar_fit.forecast(y_test.shape[0]))
ar_preds['date'] = y_test.index
ar_preds = ar_preds.set_index('date')
ar_preds.columns = ['predictions']

# Get CI
ar_pred_ci = ar_fit.get_forecast(y_test.shape[0]).conf_int()
ar_pred_ci['date'] = y_test.index
ar_pred_ci = ar_pred_ci.set_index('date')
# Make plot of predictions
plt.plot(y_train,label="Training")
plt.plot(y_test,label="Testing")
plt.plot(ar_preds['predictions'],label="Predicted")
plt.fill_between(ar_pred_ci.index, ar_pred_ci.iloc[:,0], ar_pred_ci.iloc[:,1], color='blue', alpha=0.3)
plt.legend(loc = 'upper left')
plt.title('Predictions using ARMA(1,0)')
plt.show()

Performance Metrics (MAPE and MSE)

To assess model performance, we use two metrics:

  • Mean Absolute Percentage Error (MAPE)
  • Mean Squared Error (MSE)

MAPE is commonly used with time series data and is calculated by taking the average absolute percent difference between the predicted and actual values. Its benefits include giving a more intuitive sense of model performance and being a relative rather than absolute metric of comparison. Mathematically, it is calculated as follows:

MAPE = (100% / n) Σ |(Aₜ − Fₜ) / Aₜ| , where Aₜ is the actual value, Fₜ is the forecast at time t, and the sum runs over the n forecasted points.

MSE is another common performance metric; it is an absolute metric of comparison that magnifies the impact of outliers (in this context, predictions far from the actual values) due to the squared term. One benefit of MSE is that it stays in the (squared) units of the original forecasted value. It is calculated as follows:

MSE = (1 / n) Σ (Aₜ − Fₜ)²
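
For intuition, both metrics are easy to compute by hand with numpy. Here is a minimal sketch for the random walk predictions; the sklearn helpers used below return MAPE as a fraction, hence the factor of 100.

# Hand-rolled MAPE (in percent) and MSE for the random walk predictions
actual = y_test['Close'].to_numpy()
predicted = rw_preds['predictions'].to_numpy()
mape = np.mean(np.abs((actual - predicted) / actual)) * 100
mse = np.mean((actual - predicted) ** 2)
print(f"MAPE: {mape:.2f}%  MSE: {mse:.2f}")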

# Compare both models on the test set (MAPE in percent)
metrics = pd.DataFrame({'MAPE': [mean_absolute_percentage_error(y_test, ar_preds['predictions']) * 100,
                                 mean_absolute_percentage_error(y_test, rw_preds['predictions']) * 100],
                        'MSE': [mean_squared_error(y_test, ar_preds['predictions']),
                                mean_squared_error(y_test, rw_preds['predictions'])]},
                       index=["AR1", "RW"])
metrics

For the ARMA(1,0) and the random walk model we find mean absolute percentage errors (MAPE) of 5.56 and 6.29, indicating that, on average, the predictions are off by 5.56% and 6.29%, respectively, relative to the actual values. The corresponding MSE values of 391.28 and 355.30 mean the squared deviations of the predictions from the actual values average roughly 391 and 355 dollars squared (about 19.8 and 18.8 dollars in root mean squared error terms).

Conclusion

There we have it — an introduction to the ARMA modeling process. We validated a commonly agreed upon finding that financial asset data can be adequately modeled as a random walk process by building our understanding of the math, assumptions and model selection process behind ARMA models.

ARMA is the building block of many classical time series approaches. From here, we can layer on other components to capture seasonality, model the variance rather than the mean, and even include multiple time series processes in parallel. I hope you enjoyed this and return for further parts of the series!

Stock Exchange Board: Pixabay
