Interpreting ARMA model results in Statsmodels for absolute beginners

Nikol Holicka · Published in Analytics Vidhya · Oct 23, 2019

When I worked on my first time-series project, I struggled with interpreting the results of my model. Although the metrics and coefficients resemble those from linear regression, I was somehow puzzled by how to interpret them. I searched online for a comprehensible guide that would explain the results line by line, but I could not find a good source that did not concentrate on just a single metric. So, I decided to compile this source myself!

In this blog post I will go through the Statsmodels model results for an ARMA time series. I will use a single example and describe what each result stands for, skipping over the obvious ones such as ‘Dep. Variable’ or ‘No. Observations’. Let’s have a look at the general results first.
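For context, here is a minimal sketch of how a summary like the one discussed below might be produced. It assumes a pandas Series called series that holds your time series; the ARMA class used here is the older statsmodels API that was current when this post was written (newer statsmodels versions use ARIMA with order=(2, 0, 2) instead).

    from statsmodels.tsa.arima_model import ARMA  # older API; newer versions: from statsmodels.tsa.arima.model import ARIMA

    # 'series' is assumed to be a pandas Series indexed by time
    model = ARMA(series, order=(2, 2))  # p=2 autoregressive lags, q=2 moving-average lags
    results = model.fit()               # 'css-mle' is the default estimation method
    print(results.summary())            # prints the results table discussed in this post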

Model

This is a reference to the model that is being used. ARMA(2,2) refers to a combined Autoregressive (AR) and Moving Average (MA) model. The numbers in the brackets are the orders of the two parts, i.e. how many lags of each kind the model uses. In this case, we are using an ARMA model that takes the values 2 and 2 for ‘p’ (the autoregressive order) and ‘q’ (the moving-average order). The following results will refer to this particular combination. If you want to find out more about what these values stand for, this video is great at explaining the essential concepts behind ARMA models.

Method

This field tells you which procedure was used for estimating the parameters. There are various methods available, such as the Yule-Walker procedure, the method of moments, or maximum likelihood estimation (MLE). In our case, ‘css-mle’ stands for ‘conditional sum of squares’ and ‘maximum likelihood estimation’. But what does this mean?

The StatsModels documentation page tells us that “the conditional sum of squares likelihood is maximized and its values are used as starting values for the computation of the exact likelihood via the Kalman filter.”

Well, if you are like me, you might be still asking “But what does it mean?”

It means that maximum likelihood estimation searches for the parameter values under which the observed data are most probable. MLE’s role in the algorithm is to determine values for the parameters of the model such that the model’s results are, with a high degree of probability, close to the observed (given) data. The ‘conditional sum of squares’ part is a quicker, approximate fit whose only job is to give that search a good starting point.
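The estimation method is also something you can choose when fitting, not just read off the summary. A minimal sketch using the older ARMA class, assuming the model object from the snippet above:

    # 'css-mle' (the default): fit by conditional sum of squares first,
    # then use that fit as starting values for exact maximum likelihood.
    results = model.fit(method='css-mle', disp=0)

    # Other options in the older API: pure conditional sum of squares ('css')
    # or exact maximum likelihood from the start ('mle').
    results_css = model.fit(method='css', disp=0)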

And what about the Kalman filter? The Kalman filter is a handy algorithm that runs in the background of all this. It takes in a series of observations (values) over time, along with statistical noise and other inaccuracies, and based on these it produces estimates that are generally more accurate than those based on a single observation alone. Sounds quite impressive, right? The Kalman filter was famously used in the Apollo program, where it was incorporated into the navigation computer for trajectory estimation!

[Image source: nasa.gov]

Log-Likelihood

The log-likelihood value is a simpler representation of the maximum likelihood estimation: it is the logarithm of the maximised likelihood. On its own this value is quite meaningless, but it becomes helpful when you compare multiple models to each other. Generally speaking, the higher the log-likelihood, the better. However, it should not be the only guiding metric for comparing your models!
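In statsmodels the log-likelihood is stored on the fitted results object, so comparing two candidate models could look like this (a sketch, assuming the same series as above):

    arma_2_2 = ARMA(series, order=(2, 2)).fit(disp=0)
    arma_1_1 = ARMA(series, order=(1, 1)).fit(disp=0)

    # A higher log-likelihood means a better fit to the training data,
    # but it ignores model complexity, so don't rely on it alone.
    print(arma_2_2.llf, arma_1_1.llf)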

AIC

AIC stands for Akaike’s Information Criterion. It is a metric that helps you evaluate the strength of your model. It combines the result of your maximum likelihood estimation with the total number of parameters: AIC = 2k − 2·log-likelihood, where k is the number of estimated parameters. Since adding more parameters to your model will always increase the maximum likelihood, the AIC balances this by penalising the number of parameters, hence searching for models that fit the data well with few parameters. Picking the model with the lowest AIC is a good way to select the best one: the lower this value is, the better the model is performing.

BIC

BIC (Bayesian Information Criterion) is very similar to AIC, but it also takes the number of observations (rows) in your dataset into account. Again, the lower your BIC, the better your model works. BIC imposes a heavier penalty than AIC on models with many parameters.

Both BIC and AIC are great values to use for model selection, as they help you find the simplest model that still gives reliable results.
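A common way to use these criteria is a small grid search over candidate orders, keeping the model with the lowest AIC (or BIC). A rough sketch, again assuming series and the older ARMA class from the first snippet:

    best_order, best_aic = None, float('inf')
    for p in range(3):
        for q in range(3):
            if p == 0 and q == 0:
                continue
            try:
                res = ARMA(series, order=(p, q)).fit(disp=0)
            except Exception:  # some (p, q) combinations may fail to converge
                continue
            print(p, q, res.aic, res.bic, res.hqic)
            if res.aic < best_aic:
                best_order, best_aic = (p, q), res.aic
    print('Best order by AIC:', best_order)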

HQIC

HQIC stands for the Hannan-Quinn Information Criterion, which can also be used for model selection. It is not used as frequently as AIC or BIC.

Let’s now have a look at the table of coefficients.

The ‘coef’ column contains the estimated value (weight) of each term; its significance is assessed by the columns that follow.

  • ar.L1 refers to the autoregressive term with a lag of 1; ar.L2 represents the same, but with a lag of 2.
  • ma.L1 and ma.L2 refer to the moving-average terms with lags of 1 and 2. All of these coefficients are part of the ARMA equation written out after this list. This example is a second-order model; the more lags you use in your model, the longer the equation will be.
  • The ‘std err’ column is an estimate of the standard error of the estimated coefficient. It tells you how strongly the residual error affects your estimated parameters (the first column).
  • The ‘z’ value is equal to ‘coef’ divided by ‘std err’. It is thus the standardised coefficient, the test statistic used to check whether the coefficient differs from zero.
  • The P>|z| column is the p-value of the coefficient. It is really important to check these p-values before you continue using the model. If any of them is higher than your chosen threshold (usually 0.05), you might be relying on an unreliable coefficient that could produce misleading results. In our example, all p-values are lower than 0.05, so this model looks good to go!
  • The last two columns represent the confidence interval. I wrote about confidence intervals in my last blog post! In simple words, these values are the coefficient value minus (left column) and plus (right column) the given error margin.
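For reference, the ARMA(2,2) equation that the ar.L1/ar.L2 and ma.L1/ma.L2 coefficients plug into can be written as:

    y_t = const + ar.L1·y_(t−1) + ar.L2·y_(t−2) + ma.L1·ε_(t−1) + ma.L2·ε_(t−2) + ε_t

where y_t is the value of the series at time t and ε are the error terms (white noise). If you prefer pulling these numbers out of the model programmatically instead of reading the printed table, the fitted results object exposes them directly (a sketch, assuming the results object from the first snippet):

    print(results.params)      # the 'coef' column
    print(results.bse)         # the 'std err' column
    print(results.pvalues)     # the 'P>|z|' column
    print(results.conf_int())  # the last two columns (the 95% confidence interval)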
