A Complete Introduction To Time Series Analysis (with R): Gaussian Time Series

Hair Parra
Jan 23 · 8 min read
The likelihood for a Gaussian Time Series

In the last two articles, we saw a number of methods to independently estimate AR(p) and MA(q) coefficients, namely the Yule-Walker method, Burg's Algorithm, and the Innovations Algorithm, as well as the Hannan-Rissanen Algorithm, which jointly estimates ARMA(p,q) coefficients by making use of AR(p) and MA(q) coefficients initialized with the previous algorithms. We also mentioned that these methods, as sophisticated as they are, tend to perform quite poorly on real datasets, since it is easy to misspecify the true model. Therefore, we now introduce one more assumption: normality of the observations. Series satisfying this assumption are called Gaussian Time Series.

Gaussian Time Series

Suppose that {X_t} is a Gaussian time series, that is, every finite collection of observations has a multivariate normal joint distribution. Collect the first n (mean-zero) observations into the vector

$$\mathbf{X}_n = (X_1, \dots, X_n)'$$

Then its likelihood follows a multivariate normal density given by

$$L(\Gamma_n) = (2\pi)^{-n/2} \big(\det \Gamma_n\big)^{-1/2} \exp\left\{-\tfrac{1}{2}\, \mathbf{X}_n' \Gamma_n^{-1} \mathbf{X}_n\right\}$$

where

$$\Gamma_n = E\big[\mathbf{X}_n \mathbf{X}_n'\big] = \big[K(i,j)\big]_{i,j=1}^{n}, \qquad K(i,j) = \operatorname{Cov}(X_i, X_j)$$

Note that Gamma_{n} is indeed a function of the parameters that we are trying to estimate (that is, the phi's for the AR(p) part, the thetas for the MA(q) part, and sigma squared for the common white-noise variance). This follows because K(i,j) is a function of the autocovariance function, which in turn depends on these parameters through the particular model. Therefore, we would like to maximize this likelihood to obtain the most suitable set of model parameters. However, the presence of the Gamma_n matrix poses two huge problems: computing its inverse and its determinant is extremely costly, and neither yields a "friendly" gradient.
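
To make the computational issue concrete, here is a minimal R sketch (my own illustration, assuming an AR(1) model so that the autocovariance γ(h) = σ²φ^|h|/(1 − φ²) is available in closed form) that builds Γ_n explicitly and evaluates the Gaussian log-likelihood directly. The solve() and determinant() calls on the n × n matrix are exactly the expensive part.

```r
# Direct evaluation of the Gaussian log-likelihood through Gamma_n.
# Sketch only: an AR(1) model is assumed, so gamma(h) = sigma^2 * phi^|h| / (1 - phi^2)
# is known in closed form; parameter values and names are illustrative.

set.seed(42)
n      <- 200
phi    <- 0.6
sigma2 <- 1.5
x      <- as.numeric(arima.sim(model = list(ar = phi), n = n, sd = sqrt(sigma2)))

direct_loglik <- function(phi, sigma2, x) {
  n      <- length(x)
  gamma0 <- sigma2 / (1 - phi^2)                    # gamma(0) of the AR(1)
  Gamma  <- gamma0 * phi^abs(outer(1:n, 1:n, "-"))  # Gamma_n = [gamma(i - j)]_{i,j}
  # O(n^3) linear algebra: this is exactly the bottleneck described above
  Ginv   <- solve(Gamma)
  logdet <- as.numeric(determinant(Gamma, logarithm = TRUE)$modulus)
  quad   <- drop(t(x) %*% Ginv %*% x)
  -0.5 * (n * log(2 * pi) + logdet + quad)
}

direct_loglik(phi, sigma2, x)
```

For a general ARMA(p,q), even writing down Γ_n is laborious, and the cubic cost of the inverse and determinant grows quickly with the sample size, which is precisely what motivates the innovations-based rewriting below.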

Innovations Algorithm for Gaussian Time Series

So, how can we actually solve this issue? Once again, the Innovations Algorithm comes to the rescue. By working with the innovations instead of the observations themselves, we can express the previous likelihood as

$$L(\phi, \theta, \sigma^2) = \big(2\pi\sigma^2\big)^{-n/2} \big(r_0 r_1 \cdots r_{n-1}\big)^{-1/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_{j=1}^{n} \frac{\big(X_j - \hat{X}_j\big)^2}{r_{j-1}}\right\}$$

where

$$\hat{X}_1, \dots, \hat{X}_n \quad \text{(the one-step predictors)} \qquad \text{and} \qquad r_j = \frac{v_j}{\sigma^2}, \quad j = 0, \dots, n-1,$$

are functions of the phi and theta parameters, but not of sigma squared.

Proof

  • Let
$$\theta_{n,1}, \theta_{n,2}, \dots, \theta_{n,n}, \qquad n = 1, 2, \dots$$
  • and let
$$v_0, v_1, \dots, v_{n-1}$$

be the coefficients and one-step mean squared errors produced by the Innovations Algorithm.

  • Note that the innovation
$$U_j = X_j - \hat{X}_j$$

is the prediction error for the j-th observation.

  • Recall the matrix C_n, defined as
$$C_n = \big[\theta_{i-1,\, i-j}\big]_{i,j=1}^{n}, \qquad \theta_{k,0} := 1, \quad \theta_{k,j} := 0 \ \text{for } j < 0,$$
a lower-triangular matrix with ones on its diagonal.
  • Recall as well that, from the Innovations Algorithm, we also have that
$$\hat{X}_{k+1} = \sum_{j=1}^{k} \theta_{k,j}\big(X_{k+1-j} - \hat{X}_{k+1-j}\big), \qquad \hat{X}_1 = 0$$
  • It can be shown that
$$E\big[(X_i - \hat{X}_i)(X_j - \hat{X}_j)\big] = 0 \quad \text{for } i \neq j,$$

that is, the innovations are uncorrelated, which implies that

$$E[U_i U_j] = \begin{cases} v_{i-1}, & i = j \\ 0, & i \neq j \end{cases}$$
  • This further implies that the vector of innovations
$$\mathbf{U}_n = (U_1, \dots, U_n)' = \big(X_1 - \hat{X}_1, \dots, X_n - \hat{X}_n\big)'$$

has a diagonal covariance matrix, given by

$$D_n = E\big[\mathbf{U}_n \mathbf{U}_n'\big] = \operatorname{diag}(v_0, v_1, \dots, v_{n-1})$$

where the diagonal entries come directly from the Innovations algorithm, since

$$v_j = E\big[(X_{j+1} - \hat{X}_{j+1})^2\big]$$

That is, these are given by the recursive formula

$$v_0 = K(1,1), \qquad v_n = K(n+1, n+1) - \sum_{j=0}^{n-1} \theta_{n,\, n-j}^{\,2}\, v_j$$

Using these facts, we have that

$$\mathbf{X}_n = C_n \mathbf{U}_n$$

from which it follows that

$$\Gamma_n = E\big[\mathbf{X}_n \mathbf{X}_n'\big] = C_n\, E\big[\mathbf{U}_n \mathbf{U}_n'\big]\, C_n' = C_n D_n C_n'$$

That is, we can substitute into the quadratic form of the likelihood:

$$\mathbf{X}_n' \Gamma_n^{-1} \mathbf{X}_n = \mathbf{U}_n' C_n' \big(C_n D_n C_n'\big)^{-1} C_n \mathbf{U}_n = \mathbf{U}_n' D_n^{-1} \mathbf{U}_n = \sum_{j=1}^{n} \frac{\big(X_j - \hat{X}_j\big)^2}{v_{j-1}}$$

For the determinant, we have that

$$\det \Gamma_n = \big(\det C_n\big)^2 \det D_n = v_0\, v_1 \cdots v_{n-1}$$

(recall that C_n is lower triangular with ones on the diagonal, hence det C_n = 1).

So that we can then write down the likelihood as

$$L = (2\pi)^{-n/2} \big(v_0 v_1 \cdots v_{n-1}\big)^{-1/2} \exp\left\{-\frac{1}{2} \sum_{j=1}^{n} \frac{\big(X_j - \hat{X}_j\big)^2}{v_{j-1}}\right\}$$

But wait! Although we have indeed managed to express the likelihood in a rather nice form, we still cannot optimize for the sigma squared parameter independently. Recall that for the ARMA(p,q) model, the Innovations Algorithm defines the transformed process W_t as

$$W_t = \begin{cases} \sigma^{-1} X_t, & t = 1, \dots, m \\ \sigma^{-1}\, \phi(B) X_t, & t > m \end{cases} \qquad m = \max(p, q),$$

, from which it follows that

$$X_t - \hat{X}_t = \sigma\big(W_t - \hat{W}_t\big) \quad \text{for all } t,$$

, so that

$$v_{t-1} = E\big[(X_t - \hat{X}_t)^2\big] = \sigma^2\, E\big[(W_t - \hat{W}_t)^2\big] = \sigma^2\, r_{t-1}$$

Therefore, we can conclude that

$$L(\phi, \theta, \sigma^2) = \big(2\pi\sigma^2\big)^{-n/2} \big(r_0 r_1 \cdots r_{n-1}\big)^{-1/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_{j=1}^{n} \frac{\big(X_j - \hat{X}_j\big)^2}{r_{j-1}}\right\}$$

Note: Recall that for the ARMA(p,q) model, K(i,j) (the autocovariance of the transformed process W_t used by the Innovations Algorithm) is given by

$$K(i,j) = \begin{cases} \sigma^{-2}\,\gamma_X(i-j), & 1 \le i, j \le m \\[4pt] \sigma^{-2}\Big[\gamma_X(i-j) - \sum_{r=1}^{p} \phi_r\, \gamma_X\big(r - |i-j|\big)\Big], & \min(i,j) \le m < \max(i,j) \le 2m \\[4pt] \sum_{r=0}^{q} \theta_r\, \theta_{r+|i-j|}, & \min(i,j) > m \\[4pt] 0, & \text{otherwise} \end{cases}$$

with θ_0 := 1 and θ_j := 0 for j > q,

from which the dependence on the phi and theta parameters can be seen more clearly. See my article on the Innovations Algorithm for ARMA(p,q) models.
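
To see the factorization of the proof in action, below is a minimal R sketch that runs the Innovations Algorithm directly on the autocovariances K(i, j) = γ(i − j) of an AR(1) process (chosen only because γ(h) has a simple closed form) and evaluates the likelihood in its v_j form. The function names innovations_loglik and gamma_ar1 are my own illustrative choices, not part of the series' code.

```r
# Innovations Algorithm on K(i, j) = gamma(i - j), followed by the factorized likelihood
#   -2 log L = n log(2*pi) + sum_j log v_{j-1} + sum_j (X_j - Xhat_j)^2 / v_{j-1}.
# Sketch only: an AR(1) model and illustrative parameter values are assumed.

set.seed(42)
n      <- 200
phi    <- 0.6
sigma2 <- 1.5
x      <- as.numeric(arima.sim(model = list(ar = phi), n = n, sd = sqrt(sigma2)))

gamma_ar1 <- function(h, phi, sigma2) sigma2 * phi^abs(h) / (1 - phi^2)
kappa     <- function(i, j) gamma_ar1(i - j, phi, sigma2)   # K(i, j) for this model

innovations_loglik <- function(x, kappa) {
  n     <- length(x)
  v     <- numeric(n)                # v[i] holds v_{i-1}
  Theta <- matrix(0, n - 1, n - 1)   # Theta[m, j] holds theta_{m, j}
  xhat  <- numeric(n)                # one-step predictors, xhat[1] = 0
  v[1]  <- kappa(1, 1)
  for (m in 1:(n - 1)) {
    for (k in 0:(m - 1)) {
      s <- 0
      if (k >= 1) for (j in 0:(k - 1)) s <- s + Theta[k, k - j] * Theta[m, m - j] * v[j + 1]
      Theta[m, m - k] <- (kappa(m + 1, k + 1) - s) / v[k + 1]
    }
    v[m + 1]    <- kappa(m + 1, m + 1) - sum(Theta[m, m:1]^2 * v[1:m])
    xhat[m + 1] <- sum(Theta[m, 1:m] * (x[m:1] - xhat[m:1]))
  }
  -0.5 * (n * log(2 * pi) + sum(log(v)) + sum((x - xhat)^2 / v))
}

innovations_loglik(x, kappa)   # should agree with the direct Gamma_n-based value above
```

Notice that no n × n inverse or determinant ever appears: only the one-step predictors and the mean squared errors v_j are needed.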

Gaussian MLE estimates of ARMA(p,q) model

Now the question is: how do we optimize such a likelihood? It turns out that we can find MLE estimates of the form

$$\hat{\sigma}^2 = \frac{S(\hat{\phi}, \hat{\theta})}{n}, \qquad \text{where} \quad S(\phi, \theta) = \sum_{j=1}^{n} \frac{\big(X_j - \hat{X}_j\big)^2}{r_{j-1}},$$

and,

$$(\hat{\phi}, \hat{\theta}) = \arg\min_{\phi, \theta} \left\{ \ln\!\left(\frac{S(\phi, \theta)}{n}\right) + \frac{1}{n}\sum_{j=1}^{n}\ln r_{j-1} \right\}$$

Proof

We can perform some simple algebraic manipulation as follows. First, we apply the natural logarithm to the likelihood, obtaining

$$\ln L(\phi, \theta, \sigma^2) = -\frac{n}{2}\ln\big(2\pi\sigma^2\big) - \frac{1}{2}\sum_{j=1}^{n}\ln r_{j-1} - \frac{S(\phi, \theta)}{2\sigma^2}$$

Differentiating with respect to sigma squared and setting the derivative equal to zero,

$$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{S(\phi, \theta)}{2\sigma^4} = 0 \quad \Longrightarrow \quad \hat{\sigma}^2 = \frac{S(\phi, \theta)}{n}$$

Solving w.r.t. sigma squared yields the respective estimate. Now, plugging this back into the log-likelihood yields:

$$\ln L(\phi, \theta, \hat{\sigma}^2) = -\frac{n}{2}\ln\!\left(\frac{2\pi\, S(\phi, \theta)}{n}\right) - \frac{1}{2}\sum_{j=1}^{n}\ln r_{j-1} - \frac{n}{2}$$

Ignoring the constants, this implies that the MLE estimates of the phi and theta parameters are given by

$$(\hat{\phi}, \hat{\theta}) = \arg\min_{\phi, \theta}\ \ell(\phi, \theta), \qquad \ell(\phi, \theta) = \ln\!\left(\frac{S(\phi, \theta)}{n}\right) + \frac{1}{n}\sum_{j=1}^{n}\ln r_{j-1}$$
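
As a deliberately simple illustration of this two-step recipe, here is an R sketch for a hypothetical MA(1) model, for which the Innovations quantities collapse to r_0 = 1 + θ², θ_{m,1} = θ/r_{m−1} and r_m = 1 + θ² − θ_{m,1}² r_{m−1}. We minimize the reduced objective numerically over θ and then recover σ̂² = S(θ̂)/n; the function and parameter names are illustrative assumptions, not part of the derivation above.

```r
# Numerical minimization of the reduced likelihood
#   l(theta) = log(S(theta) / n) + (1/n) * sum_j log r_{j-1}
# for a hypothetical MA(1) model, where the Innovations recursions are especially simple.

set.seed(123)
n      <- 300
theta  <- 0.5
sigma2 <- 2.0
x      <- as.numeric(arima.sim(model = list(ma = theta), n = n, sd = sqrt(sigma2)))

reduced_objective <- function(theta, x) {
  n    <- length(x)
  r    <- numeric(n)       # r[i] holds r_{i-1}
  xhat <- numeric(n)       # one-step predictors, xhat[1] = 0
  r[1] <- 1 + theta^2
  for (m in 1:(n - 1)) {
    th_m1       <- theta / r[m]                     # theta_{m,1}; all other coefficients vanish
    r[m + 1]    <- 1 + theta^2 - th_m1^2 * r[m]     # r_m
    xhat[m + 1] <- th_m1 * (x[m] - xhat[m])         # Xhat_{m+1}
  }
  S <- sum((x - xhat)^2 / r)                        # S(theta)
  list(value = log(S / n) + mean(log(r)), S = S)
}

# Profile out sigma^2 and minimize over theta on the invertible region (-1, 1)
opt        <- optimize(function(th) reduced_objective(th, x)$value, interval = c(-0.99, 0.99))
theta_hat  <- opt$minimum
sigma2_hat <- reduced_objective(theta_hat, x)$S / n   # sigma^2_hat = S(theta_hat) / n
c(theta_hat = theta_hat, sigma2_hat = sigma2_hat)
```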

Note

  • Consider the SS_{Res} (Sum of Squared Residuals)
$$S(\phi, \theta) = \sum_{j=1}^{n} \frac{\big(X_j - \hat{X}_j\big)^2}{r_{j-1}}$$

Inside the summation, the numerator is the squared prediction error: the squared difference between each observation and its best linear predictor under a given set of phi and theta parameters. Each term is then normalized by r_{j-1}, the (scaled) MSE of that prediction.

  • What makes an innovation “unlikely”?

Note that

$$X_j - \hat{X}_j \sim N\big(0,\ \sigma^2 r_{j-1}\big)$$

$$r_0 \ge r_1 \ge \cdots \ge r_{n-1} \ge 1$$

Therefore, we see that this normalization is necessary because summing the raw innovations alone would give too much weight to the early observations, whose prediction errors have larger variance. By normalizing, all terms are put on "equal footing".

  • In the equation
$$\ell(\phi, \theta) = \ln\!\left(\frac{S(\phi, \theta)}{n}\right) + \frac{1}{n}\sum_{j=1}^{n}\ln r_{j-1}$$

the second term is the logarithm of the geometric mean of the r_j's, and can be seen as a kind of regularization that places an extra penalty on parameter values producing large r_j.

  • Note that the optimization problem above has no closed-form solution. However, we can use numerical optimization algorithms such as gradient descent or Newton-Raphson. These algorithms, however, require initial values for the parameters in question; therefore, choosing good starting values can yield better estimates and ensure faster convergence. Some good options are the estimates produced by other algorithms, such as Burg's algorithm and Hannan-Rissanen's (see the R sketch after this list).
  • Before modern optimization methods, people used to perform lazy optimization, in which the optimization objective was simplified to
$$\tilde{\ell}(\phi, \theta) = S(\phi, \theta) = \sum_{j=1}^{n} \frac{\big(X_j - \hat{X}_j\big)^2}{r_{j-1}}$$

This way, it can be solved analytically, producing the estimates

$$(\tilde{\phi}, \tilde{\theta}) = \arg\min_{\phi, \theta}\, S(\phi, \theta), \qquad \tilde{\sigma}^2 = \frac{S(\tilde{\phi}, \tilde{\theta})}{n - p - q}$$

However, without placing additional constraints, these estimates often produce non-causal and/or non-invertible solutions; imposing such constraints brings us back to an objective similar to the one we originally presented.
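
In practice, this machinery is already wrapped up in R's stats::arima() function: method = "ML" maximizes the exact Gaussian likelihood (computed through a state-space representation), while method = "CSS" minimizes a conditional sum of squares, close in spirit to the "lazy" least-squares objective above. The sketch below, on simulated data with illustrative parameter values, also shows how the init argument can supply starting values (CSS, Burg, or Hannan-Rissanen estimates, for example) to the ML optimizer.

```r
# Fitting an ARMA(1,1) by Gaussian maximum likelihood with stats::arima(),
# using a cheaper fit to provide starting values. Illustrative sketch only.

set.seed(7)
x <- arima.sim(model = list(ar = 0.5, ma = 0.4), n = 500, sd = 1)

# Quick starting values from a conditional-sum-of-squares fit
# (Burg or Hannan-Rissanen estimates could be passed through `init` in the same way)
css_fit <- arima(x, order = c(1, 0, 1), include.mean = FALSE, method = "CSS")

# Exact Gaussian maximum likelihood, initialized at the CSS estimates
ml_fit <- arima(x, order = c(1, 0, 1), include.mean = FALSE,
                method = "ML", init = coef(css_fit))

coef(css_fit)   # "lazy" conditional-sum-of-squares estimates
coef(ml_fit)    # Gaussian MLEs of phi and theta
ml_fit$sigma2   # Gaussian MLE of the innovations variance sigma^2
```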

Asymptotic Normality of Gaussian MLE Estimates

Now let's look at a couple of asymptotic normality properties of the resulting coefficients. These are useful for obtaining, for instance, confidence intervals for the parameter values. Let

$$\beta = (\phi_1, \dots, \phi_p, \theta_1, \dots, \theta_q)'$$

be the vector of true parameters estimated by the MLE of the previous section. Then, for large n,

$$\hat{\beta} \approx N\!\left(\beta,\ \tfrac{1}{n}\, V(\beta)\right)$$

where

$$V(\beta) = \sigma^2 \begin{bmatrix} E[\mathbf{U}_t \mathbf{U}_t'] & E[\mathbf{U}_t \mathbf{V}_t'] \\ E[\mathbf{V}_t \mathbf{U}_t'] & E[\mathbf{V}_t \mathbf{V}_t'] \end{bmatrix}^{-1}$$

for the auxiliary autoregressive process (not to be confused with the innovations above)

$$\mathbf{U}_t = (U_t, \dots, U_{t+1-p})', \qquad \phi(B)\, U_t = Z_t, \quad \{Z_t\} \sim WN(0, \sigma^2),$$

and for

$$\mathbf{V}_t = (V_t, \dots, V_{t+1-q})', \qquad \theta(B)\, V_t = Z_t.$$

In practice, we can approximate the variance matrix using the Hessian of the negative log-likelihood with respect to the parameters, given by

$$\tfrac{1}{n}\, V(\hat{\beta}) \approx H^{-1}(\hat{\beta}),$$

where

$$H(\beta) = \left[\frac{\partial^2 \ell(\beta)}{\partial \beta_i\, \partial \beta_j}\right]_{i,j=1}^{p+q}, \qquad \ell(\beta) = -\ln L(\beta).$$

This is convenient because the Hessian is usually computed as a by-product of the optimization algorithm itself.
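
As a short illustration of how these results are used in practice, the sketch below (simulated data, illustrative parameter values) takes the Hessian-based covariance matrix that stats::arima() returns in var.coef and turns it into approximate standard errors and 95% confidence intervals for the phi and theta estimates.

```r
# Approximate standard errors and confidence intervals from the Hessian-based
# covariance matrix of the coefficients returned by arima(). Illustrative sketch.

set.seed(7)
x   <- arima.sim(model = list(ar = 0.5, ma = 0.4), n = 500, sd = 1)
fit <- arima(x, order = c(1, 0, 1), include.mean = FALSE, method = "ML")

est <- coef(fit)                  # MLEs of phi and theta
se  <- sqrt(diag(fit$var.coef))   # asymptotic standard errors

# Approximate 95% confidence intervals based on asymptotic normality
cbind(lower = est - 1.96 * se, estimate = est, upper = est + 1.96 * se)
```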

Last time

Estimation of ARMA(p,q) Coefficients (Part II)

Next Time

That's it! Next time, we will learn about Model Selection for ARMA(p,q) models, and finally see some comprehensive examples of applied time series analysis with ARMA(p,q) models in R. Stay tuned, and happy learning!

Main page

Follow me at

  1. https://blog.jairparraml.com/
  2. https://www.linkedin.com/in/hair-parra-526ba19b/
  3. https://github.com/JairParra
  4. https://medium.com/@hair.parra

Hair Parra

Written by

Data Scientist & Data Engineer at Cisco, Canada. McGill University CS, Stats & Linguistics graduate. Polyglot.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com
