Time Series Analysis using Unobserved Components Model in Python

Varishu Pant
Published in Analytics Vidhya
9 min read · Jan 14, 2020

Hey there statisticians and Time Series fanatics! Here’s my take on the Unobserved Components Model. Happy reading!

What is UCM?

The Unobserved Components Model (UCM) (Harvey, 1989) decomposes a time series into components such as trend, seasonal, cycle, and the regression effects due to predictor series.

What to expect from this article?

In an influential article, Harvey and Jaeger (1993) described the use of unobserved components models (also known as “structural time series models”) to derive stylized facts of the business cycle. In particular, they make the argument that these goals are often better met using the unobserved components approach rather than the popular Hodrick-Prescott filter or Box-Jenkins ARIMA modeling techniques. Taking inspiration from Harvey and Jaeger I consider the following time series:

  • US real GNP, “output”, (GNPC96)
  • US GNP implicit price deflator, “prices”, (GNPDEF)

The time frame in the original paper varied across series but was broadly 1954–1989. Below I use data from the period 1970–2020 for all series. Although the unobserved components approach allows isolating a seasonal component within the model, the series considered in the paper, and here, are already seasonally adjusted. All data series considered here are taken from the Federal Reserve Economic Data (FRED). Conveniently, the pandas-datareader package, a companion to the Pandas library, can download data from FRED directly.

You can expect a deep dive into the theory behind UCM as well as the hands-on implementation of UCM on real-world data.

PROCEED WITH CAUTION! (P.S. It’s not that difficult, believe in yourself)

Getting Data
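
A minimal sketch of the download step, assuming the separate pandas-datareader package is installed and FRED is reachable over the network. The helper name `fetch_fred_series` is hypothetical; the series codes and the 1970–2020 window are the ones quoted earlier in this article.

```python
def fetch_fred_series(codes=("GNPC96", "GNPDEF"), start="1970-01", end="2020-01"):
    """Download FRED series into one DataFrame (one column per series code).

    ASSUMPTION: pandas-datareader is installed and FRED can be reached.
    """
    # Imported lazily so that defining the helper needs neither the package
    # nor a network connection; only calling it does.
    from pandas_datareader.data import DataReader

    return DataReader(list(codes), "fred", start=start, end=end)
```

Calling `fetch_fred_series().head()` would then show the first five rows of the two series.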

This is what the data looks like:

5 rows of data

ABOUT THE MODEL-

Alright, this is what you’re here for!

Now we know: Response Time Series = Superposition of components such as Trend, Seasons, Cycles, and Regression effects…

Each component in the model captures some important feature of the series dynamics. Each component has its own probabilistic model, and these probabilistic component models include meaningful deterministic patterns as special cases.

This is what the generalized model looks like:
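
One common way to write this superposition, using symbols that are assumed here for illustration ($\mu_t$ for the trend, $\gamma_t$ for the seasonal, $c_t$ for the cycle, $\beta_j x_{jt}$ for the regression effects and $\varepsilon_t$ for the irregular term), is:

$$
y_t = \mu_t + \gamma_t + c_t + \sum_{j=1}^{k} \beta_j x_{jt} + \varepsilon_t
$$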

Components-

Trend-

The trend component is a dynamic extension of a regression model that includes an intercept and linear time-trend.
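
In the standard local linear trend form, with notation assumed here ($\mu_t$ the level, $\beta_t$ the trend or slope, $\eta_t$ and $\zeta_t$ their disturbances), this reads:

$$
\begin{aligned}
y_t &= \mu_t + \varepsilon_t \\
\mu_{t+1} &= \mu_t + \beta_t + \eta_{t+1}, \qquad \eta_t \sim N(0, \sigma_\eta^2) \\
\beta_{t+1} &= \beta_t + \zeta_{t+1}, \qquad \zeta_t \sim N(0, \sigma_\zeta^2)
\end{aligned}
$$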

where the level is a generalization of the intercept term that can dynamically vary across time, and the trend is a generalization of the time-trend such that the slope can dynamically vary across time. For both elements (level and trend), we can consider models in which:

  • The element is included vs excluded (if the trend is included, there must also be a level included).
  • The element is deterministic vs stochastic (i.e. whether or not the variance of the error term is constrained to be zero).

Trends are loosely defined as the natural tendency of the series to increase, decrease, or remain constant over a period of time in the absence of any other influencing variable. UCM can model the trend in two ways: as a random walk, which implies that the trend remains roughly constant over the time period of the series, or as a locally linear trend with an upward or downward slope.

Special Cases-
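
A few standard special cases of the local linear trend, sketched in assumed notation where $\mu_t$ is the level, $\beta_t$ the slope, $\sigma_\eta^2$ the variance of the level disturbance $\eta_t$ and $\sigma_\zeta^2$ the variance of the slope disturbance:

$$
\begin{aligned}
&\text{Local level (no slope):} && \beta_t \equiv 0, \quad \mu_{t+1} = \mu_t + \eta_{t+1} \\
&\text{Random walk with drift:} && \sigma_\zeta^2 = 0, \quad \mu_{t+1} = \mu_t + \beta + \eta_{t+1} \\
&\text{Smooth trend:} && \sigma_\eta^2 = 0, \quad \sigma_\zeta^2 > 0 \\
&\text{Deterministic linear trend:} && \sigma_\eta^2 = \sigma_\zeta^2 = 0, \quad \mu_t = \mu_0 + \beta t
\end{aligned}
$$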

Examples-
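
As an illustration (a sketch with made-up parameter values, not the article's own figures), the level and slope recursions of a local linear trend can be simulated with NumPy. The function name and arguments are hypothetical. Setting both standard deviations to zero gives a straight line, and setting only the slope noise to zero gives a random walk with drift.

```python
import numpy as np


def simulate_local_linear_trend(n, sigma_level, sigma_slope,
                                level0=0.0, slope0=0.0, seed=0):
    """Simulate level[t] = level[t-1] + slope[t-1] + noise and
    slope[t] = slope[t-1] + noise (illustrative helper)."""
    rng = np.random.default_rng(seed)
    level = np.empty(n)
    slope = np.empty(n)
    level[0], slope[0] = level0, slope0
    for t in range(1, n):
        level[t] = level[t - 1] + slope[t - 1] + rng.normal(0.0, sigma_level)
        slope[t] = slope[t - 1] + rng.normal(0.0, sigma_slope)
    return level, slope


# Deterministic special case: both variances are zero, so the level is a line.
line, _ = simulate_local_linear_trend(5, 0.0, 0.0, level0=1.0, slope0=0.5)

# Stochastic case: both the level and the slope wander over time.
wandering, wandering_slope = simulate_local_linear_trend(200, 0.5, 0.05)
```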

Seasonal-

The seasonal component is written as:
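
A standard form, in notation assumed here ($\gamma_t$ for the seasonal effect and $w_t$ for its error term), is:

$$
\gamma_t = -\sum_{j=1}^{s-1} \gamma_{t-j} + w_t, \qquad w_t \sim N(0, \sigma_w^2)
$$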

The periodicity (number of seasons) is s, and the defining character is that (without the error term), the sum of the seasonal components is zero across one complete cycle. The inclusion of an error term allows the seasonal effects to vary over time. If the seasonal effect is stochastic, then there is one additional parameter to estimate via MLE (the variance of the error term).

A seasonal pattern exists when there exists a consistent pattern of variation influenced by seasonal factors (e.g., the quarter of the year, or day of the week, etc.).

Representations of Seasonal Pattern(Period=s)-

  1. As a list of s numbers that sum to zero
  2. As a sum of [s/2] deterministic, undamped cycles, called harmonics, of periods s, s/2, s/3, …

Here [s/2] = s/2 if s is even and [s/2] = (s-1)/2 if s is odd.

Example: For s = 12, the seasonal pattern can always be written as a sum of six cycles with periods 12, 6, 4, 3, 2.4, and 2.
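
The harmonic periods in the example above can be checked with a few lines of Python. The function name is hypothetical, and integer division plays the role of [s/2].

```python
def harmonic_periods(s):
    """Periods s/1, s/2, ..., s/[s/2] of the harmonics of a seasonal pattern."""
    n_harmonics = s // 2  # [s/2]: s/2 if s is even, (s-1)/2 if s is odd
    return [s / j for j in range(1, n_harmonics + 1)]


print(harmonic_periods(12))  # [12.0, 6.0, 4.0, 3.0, 2.4, 2.0]
```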

Variants of the model-

  1. Stochastic Dummy Variable Seasonal Model- Let there be s seasons during the year, s = 12 for monthly data, s = 4 for quarterly data, and s = 2 for bi-annual data. Consider the following model for the seasonal effect gamma_t at time t:
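
One standard way to write it, with $w_t$ as an assumed symbol for the disturbance, is to require the seasonal effects over any $s$ consecutive periods to sum to a noise term:

$$
\sum_{j=0}^{s-1} \gamma_{t-j} = w_t, \qquad w_t \sim N(0, \sigma_w^2)
$$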

In this model, the sum of the seasonal effects has a zero mean although their stochastic nature allows them to evolve either slowly over time (when variance (sigma_w²) is small) or quickly over time (when it is large).

2. Deterministic Dummy Variable Seasonal Model-

In the special case where the variance $\sigma_w^2$ is 0 in the above model, we have the so-called Deterministic Dummy Variable Seasonal model. In this model the seasonal effects $\gamma_1, \gamma_2, \dots, \gamma_s$ are fixed and do not vary over time, in contrast to the stochastic specification in the previous case. In this case a test of the absence of seasonality in the time series data being analyzed amounts to testing the null hypothesis $H_0: \gamma_1 = \gamma_2 = \dots = \gamma_{s-1} = 0$, where the sum constraint $\sum_{j=1}^{s} \gamma_j = 0$ implies that $\gamma_s = 0$ as well.

Examples-
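
A small simulation can make the two variants concrete. This is a sketch with made-up starting values, not the article's own example, and the function name is hypothetical. Each new seasonal effect is minus the sum of the previous s-1 effects plus optional noise, so with zero noise the pattern repeats exactly and sums to zero over each cycle.

```python
import numpy as np


def simulate_dummy_seasonal(initial, n, sigma_w=0.0, seed=0):
    """Simulate gamma[t] = -(gamma[t-1] + ... + gamma[t-s+1]) + w[t].

    `initial` holds the first s-1 seasonal effects (illustrative helper).
    """
    rng = np.random.default_rng(seed)
    gamma = [float(g) for g in initial]
    lags = len(gamma)  # s - 1
    while len(gamma) < n:
        gamma.append(-sum(gamma[-lags:]) + rng.normal(0.0, sigma_w))
    return np.array(gamma[:n])


# s = 4 with zero variance: the deterministic dummy variable model.
fixed = simulate_dummy_seasonal([1.0, -2.0, 0.5], n=12)

# Same start, but a positive variance lets the pattern drift over time.
drifting = simulate_dummy_seasonal([1.0, -2.0, 0.5], n=12, sigma_w=0.1)
```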

Cycle-

The cyclical component is intended to capture cyclical effects at time frames much longer than captured by the seasonal component. For example, in economics, the cyclical term is often intended to capture the business cycle and is then expected to have a period between “1.5 and 12 years”.

The cycle is written as:
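
A standard damped stochastic cycle, in notation assumed here ($c_t$ for the cycle, $c^*_t$ for its auxiliary companion, $\rho$ for the damping factor, and $v_t$, $v^*_t$ for the two error terms), is:

$$
\begin{aligned}
c_t &= \rho \, (c_{t-1} \cos\lambda_c + c^*_{t-1} \sin\lambda_c) + v_t \\
c^*_t &= \rho \, (-c_{t-1} \sin\lambda_c + c^*_{t-1} \cos\lambda_c) + v^*_t
\end{aligned}
\qquad v_t, v^*_t \sim N(0, \sigma_v^2)
$$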

The parameter λc (the frequency of the cycle) is an additional parameter to be estimated by MLE. If the cyclical effect is stochastic, then there is one more parameter to estimate (the variance of the error term; note that both of the error terms here share the same variance, but are assumed to have independent draws).

Cycles exist in time series data when the data exhibit rises and falls that are not of a fixed period. The duration of these fluctuations is usually at least 2 years.

Variants of the model-

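1. Deterministic Cyclical model- Let the deterministic cycle with frequency $\lambda_c$, $0 < \lambda_c < \pi$, be written as

$c_t = \alpha \cos(\lambda_c t) + \beta \sin(\lambda_c t)$

If t is observed continuously, this is a periodic function with

Period- $2\pi / \lambda_c$

Amplitude- $\sqrt{\alpha^2 + \beta^2}$

Phase- $\arctan(\beta / \alpha)$

If the response is measured only at integer values of t, then it is not exactly periodic unless $\lambda_c = 2\pi j / k$ for some integers j and k.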

Unfortunately, the cycles in economic and business time-series data are scarcely ever as systematic as would be depicted in any one deterministic periodic function.
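
The period, amplitude and phase of a deterministic cycle of the form alpha*cos(lam*t) + beta*sin(lam*t) can be computed directly. The sketch below uses made-up coefficients and hypothetical names, and checks numerically that the cycle equals a single cosine with that amplitude and phase.

```python
import math


def cycle_properties(alpha, beta, lam):
    """Period, amplitude and phase of alpha*cos(lam*t) + beta*sin(lam*t)."""
    period = 2 * math.pi / lam
    amplitude = math.hypot(alpha, beta)  # sqrt(alpha**2 + beta**2)
    phase = math.atan2(beta, alpha)      # equals arctan(beta/alpha) when alpha > 0
    return period, amplitude, phase


# Made-up coefficients; lam is chosen so that the period is 12 time steps.
alpha, beta, lam = 3.0, 4.0, 2 * math.pi / 12
period, amplitude, phase = cycle_properties(alpha, beta, lam)

# The same cycle written as one shifted cosine: amplitude * cos(lam*t - phase).
t = 1.7
direct = alpha * math.cos(lam * t) + beta * math.sin(lam * t)
shifted = amplitude * math.cos(lam * t - phase)
```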

2. Stochastic Cyclical Model- As an alternative to specifying one or more of the deterministic cycles and introducing a multitude of parameters, one can specify a stochastic cyclical model as in-
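
In matrix form, with $c_t$ the cycle, $c^*_t$ an auxiliary companion series and $\lambda_c$ the cycle frequency (symbols assumed here for illustration), a standard version is:

$$
\begin{pmatrix} c_t \\ c^*_t \end{pmatrix}
= \rho
\begin{pmatrix} \cos\lambda_c & \sin\lambda_c \\ -\sin\lambda_c & \cos\lambda_c \end{pmatrix}
\begin{pmatrix} c_{t-1} \\ c^*_{t-1} \end{pmatrix}
+ \begin{pmatrix} v_t \\ v^*_t \end{pmatrix}
$$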

where $0 \le \rho \le 1$ is a damping factor and the disturbances $v_t$ and $v_t^*$ are independently distributed as $N(0, \sigma_v^2)$ random variables. This model can capture quite complex cyclical patterns in economic and business time series without introducing an abundance of parameters. If $\rho < 1$, the series has a stationary distribution with mean zero and variance $\sigma_v^2 / (1 - \rho^2)$.

If $\rho = 1$, the series is non-stationary. Of course, if the variance $\sigma_v^2 = 0$, we revert to the deterministic cyclical model.

Examples-
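
The cyclical recursion can also be simulated. The sketch below uses made-up parameter values and a hypothetical function name: it rotates a two-element state by the cycle frequency, shrinks it by the damping factor and adds noise. With no damping and no noise it reproduces a pure cosine.

```python
import numpy as np


def simulate_cycle(n, lam, rho=1.0, sigma_v=0.0, c0=1.0, c0_star=0.0, seed=0):
    """Simulate the cycle c[t] from the damped rotation recursion (illustrative)."""
    rng = np.random.default_rng(seed)
    rotation = rho * np.array([[np.cos(lam), np.sin(lam)],
                               [-np.sin(lam), np.cos(lam)]])
    state = np.array([c0, c0_star], dtype=float)
    path = np.empty(n)
    for t in range(n):
        path[t] = state[0]
        # Rotate, damp, and add two independent draws with the same variance.
        state = rotation @ state + rng.normal(0.0, sigma_v, size=2)
    return path


lam = 2 * np.pi / 8  # a cycle that repeats every 8 time steps

# rho = 1 and zero variance: the deterministic cyclical model.
deterministic = simulate_cycle(24, lam)

# rho < 1 with noise: a stationary cycle whose amplitude and phase wander.
damped = simulate_cycle(500, lam, rho=0.9, sigma_v=1.0)
```

For a long damped simulation, the sample variance should sit near sigma_v² / (1 − rho²), although any single run will differ from it.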

Irregular-

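The irregular component is assumed to be a white noise error term. Its variance is a parameter to be estimated by MLE; i.e. $\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$.

In some cases, we may want to generalize the irregular component to allow for autoregressive effects, for example an AR(p) process $\varepsilon_t = \phi_1 \varepsilon_{t-1} + \dots + \phi_p \varepsilon_{t-p} + \xi_t$ with $\xi_t \sim N(0, \sigma_\xi^2)$.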

In this case, the autoregressive parameters would also be estimated via MLE.

Regression Effects-

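We may want to allow for explanatory variables by including additional terms $\sum_{j=1}^{k} \beta_j x_{jt}$, or for intervention effects by including a term $\delta w_t$, where $w_t = 0$ for $t < \tau$ and $w_t = 1$ for $t \ge \tau$.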

These additional parameters could be estimated via MLE or by including them as components of the state space formulation.

Fitting UCM Models

Since the data is already seasonally adjusted and there are no obvious explanatory variables, the generic model considered is:
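
With the seasonal and regression terms dropped, and writing $\mu_t$ for the trend, $c_t$ for the cycle and $\varepsilon_t$ for the irregular term (symbols assumed here), the model reduces to:

$$
y_t = \mu_t + c_t + \varepsilon_t
$$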

The irregular will be assumed to be white noise, and the cycle will be stochastic and damped. The final modeling choice is the specification to use for the trend component. Harvey and Jaeger consider two models:

-Local linear trend (the “unrestricted” model)

-Smooth trend (the “restricted” model, since we are forcing ση=0)

Below, we construct kwargs dictionaries for each of these model types. Notice that there are two ways to specify the models. One way is to specify components directly, as in the table above. The other way is to use string names that map to various specifications.
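
A sketch of what such dictionaries might look like, assuming the models are fit with the `UnobservedComponents` class from statsmodels (the text does not name the library, so this is an assumption). The keyword names are that class's constructor arguments; the variable names are illustrative.

```python
# Shared cycle settings: a stochastic, damped cycle.
cycle_settings = {"cycle": True, "damped_cycle": True, "stochastic_cycle": True}

# Way 1: string names that map to a trend specification.
unrestricted_model = {"level": "local linear trend", **cycle_settings}
restricted_model = {"level": "smooth trend", **cycle_settings}

# Way 2: the same models with each component switched on explicitly.
unrestricted_components = {
    "irregular": True,
    "level": True, "stochastic_level": True,
    "trend": True, "stochastic_trend": True,
    **cycle_settings,
}
# The smooth trend fixes the level disturbance variance at zero.
restricted_components = dict(unrestricted_components, stochastic_level=False)
```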

Argument Dictionaries

We now fit the following models:

  1. Output, unrestricted model
  2. Prices, unrestricted model
  3. Prices, restricted model
Fitting 3 models

For unobserved components models, it is often more instructive to plot the estimated unobserved components (e.g. the level, trend, and cycle) themselves to see if they provide a meaningful description of the data.

Inference

Let’s summarize the models to highlight the relative importance of the trend and cyclical components.

Coefficients are in the order: level, trend, cycle, damping_cycle, period_cycle (= 2*pi/frequency_cycle) and irregular. Also, all values except damping_cycle and period_cycle are scaled up by a factor of 1e7.

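For the unrestricted model on GNP, the estimated period_cycle is 24.63, i.e. the cycle repeats roughly every 24.63 observations (about six years if the data are quarterly). Note that this value is a cycle length measured in time steps, so it is not on the same scale as the variance parameters and is not by itself a measure of importance; the relative importance of the trend and cyclical components is better judged by comparing the variances of their disturbances.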
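
The conversion between cycle frequency and cycle period quoted above is a one-liner in each direction. The function names are hypothetical, and the 24.63 figure is the one reported for the GNP model.

```python
import math


def period_from_frequency(frequency):
    """Cycle length in time steps for a frequency in radians per step."""
    return 2 * math.pi / frequency


def frequency_from_period(period):
    """Inverse conversion: radians per time step for a given cycle length."""
    return 2 * math.pi / period


frequency = frequency_from_period(24.63)       # frequency implied by the period
period = period_from_frequency(frequency)      # round trip back to the period
```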

Conclusion

Hi all! Hope this was helpful! To get the code and try it out for yourself, head on over to my GitHub. Link below-

By Varishu Pant

-Statistician, Data Scientist and Lyricist.

For any suggestions, corrections or just to have a chat with me, reach me here-

https://www.linkedin.com/in/varishu-pant/

Also, check out my other blogs (like Youtube Popularity Prediction) published in Analytics Vidhya here-

And if you like listening to Hip-Hop, I have some originals on my Youtube Channel-

Also Spotify and other streaming services:

distrokid.com/hyperfollow/vpawk/spaz
