Analyzing ETF Performance by Regime Segmentation

Pranesh Muppala
Jul 27, 2017

Ever wondered why a stock or an ETF didn’t move in the anticipated direction after a news event, a macroeconomic indicator, or an industry-wide survey release? Individual stocks are more prone to idiosyncratic “noise” than major sector or market index ETFs, mainly because they can be swayed by a range of news, events, and individuals specific to that firm. Broad index ETFs should, in theory, be less sensitive to these types of “local” factors, and they are the focus of the rest of this article.

One approach to studying the performance of an index ETF is to divide its returns time series into segments that can be understood and reasoned about separately, instead of looking at the entire time series at once. Let’s define a “regime” as a period of time during which an ETF’s performance has been relatively uniform, as characterized by the mean and variance of its daily returns. The model presented below can be extended to include metrics other than mean and variance, but these two simple and commonly used metrics are a good starting point. A major shift in one or both of these characteristics indicates that the regime has shifted.

The purpose of this article is to show how to segment an ETF’s returns time series into regimes, and then study the individual segments after grouping similar ones into clusters (using an unsupervised machine learning technique), instead of trying to analyze the entire time series at once.
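As a minimal sketch of the regime description above (not the author’s code), a window of daily closing prices can be turned into simple daily returns and then summarized by their mean and variance. The prices are hypothetical, and the helper names `daily_returns` and `regime_stats` are illustrative.

```python
from statistics import mean, pvariance


def daily_returns(closes):
    """Simple daily returns: r_t = (P_t - P_{t-1}) / P_{t-1}."""
    return [(curr - prev) / prev for prev, curr in zip(closes, closes[1:])]


def regime_stats(returns):
    """Summarize a window of daily returns by its mean and population variance."""
    return mean(returns), pvariance(returns)


# Hypothetical closing prices for a short window
closes = [100.0, 101.0, 100.0, 102.0, 103.0]
returns = daily_returns(closes)
mu, var = regime_stats(returns)
print(f"mean={mu:.5f}, variance={var:.7f}")
```

Two windows with clearly different (mean, variance) pairs would, in this framing, belong to different regimes.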

Regimes

To illustrate the idea, let’s look at the historical performance of XLY (the Consumer Discretionary Select Sector SPDR Fund) from Google Finance:

XLY historical performance with some unique “regimes” pointed out

The chart above shows different regimes, characterized by the direction of their returns and their “choppiness” (variance). Over time, one regime transitions into another, and regimes also tend to repeat themselves (not necessarily back to back). The remaining sections describe an approach to automatically segment and classify the historical performance of a given ETF.

Approach

At a very high level, we take a three-step approach (the steps are not necessarily sequential), as outlined in the “Time Series Distance Measures” article referenced below:

  1. Feature Extraction
  2. Segmentation
  3. Clustering

Feature Extraction: In this step, for a given segment of the time series, we fit a best-fit line using linear regression and calculate its Mean Squared Error (MSE). MSE is the average of the squared residuals of this linear approximation, so it measures how far the segment deviates from a straight line.

Source: http://mtweb.mtsu.edu/stats/dictionary/formula.htm
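A minimal sketch of this feature-extraction step, assuming the segment’s x-axis is simply the day index 0, 1, ..., n-1; the helper `line_fit_mse` is hypothetical, not taken from the application. It divides the sum of squared residuals by n; some regression texts divide by n - 2 instead, which would rescale the values.

```python
def line_fit_mse(ys):
    """Fit y = intercept + slope * x by ordinary least squares, with x = 0, 1, ..., n-1,
    and return the mean squared error of the residuals (sum of squares / n)."""
    n = len(ys)
    if n < 3:
        return 0.0  # a line passes exactly through one or two points
    x_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in enumerate(ys)]
    return sum(r * r for r in residuals) / n


print(line_fit_mse([1.0, 2.0, 3.0, 4.0]))  # a perfect line
print(line_fit_mse([0.0, 1.0, 0.0, 1.0]))  # zig-zag, poorly fit by a line
```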

Segmentation: The time series is segmented with a bottom-up approach, as described in “Segmenting Time Series: A survey and novel approach” (referenced below). The process starts by breaking the ETF returns time series into tiny segments of just two neighboring points. Care is taken to avoid grouping points that could constitute a major inflection point. Next, we calculate the MSE of the best-fit regression line that would result from merging each pair of adjacent segments. In an iterative process, we merge the pair whose combined segment produces the lowest MSE, as long as that MSE is still less than the user-provided “Segmentation Threshold” value. This is repeated until no more segments can be combined, that is, until the lowest MSE is no longer smaller than the Segmentation Threshold. In the end, the number and length of the segments are a function of the Segmentation Threshold (a decimal value between 0 and 1): the lower the value, the more (and shorter) segments we end up with, and vice versa.
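The loop below is a minimal sketch of bottom-up segmentation in the spirit of the referenced survey, not the application’s code. It assumes the threshold is in the same (squared) units as the series, so the series would need to be scaled for a 0 to 1 threshold to be meaningful, and it omits the inflection-point check mentioned above. For simplicity it recomputes every merge cost on each pass, which is quadratic but fine for illustration. The names `line_fit_mse` and `bottom_up_segment` are hypothetical.

```python
def line_fit_mse(ys):
    """MSE of an ordinary least-squares line fitted to ys against x = 0..n-1."""
    n = len(ys)
    if n < 3:
        return 0.0
    x_bar, y_bar = (n - 1) / 2, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    slope = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys)) / sxx
    intercept = y_bar - slope * x_bar
    return sum((y - intercept - slope * x) ** 2 for x, y in enumerate(ys)) / n


def bottom_up_segment(series, threshold):
    """Return segments as (start, end) index pairs, end exclusive."""
    # Start with tiny segments of two neighbouring points (the last may hold one point).
    segments = [(i, min(i + 2, len(series))) for i in range(0, len(series), 2)]
    while len(segments) > 1:
        # Cost of merging each adjacent pair = MSE of a line fitted to the merged span.
        costs = [line_fit_mse(series[left[0]:right[1]])
                 for left, right in zip(segments, segments[1:])]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] >= threshold:
            break  # even the cheapest merge is not below the threshold: stop
        merged = (segments[best][0], segments[best + 1][1])
        segments[best:best + 2] = [merged]
    return segments


series = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]  # two trends with a jump
print(bottom_up_segment(series, threshold=0.5))
print(bottom_up_segment(series, threshold=0.0))
```

With a threshold of 0, no merge is ever accepted, which mirrors the text: lower thresholds leave more, shorter segments.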

Clustering: In this step, our goal is something akin to pattern matching. Now that the time series has been divided into segments that cannot be combined any further, we look for similar-looking segments and group them together into clusters. Here, the mean and variance of each segment are the two measures used to decide whether segments look alike. K-means clustering is used to perform this step. In any clustering problem, choosing the right value of “k” is a difficult and subjective decision; silhouette analysis on K-means clustering is used to help arrive at a reasonable value.
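To make the clustering step concrete, here is a dependency-free sketch of K-means (Lloyd’s algorithm) plus a mean silhouette score, used to pick k on hypothetical (mean, variance) segment features. It is an illustration, not the application’s code: in practice a library such as scikit-learn (`KMeans`, `silhouette_score`) would be the usual choice, and the features would typically be standardized first so neither dominates the distance. The helpers `kmeans`, `silhouette` and `choose_k` are hypothetical names.

```python
import random
from math import dist


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: return one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient, s(i) = (b - a) / max(a, b), averaged over points."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        others = [c for c in clusters if c != labels[i]]
        if not same or not others:
            scores.append(0.0)  # singleton cluster (or only one cluster): score 0 by convention
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in others
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, k_max=6):
    """Try k = 2..k_max and keep the labelling with the highest mean silhouette."""
    best = None
    for k in range(2, min(k_max, len(points) - 1) + 1):
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best


# Hypothetical, already-standardized (mean, variance) features for six segments
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
score, k, labels = choose_k(features)
print(k, labels, round(score, 3))
```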

Solution

An implementation of this approach is available at http://vespanalytics.com/RegimeClassifier

Regime Classifier

Some notes on using this application:

  1. For now, data collection is limited to the ETFs listed, so please pick one ETF from the drop-down list.
  2. Across multiple test runs, Segmentation Threshold values between 0.2 and 0.8 produced meaningful segments of the time series. Feel free to experiment with this value.

Results and Analysis

Let’s walk through the results of one sample run: XLY with a Segmentation Threshold of 0.3.

XLY (Consumer Discretionary SPDR ETF) with a Segmentation Threshold of 0.3

The first plot shows the different clusters and how they are scattered across a mean-variance spectrum. The size of each dot (cluster) is proportional to the number of data points (days) that fall into that cluster. It makes sense to see one or two big clusters and several fairly small ones, because on most days stock market returns tend to follow a relatively smooth trend, interspersed with periods of extreme volatility and big trend shifts. More importantly, we can see a clear separation between cluster regions that may be hard to spot with the naked eye when analyzing the entire time series at once.

Cluster graph
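The bubble sizes in the cluster plot can be computed by pooling the days of every segment assigned to a cluster. A minimal sketch, assuming segments are (start, end) index pairs with an exclusive end and one cluster label per segment; positioning each bubble by the pooled mean and variance is an assumption, since the application may instead plot the K-means centroids. `cluster_summary` is a hypothetical helper and the data are made up.

```python
from collections import defaultdict
from statistics import mean, pvariance


def cluster_summary(returns, segments, labels):
    """Map cluster label -> (number of days, mean return, variance of returns),
    pooling the days of every segment assigned to that cluster."""
    pooled = defaultdict(list)
    for (start, end), label in zip(segments, labels):
        pooled[label].extend(returns[start:end])
    return {label: (len(days), mean(days), pvariance(days))
            for label, days in pooled.items()}


# Hypothetical daily returns, segments and per-segment cluster labels
returns = [0.01, 0.02, 0.01, 0.02, -0.05, 0.04, 0.01, 0.02]
segments = [(0, 4), (4, 6), (6, 8)]
labels = [0, 1, 0]
for label, (days, mu, var) in sorted(cluster_summary(returns, segments, labels).items()):
    print(f"cluster {label}: {days} days, mean={mu:.4f}, variance={var:.6f}")
```

The day counts could then be passed, suitably scaled, as the marker-size argument `s` of matplotlib’s `scatter`.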

The second plot shows the different segments of the returns time series and the cluster each segment falls under. This plot clearly illustrates that distinct segment patterns can repeat across different parts of the time series. Combined with the first plot, it makes it easy both to visualize individual segments and to know their mean/variance values.

Time Series broken into different cluster segments
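Coloring the time series by cluster, as in the second plot, only requires expanding each segment’s label to the days it covers. A minimal sketch with the same assumed (start, end) segment format; `label_each_day` is a hypothetical helper, and counting the segment labels shows how many separate times each regime recurs.

```python
from collections import Counter


def label_each_day(n_days, segments, labels):
    """Expand per-segment cluster labels to one label per day (None if uncovered)."""
    per_day = [None] * n_days
    for (start, end), label in zip(segments, labels):
        for day in range(start, min(end, n_days)):
            per_day[day] = label
    return per_day


# Hypothetical segmentation of a 9-day series into three segments, two of them in cluster 1
segments = [(0, 3), (3, 5), (5, 9)]
labels = [1, 0, 1]
print(label_each_day(9, segments, labels))
# How many separate segments fall into each cluster, i.e. how often a regime recurs
print(Counter(labels))
```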

Conclusion

Feel free to explore different ETFs and their segment clusters, and how they vary as the input parameters change. This is a first effort to break down an ETF’s historical performance into uniquely identifiable segment patterns, paving the way for a deeper dive into individual segments to better analyze and understand historical movements.

References

Time Series Distance Measures

Segmenting Time Series: A survey and novel approach

Silhouette Analysis and K means clustering
