Navigating Through Time Series Clustering


Cluster analysis, better known as clustering, has strong ties to the social sciences. It originated in the field of anthropology and was later introduced to psychology by Robert Tryon¹. Since then, it has grown into a field with real mathematical rigour. The main objective of cluster analysis is to split and categorise objects into various groups - an endeavour that is harder than it sounds. This is because an algorithm has no sense of what is ‘similar’ and what is ‘different’. To tackle this issue, we need objective measures that quantitatively tell us how far apart objects are in space. One such measure is the Minkowski distance:

d(x, y) = (Σᵢ |xᵢ − yᵢ|^q)^(1/q)

The Minkowski distance is a generalisation of several distance metrics we are familiar with. For example, setting q = 2 gives the Euclidean distance, while q = 1 gives the Manhattan distance. It is also possible to create a custom distance metric if we are ambitious enough, as long as it satisfies the usual criteria for a metric: non-negativity, identity of indiscernibles, symmetry, and the triangle inequality.
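
To make the formula concrete, here is a minimal pure-Python sketch of the Minkowski distance (the function name and sample points are my own illustration, not from any particular library):

```python
def minkowski(x, y, q):
    """Minkowski distance between two equal-length sequences."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)

x, y = [0, 0], [3, 4]
print(minkowski(x, y, 1))  # q = 1: Manhattan distance, 7.0
print(minkowski(x, y, 2))  # q = 2: Euclidean distance, 5.0
```

The same 3-4-5 triangle gives two different distances depending on q, which is exactly why the choice of proximity measure matters before any clustering starts.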

Once a proximity measure has been chosen, clustering algorithms such as k-means can be used to categorise our objects.
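
As a toy sketch of that step, here is a bare-bones k-means on one-dimensional data; in practice you would reach for a library implementation such as scikit-learn's KMeans, but the assignment/update loop below is the same idea:

```python
def kmeans_1d(points, k=2, iters=20):
    """A toy k-means for scalar data using absolute (Euclidean) distance."""
    # Spread the initial centroids across the sorted data.
    centroids = sorted(points)[::max(1, len(points) // k)][:k]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster's mean.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], k=2)
print(sorted(centroids))  # [2.0, 11.0]
```

The two obvious groups {1, 2, 3} and {10, 11, 12} are recovered; swap in a different distance function and the same loop clusters by that measure instead.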

Does this work for time series?

While we can directly use k-means with Euclidean distance on time series data, it may not produce good clusters because Euclidean distance ignores time shifts. It also does not work on series of unequal length. The same goes for other ‘normal’ distance metrics. Meanwhile, real time series come in all lengths and with all manner of time shifts. Fortunately for us, techniques to cluster time series have been developed over the years and can be categorised into three major groups²:

  • Feeding the time series directly into clustering algorithms with a specific distance metric — the custom distance metric approach
  • Extracting features from the time series before utilising clustering algorithms — the feature extraction approach
  • Modelling the time series and then performing clustering on the models — the modelling approach

a) Custom distance metric approach

This approach replaces traditional distance metrics with ones that are more suitable for time series. One popular metric is Dynamic Time Warping (DTW). Unlike Euclidean distance, DTW accounts for time shifts. It even works for two time series of unequal length. The following diagram shows the difference between Euclidean matching and DTW:


As shown above, when we apply Euclidean matching to time series with a time shift, we match the wrong sections of the series together. As a result, our clustering algorithm may conclude that the two time series are far apart and hence different.

To find the DTW distance, we first construct an n × m local cost matrix (LCM) whose entries are the distances between every element of time series A and every element of time series B. In layman’s terms: fill in the empty matrix by comparing the values of A and B exhaustively. Once the matrix has been constructed, the DTW distance is simply the cheapest path through the LCM from its bottom-left corner to its top-right corner, accumulating cost as we go:

We can then use this distance instead of Euclidean distance in our clustering algorithms. Fortunately, there are Python packages (tslearn and fastdtw, for example) so that we do not have to construct DTW from scratch. But for the purists out there, there are many resources online to help us build DTW from the ground up. And of course, you are always welcome to read the original paper³.
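
For those purists, the dynamic-programming recurrence behind DTW can be sketched in a few lines of Python. This is the naive O(nm) version, assuming absolute difference as the local cost (real packages add windowing constraints and optimisations):

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance via the classic DP recurrence."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = cumulative cost of the best warping path ending at (i, j)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local cost
            cost[i][j] = d + min(cost[i - 1][j],      # repeat an element of b
                                 cost[i][j - 1],      # repeat an element of a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# b is a time-shifted copy of a: Euclidean matching would penalise the lag,
# but the warping path aligns the peaks and the DTW distance is 0.
a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
print(dtw_distance(a, b))  # 0.0
```

Because the loop only needs `len(a)` and `len(b)` separately, the same function also handles series of unequal length.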

b) Feature extraction approach

Under this approach, we extract features from the time series before utilising clustering algorithms. One interesting method is the Discrete Fourier Transform (DFT). The main idea behind the DFT is compelling: the time series is transformed into a set of frequency components, and clustering algorithms with a distance metric of our choice can then be applied to these frequencies. One benefit of using the DFT is dimensionality reduction: a few dominant frequencies can summarise a long series. Fortunately for us, the scipy library can be used to perform the DFT, and there are many online resources about it as well⁴.
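
A small sketch of the idea, using a naive standard-library DFT (in practice you would call scipy's FFT): a time shift between two otherwise identical sinusoids disappears in the magnitude spectrum, and 32 samples are reduced to 8 features:

```python
import cmath
import math

def dft_magnitudes(x, k_keep):
    """Naive DFT; keep the magnitudes of the first k_keep frequency bins as features."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(k_keep)]

# Two sinusoids with the same frequency but a phase (time) shift:
n = 32
s1 = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
s2 = [math.sin(2 * math.pi * 3 * t / n + 1.0) for t in range(n)]

f1, f2 = dft_magnitudes(s1, 8), dft_magnitudes(s2, 8)
# Each 32-point series is now an 8-dimensional feature vector, and the
# phase shift has vanished: the magnitude spectra are numerically identical,
# so a clustering algorithm would place the two series together.
```

Keeping only magnitudes discards phase on purpose; if the timing of events matters for your problem, this is a feature choice to make consciously.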

c) Modelling approach

Under this approach, we use the fact that each time series is generated by some kind of process and hence can be modelled. For example, we can first build Hidden Markov Models (HMMs) for the time series before using clustering algorithms on the HMMs⁵. Another option is the classic ARIMA⁶. The modelling approach can, of course, be much more time-consuming than both the custom distance metric and feature extraction approaches. Using ARIMA, for example, we must pay attention to the stationarity of the time series, among other considerations. While there are Python packages that can automate the search for the AR(p), I(d), and MA(q) orders, the quality of the resulting models may suffer. To get the best results, especially with ARIMA, we must fine-tune the models manually.
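
As a toy version of the modelling approach (far simpler than a full ARIMA or an HMM), we can fit an AR(1) coefficient to each series by least squares and treat the fitted coefficient as the object to cluster; the helper names and numbers here are my own illustration:

```python
import random

def fit_ar1(x):
    """Least-squares estimate of phi in the AR(1) model x[t] = phi * x[t-1] + noise."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

def simulate_ar1(phi, n=500):
    """Generate an AR(1) series driven by Gaussian noise."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + random.gauss(0, 1))
    return x

random.seed(0)
# Two series from each of two different processes: the fitted coefficients
# land near the true values (0.9 and -0.5), so clustering the coefficients
# recovers the two underlying processes.
phis = [fit_ar1(simulate_ar1(p)) for p in (0.9, 0.9, -0.5, -0.5)]
```

The same recipe scales up: fit a richer model per series, then cluster in the (low-dimensional) parameter space instead of the raw observation space.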

What can I do with it?

Figuring out a way to cluster time series opens up many interesting applications, because time series data is prevalent in almost every industry. Personally, I applied some of these methods in my university dissertation, where I tried to answer the following question:

Can cluster analysis be used to help portfolio managers create diversified portfolios?

That is just one example of an interesting application of time series clustering (at least for me). I hope that by sharing this knowledge, I will help people work on many interesting projects. That being said, do tell me if you have done any cool time series clustering projects of your own!

[1] Robert Choate Tryon. Cluster Analysis: Correlation Profile and Orthometric (Factor) Analysis for the Isolation of Unities in Mind and Personality. Edwards Brothers, 1939.

[2] T. Warren Liao. Clustering of time series data — a survey. Pattern Recognition, 38(11):1857–1874, 2005.

[3] Meinard Müller. Dynamic time warping. Information Retrieval for Music and Motion, 69–84, 2007.

[5] Emanuele Coviello, Antoni B. Chan, and Gert R. G. Lanckriet. Clustering hidden Markov models with variational HEM. The Journal of Machine Learning Research, 15(1):697–747, 2014.

[6] Konstantinos Kalpakis, Dhiral Gada, and Vasundhara Puttagunta. Distance measures for effective clustering of ARIMA time-series. Proceedings of the 2001 IEEE International Conference on Data Mining. IEEE, 2001.




Albert Wibowo

A full-time nerd
