An Introduction To Shapelets: The Shapes In Time Series
Ever wondered how a Fitbit or a similar gadget can tell whether you are walking or running, and automatically logs every time you exercise? This is just one of the many applications of time series data.
Time series data is a collection of records obtained over time. The records always have an order, and changing that order could depict a completely different situation.
Time series have a wide range of real-world applications, including health care, human activity recognition, cyber-security, finance, marketing, automated disease detection and anomaly detection. Because temporal data is so abundant, there is strong interest in applications based on time series, and many classification algorithms have been proposed.
How do we classify Time Series?
There are many methods to classify time series data. Some of the standard well-known techniques use K-Nearest Neighbours with different elastic distance measures like Dynamic Time Warping (DTW), Time Warp Edit (TWE), or Complexity Invariant Distance (CID) to identify classes within the data.
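To make the nearest-neighbour idea concrete, here is a minimal sketch of a 1-Nearest-Neighbour classifier using a plain DTW distance. It is an illustration only: the function names and the toy data are my own assumptions, there is no warping window, and it is far slower than the optimised implementations found in time series libraries.

```python
import math

def dtw_distance(a, b):
    """Plain dynamic time warping distance with no warping-window constraint."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

def predict_1nn(train_series, train_labels, query):
    """Return the label of the training series closest to `query` under DTW."""
    distances = [dtw_distance(series, query) for series in train_series]
    return train_labels[distances.index(min(distances))]

# Toy data, made up for illustration: a "bump" class and a "flat" class.
train_series = [[0, 0, 1, 2, 1, 0, 0], [0, 0, 0, 0, 0, 0, 0]]
train_labels = ["bump", "flat"]

# The query contains the same bump shifted in time; DTW can still align it.
query = [0, 1, 2, 1, 0, 0, 0]
print(predict_1nn(train_series, train_labels, query))  # bump
```

Note that every prediction compares the query against every training series, and each comparison is quadratic in the series length. That cost is what motivates the alternatives discussed next.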
Deep learning methods also show potential in time series forecasting, since they can learn temporal dependence automatically. However, due to the high dimensionality of time series data, these techniques prove expensive in terms of training time and memory requirements. Given the high computational burden of traditional algorithms, Ye and Keogh proposed a concept known as shapelets.
What are shapelets?
In most time series data, the differences between classes show up within sub-sequences rather than across the complete series. Shapelets were meant to represent these discriminative sub-sequences. In simple terms, we identify a shape within a series that distinguishes its class from the other classes in that domain. An example of a shapelet is shown below.
The above figure shows the one-dimensional time series representation of a leaf. The highlighted section shows the subsequence that best represents this leaf. Shapelets can be identified in different ways, with techniques that optimise discovery and classification time. Some of the well-known shapelet algorithms are Fast Shapelets and Learning Time-Series Shapelets.
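How do we check whether a series contains a given shapelet? A common approach is to slide the shapelet along the series and keep the smallest distance found. The sketch below assumes a plain Euclidean distance and made-up toy values; the function name is my own, and practical implementations often normalise each window first, which I leave out for brevity.

```python
import math

def subsequence_distance(series, shapelet):
    """Smallest Euclidean distance between `shapelet` and any same-length window of `series`."""
    length = len(shapelet)
    if length > len(series):
        raise ValueError("shapelet must not be longer than the series")
    best = math.inf
    for start in range(len(series) - length + 1):
        window = series[start:start + length]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best

# Toy values, made up for illustration.
shapelet = [0, 2, 0]                    # a sharp spike
with_spike = [1, 1, 0, 2, 0, 1, 1]      # contains the spike exactly
without_spike = [1, 1, 1, 1, 1, 1, 1]   # never spikes

print(subsequence_distance(with_spike, shapelet))     # 0.0
print(subsequence_distance(without_spike, shapelet))  # sqrt(3), roughly 1.732
```

A small distance means the shape occurs somewhere in the series, regardless of where. A good shapelet is one for which this distance separates the classes well.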
Shapelet Implementations
Most shapelet implementations were done in C++ or Java, and the Python standard library does not include these algorithms. I am also currently working on a GitHub repository in Python to identify shapelets and use them for classification. Some of the open-source Python implementations of shapelets available right now are mentioned below:
Learning Time-Series Shapelets by mohaseeb
Source: shaplets-python
Installation
git clone git@github.com:mohaseeb/shaplets-python.git
cd shaplets-python
pip install .
Usage
from shapelets_lts.classification import LtsShapeletClassifier
# create an LtsShapeletClassifier instance
classifier = LtsShapeletClassifier(
    K=20,
    R=3,
    L_min=30,
    epocs=50,
    lamda=0.01,
    eta=0.01,
    shapelet_initialization='segments_centroids',
    plot_loss=True
)
# train the classifier.
# train_data.shape -> (# train samples X time-series length)
# train_label.shape -> (# train samples)
classifier.fit(train_data, train_label, plot_loss=True)
# evaluate on test data.
# test_data.shape -> (# test samples X time-series length)
prediction = classifier.predict(test_data)
# retrieve the learnt shapelets
shapelets = classifier.get_shapelets()
# and plot sample shapelets
from shapelets_lts.util import plot_sample_shapelets
plot_sample_shapelets(shapelets=shapelets, sample_size=36)
Sktime by The Alan Turing Institute
Source: sktime
Sktime is a unified framework developed by the Alan Turing Institute for machine learning with time-series data. This package contains a shapelet transform, which can be used to extract shapelets from data.
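Conceptually, a shapelet transform turns each series into a row of distances, one per shapelet, so that any ordinary classifier can be trained on the result. The sketch below only illustrates that idea and is not sktime's implementation: the helper names, the fixed list of shapelets and the toy data are all assumptions of mine.

```python
import math

def min_distance(series, shapelet):
    """Smallest Euclidean distance from `shapelet` to any same-length window of `series`."""
    n = len(shapelet)
    return min(
        math.sqrt(sum((x - s) ** 2 for x, s in zip(series[i:i + n], shapelet)))
        for i in range(len(series) - n + 1)
    )

def shapelet_transform(dataset, shapelets):
    """Map each series to a feature row holding its distance to every shapelet."""
    return [[min_distance(series, s) for s in shapelets] for series in dataset]

# Toy values, made up for illustration: two shapelets and two series.
shapelets = [[0, 2, 0], [1, 1, 1]]
dataset = [
    [1, 1, 0, 2, 0, 1, 1],  # contains the spike
    [1, 1, 1, 1, 1, 1, 1],  # flat
]

features = shapelet_transform(dataset, shapelets)
for row in features:
    print(row)
```

Each row of `features` has one column per shapelet, so the time series problem becomes a regular tabular one.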
Installation
pip install sktime
or
conda install -c conda-forge sktime
Usage
import matplotlib.pyplot as plt
# This import path matches the sktime release used in this article;
# newer releases may expose the transform under a different module.
from sktime.transformers.series_as_features.shapelets import ContractedShapeletTransform

# How long (in minutes) to extract shapelets for.
time_contract_in_mins = 1  # illustrative value; replace with your own search time
# The initial number of shapelet candidates to assess per training series.
initial_num_shapelets_per_case = 10  # illustrative value; replace with your own

# train_x is expected to be a nested pandas DataFrame of series, train_y the class labels.
shapelet_transformer = ContractedShapeletTransform(
    time_contract_in_mins=time_contract_in_mins,
    num_candidates_to_sample_per_case=initial_num_shapelets_per_case
)
shapelet_transformer.fit(train_x, train_y)

# Plot the shapelets (at most the first five)
for i in range(0, min(len(shapelet_transformer.shapelets), 5)):
    s = shapelet_transformer.shapelets[i]
    # summary info about the shapelet
    print("#" + str(i) + ": " + str(s))
    # overlay shapelets
    plt.plot(
        list(range(s.start_pos, (s.start_pos + s.length))),
        train_x.iloc[s.series_id, 0][s.start_pos:s.start_pos + s.length]
    )
plt.show()
Conclusion
In this article, I have introduced shapelets in time series and their advantages over traditional methods. In my next article, I intend to provide an in-depth view of the algorithms used to extract shapelets and how they can be used for classification problems.
Thank you very much for reading! Let me know if you have any questions or comments.
About the Author:
Rohit Vincent is a Data Analytics Consultant here at Version 1.