TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Automate Time Series Feature Engineering in a few lines of Python Code

Extract hundreds of relevant features for your time series use-case

Satyam Kumar
Published in TDS Archive
3 min read · Aug 11, 2022

Image by Jan Vašek from Pixabay

Time series data capture a variable's value repeatedly over time, producing a series of data points indexed in time order. Time series data have a natural temporal ordering, i.e. the value of a variable at a particular time depends on its past values.

Traditional machine learning algorithms are not designed to capture the temporal ordering of time series data. A data scientist needs to perform feature engineering to condense the important characteristics of the data into a few metrics. Generating a large number of time series features and then selecting the relevant ones is a time-consuming and tedious task.
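To make the idea of manual feature engineering concrete, here is a minimal sketch that condenses one ordered series into a handful of summary metrics using only the Python standard library. The function name `summarize_series`, the choice of metrics, and the sample readings are assumptions made for illustration; they are not taken from tsfresh.

```python
from statistics import mean, pstdev


def summarize_series(values):
    """Collapse an ordered series into a few hand-picked summary features."""
    mu = mean(values)
    # Lag-1 autocorrelation: how strongly each value tracks the previous one.
    num = sum((a - mu) * (b - mu) for a, b in zip(values[:-1], values[1:]))
    den = sum((v - mu) ** 2 for v in values)
    return {
        "mean": mu,
        "max": max(values),
        "std": pstdev(values),
        "lag1_autocorr": num / den if den else 0.0,
    }


# Hypothetical sensor readings, listed in time order.
readings = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = summarize_series(readings)
print(features)
```

Writing and validating dozens of such functions by hand, for every variable, is the tedious part that an automated feature extraction library is meant to remove.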

This is where the tsfresh package comes in: it can generate hundreds of generic, standard features for your time series data. In this article, we will discuss the usage and implementation of the tsfresh package in depth.

tsfresh:

tsfresh is an open-source package that can generate hundreds of relevant time series features suitable for training a machine learning model. The features generated by tsfresh can be used for classification, forecasting, and outlier detection use cases.

Getting Started:

The tsfresh package offers several capabilities for feature engineering on time series data, including:

  • Feature Generation
  • Feature Selection
  • Compatibility with large data

Installation & Usage:

tsfresh is an open-source Python package that can be installed using:

pip install -U tsfresh
# or
conda install -c conda-forge tsfresh

1) Feature Generation:

The tsfresh package offers an automated feature generation API that can generate 750+ relevant features from a single time series variable. The generated features cover a wide spectrum, including:

  • Descriptive statistics (mean, max, correlation, etc.)
  • Physics-based indicators for nonlinearity and complexity
