Machine Learning for Forecasting: Transformations and Feature Extraction

Supervised learning with time series. How to create univariate forecasting models using Python

Vitor Cerqueira
Towards Data Science
4 min read · Nov 15, 2022



In this post, you’ll learn to apply supervised learning with time series using Python.

This includes two things:

  • transforming time series from a sequence into a tabular format;
  • adding new features based on summary statistics.

Introduction

Forecasting is one of the most studied problems in data science. The goal is to predict future values of a time series.

Accurate forecasts are invaluable for decision makers. They reduce future uncertainty, thereby improving the planning of operations.

Traditional approaches to forecasting include methods such as ARIMA or exponential smoothing. But machine learning regression approaches are increasingly used to solve this problem.

Machine learning approaches frame the task as supervised learning. The goal is to create a model based on historical data. Yet, it’s not clear how one can train a model using a sequence of values as the input.

It turns out there’s a neat transformation that allows us to do just that.

Time Delay Embedding

A supervised model is trained to learn patterns that relate observations to the consequences of those observations.

How do we do that with time series?

Each value of a time series can be thought of as the consequence of the recent values that precede it. This value works as the target variable. The recent past values are used as explanatory variables.

This process reshapes the series from a sequence of values into a tabular format. The transformation is called time delay embedding, and it is the key to auto-regression.

Here’s a Python function to do it:
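The original listing isn’t preserved in this copy, so here’s a minimal sketch of such a function using pandas. The function name, column labels, and the `horizon` parameter are my own choices; the idea is simply to stack shifted copies of the series side by side:

```python
import pandas as pd

def time_delay_embedding(series: pd.Series, n_lags: int, horizon: int = 1) -> pd.DataFrame:
    """Reshape a univariate time series into a tabular data set.

    Each row contains `n_lags` past values (explanatory variables)
    followed by `horizon` future values (targets).
    """
    cols = {}
    # lagged values: t-(n_lags-1), ..., t-1, t
    for i in range(n_lags - 1, -1, -1):
        cols[f"t-{i}" if i > 0 else "t"] = series.shift(i)
    # future values to predict: t+1, ..., t+horizon
    for i in range(1, horizon + 1):
        cols[f"t+{i}"] = series.shift(-i)
    # boundary rows contain NaNs from shifting and are dropped
    return pd.DataFrame(cols).dropna()
```

For example, applying it to the sequence 0, 1, …, 9 with `n_lags=3` yields rows such as `(0, 1, 2)` as explanatory variables and `3` as the target. The lag columns then feed any standard regression model, with the `t+1` column as the target.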
