Functime: A Python Library for Efficient Time-Series Feature Extraction and Forecasting 🤪

Jan 31, 2024

What is Functime? Introduction to Functime module with examples. A Comprehensive Guide to Functime. Use functime for Time-Series analysis.

Photo by Esteban Amaro on Unsplash

According to the official documentation, functime is a machine learning library for time-series predictions that just works. It is:

  • Fully-featured: Powerful and easy-to-use API for forecasting and feature engineering (tsfresh, Catch22).
  • Fast: Forecast 100,000 time series in seconds on your laptop.
  • Efficient: Extract 100s of time-series features in parallel using Polars.
  • Battle-tested: Algorithms that deliver real business impact and win competitions.

functime is built for time-series forecasting and feature extraction on large panel datasets. It also ships its own preprocessing options and cross-validation splitters.

According to its documentation, functime can process 100,000 time-series in seconds. It gets this speed by using Polars for parallel feature engineering.

Beyond speed, functime offers proven machine learning algorithms and supports exogenous features across all forecasters. It also automates the handling of lags and hyperparameter tuning, the latter through FLAML.

Installation

To install the latest functime, run the following command:

pip install functime

Functime comes with extra options. For example, to install functime with large-language model (LLM) and lightgbm features:

pip install "functime[llm,lgb]"

ann: To use ann (approximate nearest neighbors) forecaster
cat: To use catboost forecaster
xgb: To use xgboost forecaster
lgb: To use lightgbm forecaster
llm: To use the LLM-powered forecast analyst
plot: To use plotting functions

Preprocessing

functime runs time-series preprocessing in parallel via Polars. Each of functime's preprocessors takes a panel DataFrame as input and transforms each time-series locally and in parallel, much like a group-by operation applied per time-series.

These transformations serve to stabilize time-series, such as using boxcox to stabilize variance, or render them stationary by applying first differences or detrending. Certain transformations, like diff and detrend, are reversible, facilitating the conversion of forecasted transformed time-series back to their original scale, adding to their utility.

Example

We will visualize common time-series preprocessing techniques before and after the time-series transformation. These transformations make the time-series look more “well-behaved”, which generally makes the time-series easier to forecast.

# Import the libraries and load the dataset
import polars as pl

from functime.plotting import plot_forecasts, plot_panel
from functime.preprocessing import (
    boxcox,
    deseasonalize_fourier,
    detrend,
    diff,
    fractional_diff,
    scale,
    yeojohnson,
)

data = pl.read_parquet("https://github.com/TracecatHQ/functime/raw/main/data/commodities.parquet")
entity_col, time_col, target_col = data.columns
data.head()
data.get_column("commodity_type").n_unique()

There are 71 commodities in total.

# Rank commodities by coefficient of variation (std / mean) and keep the top 4
most_volatile_commodities = (
    data.group_by(entity_col)
    .agg((pl.col(target_col).std() / pl.col(target_col).mean()).alias("cv"))
    .top_k(k=4, by="cv")
)
most_volatile_commodities
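
To make the ranking step concrete, here is a minimal plain-Python sketch of the same idea: compute the coefficient of variation per series and keep the largest ones. The toy panel and the helper names (`coefficient_of_variation`, `top_k_by_cv`) are made up for illustration and are not part of functime or Polars.

```python
from statistics import mean, stdev

# Hypothetical toy panel: entity name -> list of prices (not the commodities data)
panel = {
    "steady": [10.0, 10.5, 9.5, 10.0],
    "volatile": [2.0, 10.0, 4.0, 16.0],
    "flat": [5.0, 5.0, 5.0, 5.0],
}


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def top_k_by_cv(panel, k):
    """Return the k entity names with the largest coefficient of variation."""
    scores = {name: coefficient_of_variation(vals) for name, vals in panel.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


ranked = top_k_by_cv(panel, k=2)
print(ranked)
```

A higher coefficient of variation means the series swings more relative to its own level, which is why it works as a rough volatility ranking across series with different price scales.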

Visualizing the top 4 most volatile time-series by coefficient of variation.

selected = most_volatile_commodities.get_column(entity_col)
y = data.filter(pl.col(entity_col).is_in(selected))
figure = plot_panel(y=y, height=800, width=1000)
figure.show()

Let’s see whether we can preprocess these time-series to make them easier to forecast.

Detrending

# Remove a linear trend from each time-series
transformer = detrend(freq="1mo", method="linear")
y_detrended = y.pipe(transformer).collect()
figure = plot_forecasts(
    y_true=y, y_pred=y_detrended.group_by(entity_col).tail(64), height=800, width=1000
)
figure.show()

Inverting the transformation:

y_original = transformer.invert(y_detrended).group_by(entity_col).tail(64).collect()
subset = ["Natural gas, Europe", "Crude oil, Dubai"]
figure = plot_forecasts(
    y_true=y.filter(pl.col(entity_col).is_in(subset)),
    y_pred=y_original,
    height=400,
    width=1000,
)
figure.show()
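
For intuition about what a reversible linear detrend does, here is a small plain-Python sketch on a single series. It fits a least-squares line against the time index, subtracts it, and then adds it back. This only illustrates the technique and is not functime's implementation; the function names are hypothetical.

```python
def fit_linear_trend(values):
    """Ordinary least squares fit of values against the time index 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return slope, intercept


def detrend_linear(values):
    """Return the residuals and the fitted state needed to undo the transform."""
    slope, intercept = fit_linear_trend(values)
    residuals = [y - (intercept + slope * t) for t, y in enumerate(values)]
    return residuals, (slope, intercept)


def invert_detrend(residuals, state):
    """Add the stored trend back to the residuals."""
    slope, intercept = state
    return [r + intercept + slope * t for t, r in enumerate(residuals)]


series = [3.0, 5.5, 6.5, 9.0, 11.0]
residuals, state = detrend_linear(series)
restored = invert_detrend(residuals, state)
```

The key point is that the fitted slope and intercept are kept as state, which is what makes the transformation invertible after forecasting on the detrended scale.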

Deseasonalize

This example uses the M4 weekly dataset, which has clear seasonal patterns. functime supports deseasonalization by regressing each series on Fourier terms that model the seasonality and keeping the residuals.

m4_data = pl.read_parquet("https://github.com/TracecatHQ/functime/raw/main/data/m4_1w_train.parquet")
m4_entity_col, m4_time_col, m4_target_col = m4_data.columns
y_m4 = m4_data.filter(pl.col(m4_entity_col).is_in(["W174", "W175", "W176", "W178"]))
figure = plot_panel(y=y_m4, height=800, width=1000)
figure.show()

# Fourier terms: plot the fitted seasonal component
transformer = deseasonalize_fourier(sp=12, K=3)
y_deseasonalized = y_m4.pipe(transformer).collect()
y_seasonal = transformer.state.artifacts["X_seasonal"].collect()
figure = plot_panel(
    y=y_seasonal.group_by(m4_entity_col).tail(64), height=800, width=1000
)
figure.show()

# Invert the transformation to recover the original series
y_original = transformer.invert(y_deseasonalized).collect()
figure = plot_panel(
    y=y_original.group_by(m4_entity_col).tail(64), height=800, width=1000
)
figure.show()
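
If Fourier terms are new to you, the sketch below shows one common way to build them: K sine/cosine pairs with seasonal period `sp`. The column naming and the `fourier_terms` helper are assumptions made for this example, not functime's internals. Deseasonalizing would then mean regressing the series on these columns and keeping the residuals.

```python
import math


def fourier_terms(n_obs, sp, K):
    """Return K sine/cosine pairs with seasonal period sp for t = 0..n_obs-1."""
    columns = {}
    for k in range(1, K + 1):
        angle = [2 * math.pi * k * t / sp for t in range(n_obs)]
        columns[f"sin_{sp}_{k}"] = [math.sin(a) for a in angle]
        columns[f"cos_{sp}_{k}"] = [math.cos(a) for a in angle]
    return columns


# Two full seasonal cycles with the same sp and K as the example above
X_seasonal = fourier_terms(n_obs=24, sp=12, K=3)
print(list(X_seasonal))
```

Each pair repeats every `sp / k` steps, so a larger K lets the regression capture sharper seasonal shapes at the cost of more features.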

Differencing

First differencing transforms a non-stationary time-series into a stationary one by taking the difference between consecutive observations. It assumes the time-series is integrated of order 1, that is, it has a single unit root.

transformer = diff(order=1)
y_diff = y.pipe(transformer).collect()
figure = plot_forecasts(
    y_true=y, y_pred=y_diff.group_by(entity_col).tail(64), height=800, width=1000
)
figure.show()
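
As a quick illustration of why differencing is reversible, here is a plain-Python sketch: the differences plus the first observation are enough to rebuild the series with a cumulative sum. The helper names are hypothetical and this is not how functime stores its state.

```python
from itertools import accumulate


def first_difference(values):
    """Difference between each observation and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


def invert_first_difference(diffs, first_value):
    """Rebuild the series by cumulatively summing from the first observation."""
    return list(accumulate(diffs, initial=first_value))


series = [100.0, 102.0, 101.0, 105.0]
diffs = first_difference(series)
restored = invert_first_difference(diffs, first_value=series[0])
print(diffs, restored)
```

Note that the differenced series is one observation shorter than the input, so the first value has to be kept somewhere for the inversion to work.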

Fractional Differencing

Sometimes we need to make a time series stationary without removing all of its memory. This is especially useful in forecasting tasks where the next value depends on a long history of past values (e.g. forecasting the price of a stock). In this case, we can use fractional differencing. Notice the difference between these plots and the previous ones. It is worthwhile to run multiple tests with a stationarity test such as the augmented Dickey-Fuller test to determine the minimum value of d that makes a time series stationary.

transformer = fractional_diff(d=0.3, min_weight=1e-3)
y_diff = y.pipe(transformer).collect()
figure = plot_forecasts(
    y_true=y, y_pred=y_diff.group_by(entity_col).tail(64), height=800, width=1000
)
figure.show()
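
To see what the `d` and `min_weight` parameters could mean, here is a sketch of the standard fixed-window approach to fractional differencing in plain Python. The weights follow the binomial expansion of (1 - B)^d and are truncated once they drop below `min_weight`. Whether functime truncates in exactly this way is an assumption on my part, and the function names are hypothetical.

```python
def frac_diff_weights(d, min_weight, max_terms=1000):
    """Weights of (1 - B)^d, truncated once a weight falls below min_weight."""
    weights = [1.0]
    for k in range(1, max_terms):
        w = -weights[-1] * (d - k + 1) / k
        if abs(w) < min_weight:
            break
        weights.append(w)
    return weights


def fractional_difference(values, d, min_weight=1e-3):
    """Apply the truncated weights as a rolling weighted sum over past values."""
    weights = frac_diff_weights(d, min_weight)
    width = len(weights)
    return [
        sum(w * values[t - k] for k, w in enumerate(weights))
        for t in range(width - 1, len(values))
    ]


# With d=1 the weights collapse to [1, -1], i.e. ordinary first differencing
print(frac_diff_weights(d=1.0, min_weight=1e-3))
# With d=0.3 the weights decay slowly, so distant observations still contribute
print(frac_diff_weights(d=0.3, min_weight=1e-3)[:4])
```

A smaller `min_weight` keeps more of the history in each output value but also discards more observations at the start of the series.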

Seasonal Differencing

# Difference each observation against the one sp=12 periods earlier
transformer = diff(order=1, sp=12)
y_seas_diff = y.pipe(transformer).collect()
figure = plot_forecasts(
    y_true=y, y_pred=y_seas_diff.group_by(entity_col).tail(64), height=800, width=1000
)
figure.show()
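
Seasonal differencing subtracts the value from one full season earlier instead of the previous observation. A minimal sketch in plain Python, with a made-up quarterly series and a hypothetical helper name:

```python
def seasonal_difference(values, sp):
    """Subtract the observation from sp periods earlier."""
    return [values[t] - values[t - sp] for t in range(sp, len(values))]


# Toy series with a repeating 4-period pattern that drifts up by 1 each cycle
series = [10, 20, 30, 40, 11, 21, 31, 41, 12, 22, 32, 42]
print(seasonal_difference(series, sp=4))
```

The repeating pattern cancels out and only the cycle-over-cycle change remains, which is why this transformation helps with strongly seasonal data.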

Local Scaling

Local scaling is a parallelized version of the scaling transformation (subtract the mean, divide by the standard deviation) applied across many time-series.

transformer = scale(use_mean=True, use_std=True)
y_scaled = y_m4.pipe(transformer).collect()
figure = plot_panel(y=y_scaled.group_by(m4_entity_col).tail(64), height=800, width=1000)
figure.show()
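
The word "local" matters here: each series is scaled with its own mean and standard deviation rather than with statistics of the whole panel. The sketch below shows that idea in plain Python. It uses the sample standard deviation, which may differ from what functime uses, and the helper names are hypothetical.

```python
from statistics import mean, stdev


def scale_panel(panel):
    """Standardize every series with its own mean and standard deviation."""
    scaled, state = {}, {}
    for name, values in panel.items():
        mu, sigma = mean(values), stdev(values)
        scaled[name] = [(v - mu) / sigma for v in values]
        state[name] = (mu, sigma)
    return scaled, state


def invert_scale(scaled, state):
    """Undo the scaling using the per-series statistics kept in state."""
    return {
        name: [z * state[name][1] + state[name][0] for z in zs]
        for name, zs in scaled.items()
    }


# Hypothetical toy panel with two series on very different scales
panel = {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]}
scaled, state = scale_panel(panel)
restored = invert_scale(scaled, state)
print(scaled["a"])
```

After scaling, both series live on a comparable scale, which helps when a single global model is trained across all of them.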

Box-Cox

This transformation is used to stabilize the variance of the time-series. It requires all values to be positive.

transformer = boxcox(method="mle")
y_boxcox = y.pipe(transformer).collect()
figure = plot_panel(y=y_boxcox.group_by(entity_col).tail(64), height=800, width=1000)
figure.show()
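
For reference, here is the Box-Cox formula itself as a plain-Python sketch with a fixed lambda. The example above estimates lambda by maximum likelihood (`method="mle"`), which this sketch does not attempt. The function names and the `lmbda` parameter are chosen for illustration only.

```python
import math


def boxcox_transform(values, lmbda):
    """Box-Cox: log(y) when lambda is 0, otherwise (y**lambda - 1) / lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lmbda == 0:
        return [math.log(v) for v in values]
    return [(v ** lmbda - 1) / lmbda for v in values]


def boxcox_inverse(transformed, lmbda):
    """Map transformed values back to the original scale."""
    if lmbda == 0:
        return [math.exp(z) for z in transformed]
    return [(z * lmbda + 1) ** (1 / lmbda) for z in transformed]


prices = [1.0, 4.0, 9.0]
transformed = boxcox_transform(prices, lmbda=0.5)
restored = boxcox_inverse(transformed, lmbda=0.5)
print(transformed)
```

Values of lambda below 1 compress large observations more than small ones, which is how the transformation evens out the variance.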

Yeo-Johnson

This transformation is similar to Box-Cox, but without the strictly positive requirement.

transformer = yeojohnson()
y_yeojohnson = y.pipe(transformer).collect()
figure = plot_panel(
    y=y_yeojohnson.group_by(entity_col).tail(64), height=800, width=1000
)
figure.show()
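
The Yeo-Johnson formula handles negative values by treating the two sides of zero separately. Here is a plain-Python sketch with a fixed lambda. It is meant as a reading aid for the formula, not as functime's implementation, and `yeojohnson_transform` is a hypothetical name.

```python
import math


def yeojohnson_transform(values, lmbda):
    """Yeo-Johnson power transform with a fixed lambda."""
    out = []
    for v in values:
        if v >= 0:
            # Non-negative branch: Box-Cox applied to (v + 1)
            if lmbda == 0:
                out.append(math.log1p(v))
            else:
                out.append(((v + 1) ** lmbda - 1) / lmbda)
        else:
            # Negative branch: mirrored Box-Cox with exponent (2 - lambda)
            if lmbda == 2:
                out.append(-math.log1p(-v))
            else:
                out.append(-(((1 - v) ** (2 - lmbda) - 1) / (2 - lmbda)))
    return out


# With lambda = 1 the transform leaves the data unchanged
print(yeojohnson_transform([-2.0, 0.0, 3.0], lmbda=1))
```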

Feature extraction

import polars as pl
import numpy as np
from functime.feature_extractors import FeatureExtractor, binned_entropy

# Load commodities price data
y = pl.read_parquet("https://github.com/TracecatHQ/functime/raw/main/data/commodities.parquet")

# Get column names ("commodity_type", "time", "price")
entity_col, time_col, value_col = y.columns

# Extract a single feature from a single time-series
# (stored under a new name so the imported function is not overwritten)
entropy = binned_entropy(
    pl.Series(np.random.normal(0, 1, size=10)),
    bin_count=10
)

# Also works on LazyFrames with query optimization
features = (
    pl.LazyFrame({
        "index": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
        "value": np.random.normal(0, 1, size=10)
    })
    .select(
        binned_entropy=pl.col("value").ts.binned_entropy(bin_count=10),
        lempel_ziv_complexity=pl.col("value").ts.lempel_ziv_complexity(threshold=3),
        longest_streak_above_mean=pl.col("value").ts.longest_streak_above_mean(),
    )
    .collect()
)

# Extract features blazingly fast on many stacked time-series using `group_by`
features = (
    y.group_by(entity_col)
    .agg(
        binned_entropy=pl.col(value_col).ts.binned_entropy(bin_count=10),
        lempel_ziv_complexity=pl.col(value_col).ts.lempel_ziv_complexity(threshold=3),
        longest_streak_above_mean=pl.col(value_col).ts.longest_streak_above_mean(),
    )
)

# Extract features blazingly fast on windows of many time-series using `group_by_dynamic`
features = (
    # Compute rolling features at yearly intervals
    y.group_by_dynamic(
        time_col,
        every="12mo",
        by=entity_col,
    )
    .agg(
        binned_entropy=pl.col(value_col).ts.binned_entropy(bin_count=10),
        lempel_ziv_complexity=pl.col(value_col).ts.lempel_ziv_complexity(threshold=3),
        longest_streak_above_mean=pl.col(value_col).ts.longest_streak_above_mean(),
    )
)
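
If you are wondering what two of these features actually measure, here is a plain-Python sketch of their usual tsfresh-style definitions. Binned entropy is the Shannon entropy of an equal-width histogram of the values, and the longest streak above the mean is the longest consecutive run of values above the series mean. These are my own simplified versions for illustration and may differ in edge cases from functime's implementations.

```python
import math


def binned_entropy(values, bin_count):
    """Shannon entropy (natural log) of an equal-width histogram of the values."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return 0.0  # all values fall into a single bin
    counts = [0] * bin_count
    for v in values:
        # Clamp so that the maximum value lands in the last bin
        idx = min(int((v - lo) / (hi - lo) * bin_count), bin_count - 1)
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def longest_streak_above_mean(values):
    """Length of the longest consecutive run strictly above the mean."""
    mu = sum(values) / len(values)
    best = current = 0
    for v in values:
        current = current + 1 if v > mu else 0
        best = max(best, current)
    return best


print(binned_entropy([0, 0, 1, 1], bin_count=2))
print(longest_streak_above_mean([1, 5, 6, 1, 7, 8, 9, 1]))
```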

Forecasting

import polars as pl
from functime.cross_validation import train_test_split
from functime.seasonality import add_fourier_terms
from functime.forecasting import linear_model
from functime.preprocessing import scale
from functime.metrics import mase

# Load commodities price data
y = pl.read_parquet("https://github.com/TracecatHQ/functime/raw/main/data/commodities.parquet")
entity_col, time_col = y.columns[:2]

# Time series split
y_train, y_test = y.pipe(train_test_split(test_size=3))

# Fit-predict
forecaster = linear_model(freq="1mo", lags=24)
forecaster.fit(y=y_train)
y_pred = forecaster.predict(fh=3)

# functime functional design
# fit-predict in a single line
y_pred = linear_model(freq="1mo", lags=24)(y=y_train, fh=3)

# Score forecasts in parallel
scores = mase(y_true=y_test, y_pred=y_pred, y_train=y_train)

# Forecast with target transforms and feature transforms
forecaster = linear_model(
    freq="1mo",
    lags=24,
    target_transform=scale(),
    feature_transform=add_fourier_terms(sp=12, K=6)
)

# Forecast with exogenous regressors!
# Just pass them into X
X = (
    y.select([entity_col, time_col])
    .pipe(add_fourier_terms(sp=12, K=6)).collect()
)
# Split the exogenous features X (not y) into train and future parts
X_train, X_future = X.pipe(train_test_split(test_size=3))
forecaster = linear_model(freq="1mo", lags=24)
forecaster.fit(y=y_train, X=X_train)
y_pred = forecaster.predict(fh=3, X=X_future)
y_pred.head()
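
The MASE score used above can be hard to interpret without seeing the formula. Here is a plain-Python sketch for a single series: the forecast's mean absolute error divided by the in-sample mean absolute error of a naive forecast. The `split_last` helper, the `sp` argument, and the toy numbers are assumptions for illustration and do not reflect functime's exact signature.

```python
def split_last(values, test_size):
    """Hold out the last test_size observations as the test set."""
    return values[:-test_size], values[-test_size:]


def mase(y_true, y_pred, y_train, sp=1):
    """Mean absolute scaled error against a naive forecast lagged by sp."""
    forecast_mae = sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)
    naive_errors = [abs(y_train[t] - y_train[t - sp]) for t in range(sp, len(y_train))]
    naive_mae = sum(naive_errors) / len(naive_errors)
    return forecast_mae / naive_mae


# Hypothetical toy series and made-up predictions for the held-out points
series = [10, 12, 11, 13, 15, 16, 15, 17]
y_train, y_test = split_last(series, test_size=3)
y_pred = [15, 16, 16]
score = mase(y_true=y_test, y_pred=y_pred, y_train=y_train)
print(score)
```

A score below 1 means the forecast beats the naive "repeat the last value" baseline on average, and a score above 1 means it does worse.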

Build a custom transformer

functime has an easy-to-use, functional @transformer decorator for implementing new transformers. Here is an example:

from typing import List

import polars as pl

from functime.base import transformer  # assumed import path for the decorator


@transformer
def lag(lags: List[int]):
    """Applies lag transformation to a LazyFrame.

    Parameters
    ----------
    lags : List[int]
        A list of lag values to apply.
    """

    def transform(X: pl.LazyFrame) -> pl.LazyFrame:
        entity_col = X.columns[0]
        time_col = X.columns[1]
        max_lag = max(lags)
        lagged_series = [
            (
                pl.all()
                .exclude([entity_col, time_col])
                .shift(lag)
                .over(entity_col)
                .suffix(f"__lag_{lag}")
            )
            for lag in lags
        ]
        X_new = (
            # Pre-sorting seems to improve performance by ~20%
            X.sort(by=[entity_col, time_col])
            .select(
                pl.col(entity_col).set_sorted(),
                pl.col(time_col).set_sorted(),
                *lagged_series,
            )
            .group_by(entity_col)
            .agg(pl.all().slice(max_lag))
            .explode(pl.all().exclude(entity_col))
        )
        artifacts = {"X_new": X_new}
        return artifacts

    return transform
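
To see what the shift-then-slice logic in a lag transformer produces, here is the same idea on a single series in plain Python. Each output row holds the values from `lag` steps earlier, and the first `max(lags)` rows are dropped because they have no complete history. The `make_lags` helper and its column names are hypothetical.

```python
def make_lags(values, lags):
    """Build one row of lagged values per time step that has a full history."""
    max_lag = max(lags)
    return [
        {f"lag_{lag}": values[t - lag] for lag in lags}
        for t in range(max_lag, len(values))
    ]


rows = make_lags([1, 2, 3, 4, 5], lags=[1, 2])
for row in rows:
    print(row)
```

These lagged columns are what an autoregressive forecaster uses as its input features.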

— — —

Why was the time-series analysis conference always so noisy?

Because everyone was talking about the latest trend in forecasting!

🙂🙂🙂
