Sktime — Feature Importance on TimeSeriesForestClassifier

Yann Hallouard
TotalEnergies Digital Factory
9 min read · Oct 10, 2022

Time series data science use cases require generic algorithms, which implies numerous data transformation steps.

At TotalEnergies, as in most industrial companies, time series processing represents the bulk of our use cases. In this article we will present Sktime, a scikit-learn-compatible library dedicated to the processing of time series.

What is Sktime?

Sktime is a package dedicated to time series which provides pipelines that automatically prepare time series for use by ensemble algorithms. It offers many time series data preparation tools, such as a leakage-free train/test split function. Since scikit-learn requires data in a tabular structure, sktime also provides pipeline steps that automatically extract features from your time series so they can be handled by ensemble algorithms.
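For example, the leakage-free split mentioned above can be sketched as follows (a minimal example, assuming the temporal_train_test_split helper from sktime.forecasting.model_selection and the bundled airline toy dataset):

from sktime.datasets import load_airline
from sktime.forecasting.model_selection import temporal_train_test_split

# Split chronologically: the test set always comes after the training set,
# so no future information leaks into training.
y = load_airline()
y_train, y_test = temporal_train_test_split(y, test_size=24)
print(y_train.index.max() < y_test.index.min())  # True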

If you want to know more about sktime, please refer to this great article from Alexandra Amidon: Sktime: a Unified Python Library for Time Series Machine Learning

Modeling

For the purpose of this article, it is useful to work through an example that illustrates some of the available transformations and goes deeper into the process. Let’s consider a multivariate time series problem with 3 series called feature_0, feature_1 and, last but not least, feature_2 (nobody would have guessed).

As a common thread, this article walks through the two pipeline steps ColumnConcatenator and TimeSeriesForestClassifier, used as follows:

from sktime.transformations.panel.compose import ColumnConcatenator
from sktime.classification.interval_based import TimeSeriesForestClassifier
from sklearn.pipeline import Pipeline

# config['model_params'] holds the TimeSeriesForestClassifier keyword
# arguments (defined elsewhere in the project)
steps = [
    ('concatenate', ColumnConcatenator()),
    ('classify', TimeSeriesForestClassifier(**config['model_params']))
]
clf = Pipeline(steps)

Create ColumnConcatenator Entry Dataframe

Starting from a dataframe of time series with 3 multivariate features, the first step of data processing is creating frames.

import pandas as pd

time_series_dataframe = pd.DataFrame(
    {
        "feature_0": [0., 0., 0., 1., 2., 3.],
        "feature_1": [10., 20., 30., 40., 50., 60.],
        "feature_2": [-4., -5., -6., -7., -8., -9.]
    },
    index=[
        "2022-01-01 00:00:00",
        "2022-01-01 01:00:00",
        "2022-01-01 02:00:00",
        "2022-01-01 03:00:00",
        "2022-01-01 04:00:00",
        "2022-01-01 05:00:00",
    ]
)

A frame is a windowed chunk of our time series.

This operation is done repeatedly on the whole time series to obtain the following dataframe:

windowed_chunk_time_series_dataframe = pd.DataFrame(
    {
        "dim_0": [
            pd.Series([0., 0., 0.]),
            pd.Series([1., 2., 3.])
        ],
        "dim_1": [
            pd.Series([10., 20., 30.]),
            pd.Series([40., 50., 60.])
        ],
        "dim_2": [
            pd.Series([-4., -5., -6.]),
            pd.Series([-7., -8., -9.])
        ],
    },
    index=["frame_0", "frame_1"]
)
windowed_chunk_time_series_dataframe_label = pd.DataFrame(
    {"y": [0, 1]},
    index=["frame_0", "frame_1"]
)
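The nested dataframe above is written out by hand for readability. A minimal sketch of how such frames could be built from the raw dataframe (assuming non-overlapping windows of length 3, my own choice for this toy example) could look like this:

window_length = 3  # assumed window size for this toy example

frames = {
    f"dim_{i}": [
        # one pd.Series per non-overlapping window of the original column
        pd.Series(time_series_dataframe[column].values[start:start + window_length])
        for start in range(0, len(time_series_dataframe), window_length)
    ]
    for i, column in enumerate(time_series_dataframe.columns)
}
nested_frames = pd.DataFrame(
    frames,
    index=[f"frame_{j}" for j in range(len(time_series_dataframe) // window_length)]
)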

Our frames are still split by feature, which does not match the input requirements of TimeSeriesForestClassifier. The ColumnConcatenator transformer turns a dataframe of multivariate time series frames into concatenated frames, as follows:

input_dataframe = pd.DataFrame(
    {
        "dim_0": [
            pd.Series([0., 0., 0.]),
            pd.Series([1., 2., 3.])
        ],
        "dim_1": [
            pd.Series([10., 20., 30.]),
            pd.Series([40., 50., 60.])
        ],
        "dim_2": [
            pd.Series([-4., -5., -6.]),
            pd.Series([-7., -8., -9.])
        ],
    },
    index=["frame_0", "frame_1"]
)

concatenator = ColumnConcatenator()

output_dataframe = concatenator.fit_transform(input_dataframe)

expected_output_dataframe = pd.DataFrame(
    {
        0: [
            pd.Series([0., 0., 0., 10., 20., 30., -4., -5., -6.]),
            pd.Series([1., 2., 3., 40., 50., 60., -7., -8., -9.])
        ]
    },
    index=["frame_0", "frame_1"]
)
expected_output_dataframe.index.names = ["instances"]

pd.testing.assert_frame_equal(output_dataframe, expected_output_dataframe)
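As a quick sanity check (my own addition, not part of the original pipeline), the first concatenated frame is simply dim_0, dim_1 and dim_2 of frame_0 stacked end to end:

import numpy as np

first_concatenated_frame = output_dataframe.iloc[0, 0]
manual_concatenation = np.concatenate(
    [input_dataframe.iloc[0][dim].values for dim in ["dim_0", "dim_1", "dim_2"]]
)
assert np.allclose(first_concatenated_frame.values, manual_concatenation)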

How is the feature space built?

The temporal features calculated over time series intervals [15], referred to as interval features, can capture the temporal characteristics, and can also handle the distortion in the time axis. [cf A Time Series Forest for Classification and Feature Extraction]

Below are five examples of interval matrices with randomly drawn widths. Each matrix is then assigned to one estimator.

clf["classify"].fit(
expected_output_dataframe,
windowed_chunk_time_series_dataframe_label.values.reshape(-1)
)
print(clf["classify"].intervals_)
>>> [array([
[ 860, 874],
[1044, 1062],
[ 330, 788],
[ 87, 459],
[ 871, 1022],
[ 130, 791],
[ 769, 1062],
[ 385, 576],
[ 955, 975],
[ 459, 772],
[ 21, 273],
[ 747, 795],
[ 474, 532],
[ 510, 985],
[ 699, 969],
[ 189, 875],
[ 957, 1007],
[ 831, 961],
[ 646, 666],
[ 840, 1006],
[ 387, 987],
[ 315, 328],
[ 241, 1017],
[ 564, 903],
[ 91, 457],
[ 955, 1025],
[ 508, 542],
[ 205, 285],
[1025, 1030],
[ 565, 670],
[ 702, 919],
[ 161, 362]]),
...,
array([
[ 815, 949],
[ 400, 1039],
[1056, 1071],
[ 459, 928],
...
[ 961, 987],
[ 489, 874],
[ 784, 887],
[ 245, 420],
[1062, 1071]])]

Notice that each matrix is a fresh random draw. Since these intervals drive the computation of the temporal features, each DecisionTree ends up with a different feature set.

Three types of features are extracted:

  • Mean
  • Standard Deviation
  • Slope

For each interval $i = (\mathrm{start}, \mathrm{end})$ belonging to $I$, the space of intervals defined above, the three features are (as defined in the Time Series Forest paper):

$$\mathrm{mean}_i = \frac{1}{\mathrm{end} - \mathrm{start} + 1}\sum_{t=\mathrm{start}}^{\mathrm{end}} x_t
\qquad
\mathrm{std}_i = \sqrt{\frac{1}{\mathrm{end} - \mathrm{start}}\sum_{t=\mathrm{start}}^{\mathrm{end}} \left(x_t - \mathrm{mean}_i\right)^2}
\qquad
\mathrm{slope}_i = \hat{\beta}_i$$

Where $\hat{\beta}_i$ is the slope of the least squares regression line fitted on the interval $i$.
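As an illustration, here is a minimal numpy sketch (my own, not sktime's internal implementation) computing the three features for one interval of the first concatenated frame; the exact standard deviation convention may differ slightly from sktime's:

import numpy as np

series = np.array([0., 0., 0., 10., 20., 30., -4., -5., -6.])  # first concatenated frame
start, end = 3, 6  # one example interval

interval = series[start:end]
mean_feature = interval.mean()
std_feature = interval.std(ddof=1)  # sample standard deviation
# slope of the least squares regression line fitted on the interval
slope_feature, _ = np.polyfit(np.arange(start, end), interval, deg=1)

print(mean_feature, std_feature, slope_feature)  # ≈ 20.0, 10.0, 10.0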

In my example, 32 intervals per DecisionTree have been drawn, so each estimator effectively has 32 × 3 = 96 features.

The number of intervals is automatically chosen depending on the length of your nested time series windows:

print(clf["classify"].estimators_[0].n_features_)
>>> 96

For the curious, the feature space construction can be found in the code of the _transform method, here.

Feature Importance

Random Forest exposes a feature importance that reflects how much the model used each feature during training; it does not account for feature interactions, for example. Using feature importance to root-cause the target is a big shortcut, but it can be useful to understand the model’s logic and take a step toward explainability.

As a quick reminder, how is feature importance computed in a random forest? Feature importance is a technique that assigns a score to each input feature based on how useful it is at predicting the target variable.

A random forest is composed of a set of decision trees, which are themselves composed of internal nodes and leaves. At each node, a feature is selected to decide how to split the data set. Features are selected according to some criterion, which can be variance reduction for regression or Gini impurity for classification. The decrease in impurity is computed for each candidate split and the feature with the highest decrease is selected for the node. For each feature, we can then collect how much, on average, it decreases the impurity. At the end, the feature importance is the average over all trees. [cf Random Forest Feature Importance Computed in 3 Ways with Python]
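For a plain tabular problem, this is directly available in scikit-learn; here is a minimal sketch on a toy dataset (not the time series use case of this article):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True, as_frame=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Mean decrease in impurity, averaged over all trees: one score per column
for feature_name, importance in zip(X.columns, forest.feature_importances_):
    print(f"{feature_name}: {importance:.3f}")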

The problem with TimeSeriesForestClassifier is that each tree owns a different feature space. It is no longer relevant to average over all trees, so a new aggregation is needed.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

/!\ Today, TimeSeriesForestClassifier has a property “feature_importances_” which is inherited from sklearn’s BaseForest. This feature importance is computed as for a classic random forest. (cf https://github.com/scikit-learn/scikit-learn/blob/1.1.2/sklearn/ensemble/_forest.py — tag: 1.1.2)

/!\ There is an implementation of temporal feature importance in BaseTimeSeriesForest in sktime.series_as_features.base.estimators, but TimeSeriesForestClassifier inherits from sktime.series_as_features.base.estimators.interval_based._tsf.py, which does not have feature_importances_ in v0.13.2. (cf https://github.com/alan-turing-institute/sktime/blob/v0.13.2/sktime/series_as_features/base/estimators/interval_based/_tsf.py)

from sktime.series_as_features.base.estimators import BaseTimeSeriesForest

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Considering a TimeSeriesForestClassifier with 10 trees, the feature importance can be computed for each feature of each tree, and the information related to each feature importance can be stored.

An interesting approach is to sum all feature importances by category (e.g. mean, std, slope) at each time stamp. Here we introduce the concept of temporal feature importance. Each feature importance can be thought of as a rectangle whose length equals the size of the feature’s interval and whose height equals the said feature importance. All feature importances can then be placed along the length of the concatenated frames and summed.

Temporal Feature Importance — Tetris Analogy

In order to retrieve a feature importance normalized over all trees and time stamps, it remains to divide this sum by the number of trees and the number of intervals.

With $N_T$ the number of trees, $N_I$ the number of intervals, and $fi_{cat,i,j}$, $start_{i,j}$, $end_{i,j}$ the feature importance and interval bounds of the $j$-th interval drawn for the $i$-th tree, the temporal feature importance of category $cat$ at time stamp $t$ reads:

$$FI_{cat}(t) = \frac{1}{N_T \, N_I} \sum_{i=1}^{N_T} \sum_{j=1}^{N_I} fi_{cat,i,j} \; \mathbb{1}\left[ start_{i,j} \le t < end_{i,j} \right]$$
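To make this “Tetris” aggregation concrete, here is a tiny hand-made illustration with made-up numbers (1 tree, 2 intervals):

import numpy as np

series_length = 9
n_trees, n_intervals = 1, 2

timeline = np.zeros(series_length)
# (start, end, importance) of each interval feature, made-up values
rectangles = [(0, 3, 0.4), (2, 7, 0.2)]
for start, end, importance in rectangles:
    timeline[start:end] += importance  # stack the rectangle on the timeline

temporal_importance = timeline / n_trees / n_intervals
print(temporal_importance)
# [0.2 0.2 0.3 0.1 0.1 0.1 0.1 0.  0. ]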

Applied to our use case, this gives the following result:

import numpy as np
import pandas as pd
from sktime.classification.interval_based import TimeSeriesForestClassifier
from sktime.series_as_features.base.estimators import BaseTimeSeriesForest
from sktime.transformations.panel.compose import ColumnConcatenator
from sklearn.pipeline import Pipeline


class CustomTimeSeriesForestClassifier(TimeSeriesForestClassifier):
    feature_types = ["mean", "std", "slope"]

    def _extract_feature_importance_of_feature_type_from_tree_feature_importance(
        self, tree_feature_importance: np.array, feature_type: str
    ) -> np.array:
        """
        Extract the feature importance corresponding to a feature type
        (e.g. "mean", "std", "slope") from a tree's feature importance.

        Parameters
        ----------
        tree_feature_importance : array-like of shape (n_features_in,)
            The feature importance per feature in an estimator,
            n_intervals x number of feature types
        feature_type : str
            feature type belonging to self.feature_types

        Returns
        -------
        feature_type_feature_importance : array-like of shape (n_intervals,)
            Feature importance corresponding to the feature type
        """
        # Position of the feature type within each (mean, std, slope) triplet
        feature_index = np.argwhere(
            [feature_type == feature_type_recorded for feature_type_recorded in self.feature_types]
        )[0, 0]

        # Pick every len(feature_types)-th importance, offset by the feature type position
        feature_type_feature_importance = tree_feature_importance[[
            interval_index + feature_index for interval_index in range(
                0, len(tree_feature_importance), len(self.feature_types)
            )
        ]]

        return feature_type_feature_importance

    @property
    def feature_importances_(self) -> pd.DataFrame:
        # One importance curve per feature type, over the concatenated series length
        all_importances_per_feature = {
            "mean": np.zeros(self.series_length),
            "std": np.zeros(self.series_length),
            "slope": np.zeros(self.series_length),
        }

        for tree_index in range(self.n_estimators):
            tree = self.estimators_[tree_index]
            tree_importances = tree.feature_importances_
            tree_intervals = self.intervals_[tree_index]
            for feature_type in self.feature_types:
                feature_type_importances = \
                    self._extract_feature_importance_of_feature_type_from_tree_feature_importance(
                        tree_importances, feature_type
                    )
                # Spread each interval's importance ("rectangle") over its time range
                for interval_index in range(self.n_intervals):
                    interval = tree_intervals[interval_index]
                    all_importances_per_feature[feature_type][interval[0]:interval[1]] += \
                        feature_type_importances[interval_index]

        # Normalize by the number of trees and the number of intervals
        temporal_feature_importance = (
            pd.DataFrame(all_importances_per_feature) / self.n_estimators / self.n_intervals
        )
        return temporal_feature_importance
steps = [
    ('concatenate', ColumnConcatenator()),
    ('classify', CustomTimeSeriesForestClassifier())
]
clf = Pipeline(steps)
clf.fit(
    windowed_chunk_time_series_dataframe,
    windowed_chunk_time_series_dataframe_label.values.reshape(-1)
)

temporal_feature_importance = clf["classify"].feature_importances_
separators = range(
    0,
    clf["classify"].series_length,
    len(windowed_chunk_time_series_dataframe.iloc[0, 0])
)

ax = temporal_feature_importance.plot(figsize=(20, 10))
for separator in separators:
    ax.vlines(
        separator,
        temporal_feature_importance.min().min(),
        temporal_feature_importance.max().max(),
        color='r',
        alpha=0.1
    )

fig = ax.get_figure()
fig.savefig('./feature_importance.png')
Feature Importance in Multivariate use case with 10 series
Feature Importance in Multivariate use case with 3 series

However, building a temporal feature importance is not the end of the story. Remember that, in the case of a multivariate time series problem, the first step in our pipeline was ColumnConcatenator(). It could be interesting to convert this per-timestamp feature importance into a per-dimension feature importance.

Feature Importance against time in Multivariate use case with 3 series — Features emphasized

An interesting way to visualize it is to aggregate the temporal feature importance over the timestamps belonging to each original multivariate series.

def feature_importance_in_dim(
    time_series_forest_classifier: CustomTimeSeriesForestClassifier, nb_of_series: int
) -> pd.DataFrame:
    temporal_feature_importance = time_series_forest_classifier.feature_importances_
    # Length of one original series inside the concatenated frames
    chunk_length = time_series_forest_classifier.series_length // nb_of_series
    separators = range(
        0,
        time_series_forest_classifier.series_length,
        chunk_length
    )
    # Average the temporal importance over each original series' segment
    feature_importance_per_series_dict = {
        col: [
            temporal_feature_importance[col].iloc[start:start + chunk_length].mean()
            for start in separators
        ]
        for col in temporal_feature_importance
    }
    feature_importance_per_series_df = pd.DataFrame(feature_importance_per_series_dict)

    return feature_importance_per_series_df


feature_importance_df = feature_importance_in_dim(
    clf["classify"], nb_of_series=windowed_chunk_time_series_dataframe.shape[1]
)
feature_importance_df.index = windowed_chunk_time_series_dataframe.columns
feature_importance_df.plot.bar(figsize=(20, 10))

But this view does not reflect the fact that part of the feature importance comes from the mean, std or slope of an interval spanning several of the original series. In other words, such an interval accounts for the mean, std or slope of the transition from one series to the one(s) concatenated to its right.

As a final thought, this could be extended by creating virtual features such as “feature_0_&_feature_1”, “feature_1_&_feature_2” or “feature_0_&_feature_1_&_feature_2”, which brings our feature space size to n(n-1)/2 with n the number of multivariate series we are working with. It can become less explainable than the temporal feature importance.

Retrieve all the code used above here:

Special thanks to Deirdrée Polak and An Truong for the review !
