Fitting a univariate time series model to a cluster of series

Priyanka Banik
Walmart Global Tech Blog
Mar 28, 2022

An extension of univariate time series modeling to multiple series with similar temporal pattern

Introduction

We often need to generate simultaneous forecasts for a large number of univariate time series. For example, at Walmart, we need to generate forecasts for thousands of time series for planning and resource allocation purposes. The first approach that comes to mind is to fit an individual univariate time series model to each series and use these models to generate the future forecasts. But fitting and validating complex time series models like SARIMAX, Holt-Winters, or FBProphet for a large set of univariate series is not only time-consuming but also impractical, especially for large retail or e-commerce businesses with a broad range of products. In addition to the time complexity, monitoring and maintaining that many models in production can be troublesome and expensive.

One possible ‘semi-supervised’ approach is to create clusters of ‘similar’ time series and then fit one complex model per cluster. Fitting a tree-based or linear regressor to a cluster of time series is straightforward: we concatenate the time series data falling in one cluster and include a one-hot-encoded identifier in the feature vector to adjust for the individual effects. But these regression models, without any hand-curated time-series-specific featurization, often fail to capture the usual time series characteristics, for example, trend, seasonality, autocorrelation etc. On the other hand, fitting a time series model to multiple univariate series is not straightforward. There are multivariate techniques like Vector Autoregression (VAR), but they are often not the most optimal choice because of their strong parametric assumptions and limited scalability and flexibility.

In this blog, we discuss how to extend any univariate time series modeling technique to fit a cluster of multiple time series whose temporal patterns are ‘similar’ in nature, while retaining their individual characteristics. By applying this methodology along with time series clustering, we have managed to reduce model fitting time by 20x and the number of models by 500x, while not compromising much on forecasting accuracy.

Clustering of time series data

Time series clustering is a prerequisite for the methodology we propose in this blog. Hence, we first discuss why clustering can be a good approach when dealing with many univariate time series, and what some of the possible methods for time series clustering are.

Figure 1: input data columns

The goal of clustering, for our problem, is to group time series with ‘similar’ temporal patterns so that, instead of individual complex models, one single model of similar complexity can capture the ‘common’ temporal pattern of all the series in a cluster. In most cases, the data is stored in long format, i.e., each row holds the time series value for an (identifier, time) pair; e.g., in Figure 1, ‘id_nbr’ is our series identifier, ‘month’ indicates the time point, and the monthly time series value is stored in the ‘target_var’ column. Assuming all the due pre-processing of the data has been done and each identifier has a uniform-length time series, we can pivot the data so that each time series is represented as a row, i.e., we get a data frame whose number of columns equals the number of time points, say m, and whose number of rows equals the number of unique identifiers, say N. We can treat this N × m dataset as N m-dimensional data points and cluster ‘similar’ points together. To group similar time series, we have used the Bisecting k-means algorithm with simple Euclidean distance; for our use-case, we are dealing with the last 4 years of monthly data (approx. 50 data points per series), so the curse of dimensionality does not play a big role.
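To make this concrete, below is a minimal sketch of the pivot-and-cluster step, assuming a pandas DataFrame df with the columns shown in Figure 1 and scikit-learn >= 1.1 (which provides BisectingKMeans); the choice of 10 clusters is a placeholder for illustration, not the value we used.

import pandas as pd
from sklearn.cluster import BisectingKMeans

# Pivot long-format data (one row per (id_nbr, month) pair) into an
# N x m matrix: one row per series, one column per time point
wide = df.pivot(index='id_nbr', columns='month', values='target_var')

# Assumes uniform-length, fully pre-processed series (no missing months)
bkm = BisectingKMeans(n_clusters=10, random_state=42)  # k = 10 is a placeholder
labels = bkm.fit_predict(wide.values)

# Map each series identifier to its cluster for downstream modeling
cluster_map = pd.Series(labels, index=wide.index, name='cluster')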

Figure 2: A set of time series falling into a cluster are plotted and colour-coded differently. The black dashed line represents the median time series of that cluster

In Figure 2, we showcase an example of one of the clusters obtained: the x-axis represents time (at monthly frequency), the y-axis represents the value of the time series, and the individual series in the cluster are color-coded differently. Other choices for time series clustering include DTW-based distances for non-synchronized time series (see reference 2) and PCA-based dimensionality reduction for clustering high-dimensional time series (see reference 3).
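As a sketch of the DTW route, one option is the tslearn library’s TimeSeriesKMeans, shown below under the assumption that tslearn is available; this illustrates the alternative, not what we used in production. The input is the same N × m matrix from the previous sketch.

import numpy as np
from tslearn.clustering import TimeSeriesKMeans

# tslearn expects a 3-D array of shape (n_series, n_timepoints, n_features)
X = wide.values[:, :, np.newaxis]

# DTW-based k-means: useful when series share a shape but are shifted
# or stretched in time; k = 10 is again a placeholder
dtw_km = TimeSeriesKMeans(n_clusters=10, metric='dtw', random_state=42)
dtw_labels = dtw_km.fit_predict(X)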

Fitting a Time Series model to a cluster of time series

Figure 3: Workflow of fitting univariate time series model for a cluster of series

Suppose n univariate time series with ‘similar’ temporal patterns are grouped in a cluster, as shown in Figure 2. Let us denote these n time series as {Y₁(t): t ∈ {T₁,…,Tₘ}},…, {Yₙ(t): t ∈ {T₁,…,Tₘ}}. Instead of fitting separate univariate models to each of these n series, we can compute a ‘representative’ time series for the cluster. An obvious choice is the cluster centroid, i.e., the time series obtained by taking the median across the identifiers at each time point, denoted by {Yₚ(t) = Q₂(Y₁(t),…,Yₙ(t)): t ∈ {T₁,…,Tₘ}}, where Q₂ is the median operator. The series {Yₚ(t)}, call it the median time series, is a univariate time series representing the temporal pattern of the n series in the cluster, and hence we can fit any univariate time series model to it.

Once a model is fitted to the univariate median time series, the next step is to generate forecasts for all the time series in the cluster. The forecasts generated from the median univariate time series model (denoted by {Y*ₚ(t)}) are not enough, as each individual series has its own offsets from the median time series, as shown in Figure 2. To account for these individual adjustments, we first compute the adjustment series {Dᵢ(t) = Yᵢ(t) − Yₚ(t)}. As the major temporal components, e.g., trend, seasonality etc., are already taken care of by the univariate model fitted to the median series, we can use a very simple approach to forecast the adjustment series. For example, we can use the previous year’s same-month values as the forecasted adjustments, or build a simple linear model to generate the adjustment forecasts {D*ᵢ(t)}. Finally, the forecast for each individual series in the cluster is given by {Y*ᵢ(t) = Y*ₚ(t) + D*ᵢ(t)}. The flowchart of this methodology is presented in Figure 3.
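To make the adjustment step concrete, here is a minimal sketch of the seasonal-naive variant on toy arrays (all names and shapes here are hypothetical stand-ins); the worked example in the next section uses the linear-model variant instead.

import numpy as np

# Toy setup: n = 3 series, m = 48 training months, forecast horizon h = 12
n, m, h = 3, 48, 12
y = np.random.rand(n, m)       # stand-in for the cluster's series Y_i(t)
y_p = np.median(y, axis=0)     # median series Y_p(t)
y_p_hat = np.random.rand(h)    # stand-in for the univariate model's forecast Y*_p(t)

d = y - y_p                    # adjustment series D_i(t) = Y_i(t) - Y_p(t)

# Seasonal-naive adjustment forecast: reuse the offsets observed in the
# same months one year earlier, D*_i(t) = D_i(t - 12)
d_hat = d[:, -12:-12 + h] if h < 12 else d[:, -12:]

# Final per-series forecasts: Y*_i(t) = Y*_p(t) + D*_i(t), shape (n, h)
y_hat = y_p_hat[np.newaxis, :] + d_hat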

Example use-case with code snippets

Now let’s apply the above-mentioned methodology to a real dataset.

As shown in Figure 1, the input data contains monthly target values for the time series falling into a single cluster. The column ‘target_var’ serves as our target variable, and we have 56 months of data for each of the 244 time series. Before jumping into model building, we divide this dataset into train and test splits so that we can evaluate the methodology later. As this is a forecasting problem, we need to split the dataset temporally rather than randomly.

import pandas as pd
import numpy as np
from fbprophet import Prophet

# Training window (monthly data) and a 12-month test window
train_range = ['2016-02-01', '2019-02-01']
test_range = ['2019-03-01', '2020-02-01']
date_col = 'month'
target_col = 'target_var'

# One-hot encoded month-of-year features (11 dummies for 12 months)
features_to_use = ['month_of_year_' + str(i) for i in range(11)]

# df holds the long-format input data shown in Figure 1
df_train = df[(df[date_col] >= train_range[0]) & (df[date_col] <= train_range[1])]
df_test = df[(df[date_col] >= test_range[0]) & (df[date_col] <= test_range[1])]

Next, we compute the representative time series for the cluster by taking the median across all the individual series at each month.

# The median across all series at each time point gives the representative series
agg_df_train = df_train[[target_col, date_col] + features_to_use].groupby(date_col).median().reset_index()

The agg_df_train data frame contains ‘month’, representing the time point, the median-aggregated target ‘target_var’, and the month one-hot features. As the (month, target_var) pair forms a univariate time series, we can fit any univariate model to it. For our example, we have used FBProphet for the univariate time series modeling.

# Prophet expects the time column to be named 'ds' and the target 'y'
agg_df_train.rename({'month': 'ds', 'target_var': 'y'}, axis=1, inplace=True)

model = Prophet(
    growth='linear',
    seasonality_mode='additive',
    yearly_seasonality=False,
    weekly_seasonality=False,
    daily_seasonality=False
)

# Yearly seasonality is handled through the one-hot month regressors instead
if features_to_use is not None:
    for i in features_to_use:
        model.add_regressor(i)

model.fit(agg_df_train)

We can use this FBProphet model to generate forecasts on the test period for the median time series.

agg_df_test = df_test[[target_col, date_col] + features_to_use].groupby(date_col).median().reset_index()
agg_df_test.rename({'month': 'ds', 'target_var': 'y'}, axis=1, inplace=True)
predictions = model.predict(agg_df_test.drop('y', axis=1))

These forecasts, however, do not capture the individual adjustment factors for each time series. For that, we first compute the adjustment series (‘delta_val’) in the training period and then build a simple linear model to forecast it over the test period.

from sklearn.linear_model import LinearRegression

# Compute the training-period adjustments of each series from the median series
df_train = df_train.merge(agg_df_train[['ds', 'y']], left_on=date_col, right_on='ds', how='left')
df_train['delta_val'] = df_train[target_col] - df_train['y']

# Attach the median forecasts to the test data and stack train and test
df_test = df_test.merge(predictions[['ds', 'yhat']], left_on=date_col, right_on='ds', how='left')
df_actual_all = pd.concat([df_train, df_test])

def get_linear_model_pred(train_test_df, test_dat):
    # Fit a linear model (month dummies + quadratic time trend) on the
    # training-period adjustments and predict them for the test period
    train_test_df = train_test_df.sort_values(date_col).reset_index(drop=True)
    train_test_df['t'] = np.arange(0, train_test_df.shape[0])
    train_test_df['t2'] = train_test_df['t'] ** 2
    test_dates = test_dat['ds'].astype(str).values.tolist()
    train_df = train_test_df.loc[~train_test_df[date_col].astype(str).isin(test_dates)].reset_index(drop=True)
    test_df = train_test_df.loc[train_test_df[date_col].astype(str).isin(test_dates)].reset_index(drop=True)
    lm_obj = LinearRegression().fit(train_df[features_to_use + ['t', 't2']], train_df[['delta_val']])
    test_delta_pred = lm_obj.predict(test_df[features_to_use + ['t', 't2']])
    return pd.DataFrame({date_col: test_df[date_col].values, 'test_delta_pred': test_delta_pred[:, 0]})

# One adjustment model per series, applied group-wise
test_df_delta = df_actual_all.groupby(['id_nbr']).apply(lambda x: get_linear_model_pred(x, agg_df_test)).reset_index(drop=False)
df_test = df_test.merge(test_df_delta[['id_nbr', date_col, 'test_delta_pred']], on=['id_nbr', date_col], how='left')

Finally, the forecasts for all 244 time series are obtained by adding the individual adjustment predictions to the median prediction.

df_test['final_prediction'] = df_test['yhat'] + df_test['test_delta_pred']

Results

We have evaluated the above methodology on the cluster of 244 univariate time series described in the previous section. Since this is a regression setup, we use the mean absolute percentage error (MAPE) as the error metric; the lower the MAPE, the better the forecasts. In our use-case, MAPE is computed for each individual time series over the 12-month test period, so each of the 244 series gets one MAPE value indicating the forecasting performance for that series. In Table 1, we report three quantiles of these MAPE values to showcase the performance of fitting FBProphet at the cluster level in comparison with (a) fitting individual FBProphet models to the 244 univariate series and (b) fitting a Random Forest model (with hyper-parameters chosen by cross-validation) to the entire cluster data, with one-hot encoding of the series identifier to adjust for individual effects and one-hot month features.
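As a minimal sketch (assuming the column names used above and no zero actuals), the per-series MAPE can be computed as:

# Absolute percentage error per row, then averaged per series
df_test['ape'] = (df_test[target_col] - df_test['final_prediction']).abs() / df_test[target_col].abs()
mape_per_series = df_test.groupby('id_nbr')['ape'].mean() * 100

# Three quantiles of the 244 per-series MAPE values; the exact
# quantiles reported in Table 1 are an assumption here
print(mape_per_series.quantile([0.25, 0.5, 0.75]))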

Table 1: Distribution of MAPE computed for different forecasting methodologies

Clearly, the FBProphet model at the cluster level, by taking the basic time series components into account, has outperformed the Random Forest model, which lacks any hand-curated time-series-specific features (e.g., lagged variables). Compared with fitting individual FBProphet models, the accuracy is similar. On the other hand, fitting FBProphet at the cluster level takes only 3.2 seconds for the set of 244 time series (including the adjustment modeling), whereas fitting separate FBProphets for the 244 individual series takes 2.24 × 244 ≈ 546.6 seconds! Extending the same approach to the more than 50 thousand time series we have handled in one of the forecasting problems at Walmart has saved a fortune in time and resources while hardly losing anything in terms of forecasting accuracy.

Conclusion

Dealing with multiple univariate time series is common in practice. Analysts generate simultaneous forecasts for multiple univariate time series in many fields, e.g., finance, public health, e-commerce, supply chain, and meteorology. This semi-supervised approach of grouping time series with similar temporal patterns and fitting one univariate time series model per cluster can significantly reduce model training and inference time, along with the complexity and cost of maintaining a vast number of models in production. By extending any univariate time series model to a cluster of series with similar temporal patterns, the methodology brings a lot of flexibility and, as shown in this blog, can significantly improve forecasting accuracy compared to fitting generic machine learning models.

I hope this blog sets the foundation for this methodology and encourages readers to apply it to other use-cases.

Acknowledgement: I am thankful to the EBS NexTech data science team at IDC for reviewing this methodology while collaborating on one of the important forecasting projects at Walmart.

References

  1. Time-series clustering — A decade review: https://www.sciencedirect.com/science/article/abs/pii/S0306437915000733
  2. Dynamic time warping: https://en.wikipedia.org/wiki/Dynamic_time_warping
  3. A dimensionality reduction technique for efficient time series similarity analysis: https://www.sciencedirect.com/science/article/abs/pii/S0306437907000518
  4. Bisecting K-means: https://medium.com/@afrizalfir/bisecting-kmeans-clustering-5bc17603b8a2
  5. FBProphet: https://facebook.github.io/prophet/docs/quick_start.html
