From Data to Dollars — A Deep Dive in Time-Series Forecasting II

Quantiphi Inc.
10 min read · Jul 11, 2023


First understand your limits. Then break past them.

In the first part, we set a baseline metric as a benchmark for model selection. Now, let’s explore two widely used regression modeling algorithms: XGBoost and LightGBM. But before we delve into the code and analyze the outcomes, let’s first understand gradient boosting, which forms the foundation for both of these models.

Gradient Boosting

United we stand.

The core idea of this algorithm revolves around constructing models in a sequential manner. Each subsequent model aims to minimize the error of the previous model by propagating and reducing the residuals across the chain.

To illustrate the process, we start with a simple initial model, often just the mean of the target. Each subsequent tree is then fit to the residuals of the current ensemble (more generally, to the gradients of the loss), so the observations that are still predicted poorly receive the most attention in the next round, while performance on the observations that are already handled well is maintained or improved.
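To make this residual-chaining concrete, here is a minimal hand-rolled sketch using scikit-learn decision stumps; the data, learning rate, and number of boosting rounds are arbitrary and purely illustrative, not part of the pipeline built later in this post.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: a noisy sine wave (illustrative only)
rng = np.random.RandomState(42)
X = np.sort(rng.uniform(0, 6, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())    # start from a constant model
trees = []

for _ in range(100):
    residuals = y - prediction            # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)                # each new tree models the leftover error
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("Training MSE after boosting:", np.mean((y - prediction) ** 2))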

The error profile typically looks like the one shown below.

XGBoost Regressor

Regularization - The Bane of Overfitting

XGBoost is a highly optimized machine learning library that excels in gradient boosting. It offers several noteworthy features, including:

  • Regularization Techniques: XGBoost incorporates both L1 (lasso) and L2 (ridge) regularization, which improves the model’s ability to generalize compared to traditional gradient boosting.
  • Parallelizability: The algorithm supports parallelization within a single tree, allowing different nodes to be generated simultaneously. However, it does not train multiple trees in parallel.
  • Control Over Tree Depth: With XGBoost, we can specify a “max_depth” parameter, which serves as the stopping criterion for branch splitting during tree construction.
  • Efficient Utilization of Compute Resources: XGBoost leverages available compute through cache awareness, allocating internal buffers in each thread to store gradient statistics, which results in optimized performance.
  • Insightful Feature Importance Analysis: The “feature_importances_” attribute provides a view of the relative importance of different features in making predictions, contributing to better model interpretation.

XGBoost supports both classification and regression tasks, with the only significant difference being the choice of loss function definitions. For our forecasting purposes, we will be utilizing the regressor.
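As a quick illustration (the hyperparameter values below are placeholders, not the settings used in the experiments later in this post), these capabilities map directly onto arguments of the scikit-learn style regressor:

import xgboost as xgb

# Illustrative configuration only; values are placeholders, not tuned settings
xgb_model = xgb.XGBRegressor(
    n_estimators=500,
    max_depth=5,              # stopping criterion for branch splitting
    learning_rate=0.1,
    reg_alpha=0.1,            # L1 (lasso) regularization
    reg_lambda=1.0,           # L2 (ridge) regularization
    n_jobs=-1,                # parallelism within tree construction
    objective="reg:squarederror",
)
# After fitting, xgb_model.feature_importances_ exposes the relative importance of each feature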

LightGBM Regressor

Horizontal growth for stability. Vertical growth to touch the skies. And the worst aspect about this predicament is we can’t have both.

Like XGBoost, LightGBM is a gradient-boosting machine learning library. Its salient features are as follows:

  1. LightGBM grows trees leaf-wise (vertically), which allows for greater loss reduction per split but can be more prone to overfitting.
  2. It uses histogram binning, which allows it to handle large, high-dimensional datasets efficiently.
  3. It trains faster with a lower memory footprint while maintaining competitive accuracy.
  4. As with XGBoost, the max_depth parameter serves as the stopping criterion for branch splitting.
  5. The feature_importances_ attribute gives insight into which features drive the predictions.

LightGBM comes in both classification and regression flavors with the only major change being in the loss function definitions. We’ll be using the regressor for our forecasting purposes.
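Again purely as an illustration (placeholder values, not the configuration used in the experiments below), the same ideas surface as constructor arguments:

import lightgbm as lgb

# Illustrative configuration only; values are placeholders, not tuned settings
lgbm_model = lgb.LGBMRegressor(
    n_estimators=500,
    num_leaves=31,            # caps the leaf-wise (vertical) growth of each tree
    max_depth=5,              # stopping criterion for branch splitting
    learning_rate=0.1,
    max_bin=255,              # histogram binning of continuous features
    n_jobs=-1,
)
# After fitting, lgbm_model.feature_importances_ shows which features drive the predictions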

The Experiment

Here we go again.

As mentioned earlier in this series, we will train individual models for each store-item combination. In this specific set of experiments, our focus will be on exploring the XGBoost and LightGBM models.

Imports

Standing on the shoulders of giants
import numpy as np
import pandas as pd
import time
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta
import os
from ast import literal_eval
import lightgbm as lgb
import xgboost as xgb
from tqdm import tqdm
import warnings
import matplotlib.pyplot as plt

warnings.filterwarnings('ignore')

Evaluation Functions

Consistency - A Virtue for Mules

To keep things consistent, we’ve used the same evaluation metrics throughout our experiments.

from sklearn.metrics import mean_squared_error, mean_absolute_error

def smape(y_true, y_pred):
    """
    Defines the sMAPE (Symmetric Mean Absolute Percentage Error) metric
    Args:
        1) True values for forecast period
        2) Model forecasts
    Returns:
        sMAPE value
    """
    # Drop points where both the actual and the forecast are zero to avoid 0/0
    mask = ~((y_true == 0) & (y_pred == 0))
    y_true = y_true[mask]
    y_pred = y_pred[mask]
    return np.mean(np.abs(y_true - y_pred) / ((np.abs(y_true) + np.abs(y_pred)) / 2)) * 100

def extract_metrics_from_predicted(y_true, y_pred):
    """
    Extract all relevant metrics (MAE, RMSE, sMAPE) from the forecasted values
    Args:
        1) True values for forecast period
        2) Model forecasts
    Returns:
        Dictionary containing:
            a) RMSE
            b) MAE
            c) sMAPE
    """
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(y_true, y_pred)
    smape_val = smape(y_true, y_pred)
    return {"rmse": rmse, "mae": mae, "smape": smape_val}
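For instance, on a small made-up forecast, the helpers can be exercised as follows (the numbers are arbitrary):

y_true = np.array([120.0, 98.0, 135.0, 110.0])
y_pred = np.array([115.0, 102.0, 128.0, 118.0])

# Returns a dictionary with the RMSE, MAE and sMAPE of this toy forecast
print(extract_metrics_from_predicted(y_true, y_pred))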

Feature Engineering

You must learn to see what you are looking at. But first create a perspective.
def create_date_time_features(df):
    """
    Create date-based time series features from the time series index.
    Args:
        1) Dataframe with time series input date and values
    Returns:
        1) Dataframe with date-based features such as:
            i)    month
            ii)   day of the month
            iii)  day of the year
            iv)   year
            v)    quarter
            vi)   weekend
            vii)  start of the month
            viii) end of the month
            ix)   season
    """
    df = df.copy()
    df['date'] = pd.to_datetime(df['date'])
    df['month'] = df.date.dt.month
    df['day_of_month'] = df.date.dt.day
    df['day_of_year'] = df.date.dt.dayofyear
    df['week_of_year'] = df.date.dt.isocalendar().week.astype(int)
    df['day_of_week'] = df.date.dt.weekday
    df['year'] = df.date.dt.year
    df['quarter'] = df.date.dt.quarter
    df["is_wknd"] = (df.date.dt.weekday > 4).astype(int)
    df['is_month_start'] = df.date.dt.is_month_start.astype(int)
    df['is_month_end'] = df.date.dt.is_month_end.astype(int)
    # 0: Winter ; 1: Spring ; 2: Summer ; 3: Fall
    df["season"] = np.where(df.month.isin([12, 1, 2]), 0, 1)
    df["season"] = np.where(df.month.isin([6, 7, 8]), 2, df["season"])
    df["season"] = np.where(df.month.isin([9, 10, 11]), 3, df["season"])
    return df

def lag_features(df, lags):
    """
    Create lag features based on an array of lag values
    Args:
        1) Dataframe with time series data
        2) Array with lag values
    Returns:
        1) Dataframe with lag features
    """
    for lag in lags:
        df['sales_lag_' + str(lag)] = df['sales'].transform(lambda x: x.shift(lag))
    return df

def roll_mean_features(df, windows):
    """
    Rolling-mean feature creation over lagged sales values
    Args:
        1) Dataframe with time series data
        2) Sizes of the rolling windows
    Returns:
        1) Dataframe with rolling-mean lag features
    """
    for window in windows:
        df['sales_roll_mean_' + str(window)] = df['sales'].transform(
            lambda x: x.shift(1).rolling(window=window, min_periods=10, win_type="triang").mean())
    return df

def ewm_features(dataframe, alphas, lags):
    """
    Exponentially weighted mean features based on lagged sales values
    Args:
        1) Dataframe with time series data
        2) Smoothing factors for the EWM
        3) Array with lag values
    Returns:
        1) Dataframe with exponentially weighted mean features
    """
    for alpha in alphas:
        for lag in lags:
            dataframe['sales_ewm_alpha_' + str(alpha).replace(".", "") + "_lag_" + str(lag)] = \
                dataframe['sales'].transform(lambda x: x.shift(lag).ewm(alpha=alpha).mean())
    return dataframe

def create_features(df):
    """
    Create all features identified for the time series data
    Args:
        1) Dataframe with time series data (date and sales)
    Returns:
        1) Dataframe with all features created
    """
    df = create_date_time_features(df)
    df = lag_features(df, [91, 98, 105, 112, 119, 126, 182, 364, 546, 728])
    df = roll_mean_features(df, [365, 546])

    alphas = [0.95, 0.9, 0.8, 0.7, 0.5]
    lags = [91, 98, 105, 112, 182, 273, 364, 546, 728]
    df = ewm_features(df, alphas, lags)

    df_final = pd.get_dummies(df, columns=['month', 'day_of_week', "quarter", "year", "is_wknd",
                                           "is_month_start", "is_month_end", "season"])
    return df_final
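As a quick sanity check, the feature pipeline can be exercised on a synthetic daily series; the date range and the Poisson-distributed sales below are made up purely for illustration:

# Synthetic store-item series: five years of daily "sales"
dates = pd.date_range("2013-01-01", "2017-12-31", freq="D")
demo_df = pd.DataFrame({"date": dates.strftime("%Y-%m-%d"),
                        "sales": np.random.poisson(20, len(dates))})

demo_features = create_features(demo_df)
print(demo_features.shape)
print([c for c in demo_features.columns if c.startswith("sales_lag_")])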

Model Training and Evaluation

Together we can do great things.

For XGBoost, we can utilize the following function:

def xgboost_run(time_df, lb, lr, booster='gblinear', return_importance=False):
    """
    Train and evaluate the XGBoost model
    Args:
        1) Time series data
        2) Look-back value
        3) Learning rate (eta)
        4) Booster to use
        5) Whether to return feature importances
    Returns:
        1) Test values
        2) Forecasted values
        3) Training time
        4) [optional] Feature importances
    """
    feature_df = create_features(time_df)

    feature_df.dropna(inplace=True)
    feature_df.drop('date', axis=1, inplace=True)

    # Hold out the last month of the series as the test period
    n_test = len(np.arange(datetime.strptime(time_df.date.max(), "%Y-%m-%d") - relativedelta(months=1),
                           datetime.strptime(time_df.date.max(), "%Y-%m-%d"), timedelta(days=1)).astype(datetime))
    train_X, train_y = feature_df.iloc[:-n_test].drop(['sales'], axis=1).reset_index(drop=True), \
                       feature_df['sales'].iloc[:-n_test].reset_index(drop=True)
    test_X, test_y = feature_df.iloc[-n_test:].drop(['sales'], axis=1).reset_index(drop=True), \
                     feature_df['sales'].iloc[-n_test:].reset_index(drop=True)

    start_time = time.time()

    xgb_reg = xgb.XGBRegressor(n_estimators=1000, max_depth=5, eta=lr,
                               objective="reg:squarederror", seed=42,
                               tree_method='auto',
                               booster=booster)
    xgb_reg.fit(train_X, train_y)
    total_time = time.time() - start_time

    predicted = xgb_reg.predict(test_X).tolist()
    actual = test_y.values.tolist()
    # gblinear has no tree-based feature importances, so only return them for tree boosters
    if booster != 'gblinear' and return_importance:
        return actual, predicted, total_time, xgb_reg.feature_importances_
    return actual, predicted, total_time

For LightGBM, we can use the following function:

def lgbm_smape(preds, train_data):
    """
    Evaluation metric used while training the LightGBM model
    Args:
        1) Forecasted values
        2) Training data
    Returns:
        1) Metric name
        2) Metric value
        3) Bool indicating whether higher is better (False, since lower sMAPE is better)
    """
    labels = train_data.get_label()
    smape_val = smape(labels, preds)
    return 'SMAPE', smape_val, False

def lgbm_run(time_df, lb, lr, return_importance=False):
    """
    Train and evaluate the LightGBM model
    Args:
        1) Time series data
        2) Look-back value
        3) Learning rate (eta)
        4) Whether to return feature importances
    Returns:
        1) Test values
        2) Forecasted values
        3) Training time
        4) [optional] Feature importances
    """
    lgb_params = {'num_leaves': 5,
                  'learning_rate': lr,
                  'feature_fraction': 0.8,
                  'max_depth': 5,
                  'verbose': 0,
                  'num_boost_round': 1000,
                  # 'early_stopping_rounds': 200,
                  'nthread': -1}

    feature_df = create_features(time_df)

    feature_df.dropna(inplace=True)
    feature_df.drop('date', axis=1, inplace=True)

    # Hold out the last month of the series as the test period
    n_test = len(np.arange(datetime.strptime(time_df.date.max(), "%Y-%m-%d") - relativedelta(months=1),
                           datetime.strptime(time_df.date.max(), "%Y-%m-%d"), timedelta(days=1)).astype(datetime))
    train_X, train_y = feature_df.iloc[:-n_test].drop(['sales'], axis=1).reset_index(drop=True), \
                       feature_df['sales'].iloc[:-n_test].reset_index(drop=True)
    test_X, test_y = feature_df.iloc[-n_test:].drop(['sales'], axis=1).reset_index(drop=True), \
                     feature_df['sales'].iloc[-n_test:].reset_index(drop=True)

    cols = list(test_X.columns)

    lgbtrain = lgb.Dataset(data=train_X, label=train_y, feature_name=cols)

    start_time = time.time()
    model = lgb.train(lgb_params, lgbtrain,
                      valid_sets=[lgbtrain],
                      num_boost_round=lgb_params['num_boost_round'],
                      feval=lgbm_smape,
                      verbose_eval=False)
    total_time = time.time() - start_time

    predicted = model.predict(test_X, num_iteration=model.best_iteration).tolist()
    actual = test_y.values.tolist()
    if return_importance:
        return actual, predicted, total_time, model.feature_importance()
    return actual, predicted, total_time

Training Loop

What makes the destination worthwhile is the journey.
def update_dictionary(res_dict, store, item, actual, predicted, rmse, mae, smape, training_time):
    """
    Update the results dictionary
    Args:
        1) Results dictionary to update
        2) Store
        3) Item
        4) Actual values in the forecast period
        5) Predicted values in the forecast period
        6) RMSE value
        7) MAE value
        8) sMAPE value
        9) Training time in seconds
    Returns:
        None
    """
    res_dict['store'].append(store)
    res_dict['item'].append(item)
    res_dict['actual'].append(actual)
    res_dict['predicted'].append(predicted)
    res_dict['rmse'].append(rmse)
    res_dict['mae'].append(mae)
    res_dict['smape'].append(smape)
    res_dict['training_time'].append(training_time)
    return

# Read the train and test data
train_df = pd.read_csv(TRAIN_PATH)
test_df = pd.read_csv(TEST_PATH)

# Get store item combinations and store in an ordered tuple
store_item_combinations = train_df[['store', 'item']].drop_duplicates().sort_values(by = ['store', 'item'])
store_item_tup = [(store, item) for store, item in zip(store_item_combinations['store'].to_list(), store_item_combinations['item'].to_list())]

# Initialize a results dictionary
res_dict = {'store': [], 'item': [], 'actual': [], 'predicted': [], 'mae': [], 'rmse': [], 'smape': [], 'training_time': []}

For the XGBoost model with the booster set to “gbtree,” we can use the following code. If you wish to use a different booster, simply swap in your choice.

return_importance = True
booster = 'gbtree'

if return_importance:
    res_dict['feature_importances'] = []

# Loop over all the tuples
for store, item in tqdm(store_item_tup):
    # Get the time series for a particular store-item combination
    temp_df = train_df[(train_df['store'] == store) & (train_df['item'] == item)].copy()
    temp_df.drop(['store', 'item'], axis=1, inplace=True)

    try:
        # Train and evaluate the model
        actual, predicted, total_time, feature_importances = xgboost_run(temp_df, lb=5, lr=0.1,
                                                                         booster=booster,
                                                                         return_importance=return_importance)
    except Exception as e:
        print(f'Store: {store}; Item: {item} error encountered. {e}')
        predicted = []
        actual = []
        total_time = np.nan
        feature_importances = None

    # Extract the relevant metrics from the predicted values
    error_mat = extract_metrics_from_predicted(actual, predicted)

    # Update the results dictionary
    update_dictionary(res_dict, store, item, actual, predicted,
                      error_mat['rmse'], error_mat['mae'], error_mat['smape'], total_time)
    if return_importance:
        res_dict['feature_importances'].append(feature_importances)

# Write the results
pd.DataFrame(res_dict).to_csv(f"{WRITE_PATH}/results_{booster}.csv", index=False)

For LightGBM, the training loop looks virtually the same, except that xgboost_run is replaced with lgbm_run, as sketched below.
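A condensed sketch of that loop follows; the output filename is illustrative, and res_dict is assumed to have been re-initialized (without the feature_importances key) before this run.

# Condensed LightGBM training loop, mirroring the XGBoost loop above
for store, item in tqdm(store_item_tup):
    temp_df = train_df[(train_df['store'] == store) & (train_df['item'] == item)].copy()
    temp_df.drop(['store', 'item'], axis=1, inplace=True)

    try:
        actual, predicted, total_time = lgbm_run(temp_df, lb=5, lr=0.1)
    except Exception as e:
        print(f'Store: {store}; Item: {item} error encountered. {e}')
        actual, predicted, total_time = [], [], np.nan

    error_mat = extract_metrics_from_predicted(actual, predicted)
    update_dictionary(res_dict, store, item, actual, predicted,
                      error_mat['rmse'], error_mat['mae'], error_mat['smape'], total_time)

pd.DataFrame(res_dict).to_csv(f"{WRITE_PATH}/results_lgbm.csv", index=False)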

Overall Comparison

Taking stock of the situation

As discussed in the preceding article, we will focus on the weighted sMAPE across all store-item combinations.

def get_overall_performance(smape_list, true_values_list, training_time):
    """
    Compute performance in terms of the weighted and unweighted sMAPE and training time
    Args:
        1) sMAPE values for all store-item combinations
        2) True values during the forecast period
        3) Training times across all store-item combinations
    Returns:
        1) Unweighted mean of sMAPE
        2) Weighted mean of sMAPE
        3) Mean training time
        across all store-item combinations
    """
    # Weight each series' sMAPE by its share of total demand in the forecast period
    summed_demand = np.sum(np.asarray(true_values_list), axis=1)
    summed_demand_ratio = summed_demand / np.sum(summed_demand)
    weighted_smape = np.multiply(summed_demand_ratio, np.asarray(smape_list), dtype=np.float64)
    weighted_smape = np.sum(weighted_smape)
    unweighted_smape = np.mean(np.asarray(smape_list))
    return unweighted_smape, weighted_smape, np.mean(np.asarray(training_time))

result_df = pd.read_csv(f"{WRITE_PATH}/results_{booster}.csv")

unweighted_smape, weighted_smape, average_training_time = get_overall_performance(
    result_df['smape'].to_list(),
    result_df['actual'].apply(literal_eval).to_list(),
    result_df['training_time'].to_list())

print("unweighted_smape\t", unweighted_smape)
print("weighted_smape\t", weighted_smape)
print("average_training_time\t", average_training_time)

Results and Conclusions

Results are transitory. And paths - varied. The only constant in victory, Is a will that tarried.

In addition to our regular experiments, we conducted a supplementary analysis to examine the feature importance of the XGBoost model using the “gbtree” booster. This analysis uncovered fascinating insights about our prepared features.

We observed that the season variable has a significant impact on our sales forecasts, with winter and summer emerging as highly influential factors. Additionally, weekends were found to play a crucial role in the predictions.
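One rough way to surface these importances is to plot them for a single store-item model. The sketch below reuses the feature_importances and temp_df left over from the last iteration of the gbtree loop above and rebuilds the training column order from create_features; it is illustrative rather than part of the original pipeline.

# Rough sketch: visualise the importances of the last trained gbtree model
feature_cols = create_features(temp_df).drop(['date', 'sales'], axis=1).columns
imp = pd.Series(feature_importances, index=feature_cols).sort_values(ascending=False)

imp.head(15).plot(kind='barh', figsize=(8, 6), title='XGBoost (gbtree) feature importance')
plt.gca().invert_yaxis()   # largest importance at the top
plt.tight_layout()
plt.show()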

The following plots show the sMAPE per item.

The following plots show the mean sMAPE per store. SKUs three and five sit at the minimum and maximum error values respectively, and this behavior is consistent across both models.

We can evaluate the performance of models based on the following error metrics:

Upon comparing the performance of the two models across the different error metrics, XGBoost outperforms LightGBM in terms of mean absolute error (MAE) and root mean squared error (RMSE), indicating that it achieves lower errors and better predictive accuracy.

However, when we consider the symmetric mean absolute percentage error (sMAPE), a metric that captures the relative percentage difference between the predicted and actual values, both XGBoost and LightGBM exhibit similar performance.

Based on these findings, XGBoost appears to offer slightly better overall accuracy, while LightGBM may be the more time-efficient choice when training time is a critical factor. That said, the differences in performance between the two models are relatively small and may vary depending on the specific use case.

Although we have made progress from our baseline results, as outlined in our previous blog post, our journey to identify the optimal model for our specific use case is far from complete. There is still much to explore and discover. Therefore, let’s reconvene in our next blog post, where we will delve into yet another model and continue our pursuit of finding the best solution.

For more updates from Quantiphi, follow us on LinkedIn, Twitter, Instagram and YouTube.

(With contributions from Akhil Ram Adapa (Senior Architect, Machine Learning) and Mohdsaifullah Sadiq Inamdar (Machine Learning Engineer) at Quantiphi)
