Multi-Time Series and Hierarchical Time Series Forecasting Made Easy Using BigQueryML

Gilberto Gutierrez
Google Cloud - Community
5 min readMay 15, 2024

Not long ago I wrote a blog post that delves into the time series models supported in BigQuery and how to select the best one for your use case. In this new article, I’ll walk you through 2 different approaches that are useful when you want to forecast multiple time series using a single query alongside guidance with examples on how to choose the most convenient one depending on your needs.

1. Multi-Time Series Forecasting

What it is: The process of simultaneously generating forecasts for multiple time series (sets of data points ordered in time). Instead of building individual models for each series, BQ lets you create a set of time series models to forecast multiple time series using a single query. Additionally, all time series models are fitted simultaneously.

Why it’s useful:

  • Efficiency: Saves time and resources by consolidating modeling efforts.
  • Flexibility: Offers univariate time series forecasting (ARIMA_PLUS) with planned expansion to include multivariate capabilities (ARIMA_PLUS_XREG).
  • Scalability: Ideal for scenarios with a large number of time series. You can forecast up to 100,000,000 time series simultaneously by using the time_series_id_col option.
Large-scale Time Series Architecture in BigQuery

Example:

Imagine we wanted to forecast the number of impressions for a specific website by deviceSegment, accountType and account. Instead of building separate models for each time series, we can use BigQuery ML’s ARIMA_PLUS to train a single model on all of them:

CREATE OR REPLACE MODEL
`mydataset.mymodel` OPTIONS (model_type = 'ARIMA_PLUS',
time_series_timestamp_col = 'date',
time_series_data_col = 'total_impressions',
time_series_id_col = ['deviceSegment',
'accountType',
'account'],
auto_arima = TRUE,
clean_spikes_and_dips = FALSE,
adjust_step_changes = FALSE,
data_frequency = 'AUTO_FREQUENCY',
auto_arima_max_order = 2,
decompose_time_series = TRUE) AS
SELECT
deviceSegment,
accountType,
account,
date AS date,
SUM(impr_to_forecast) AS total_impressions
FROM
`myproject.mydataset.mytable`
WHERE
date BETWEEN DATE('2023-06-07')
AND DATE('2024-02-29')
GROUP BY
deviceSegment,
accountType,
account,
date;

Since we created a multi-time series model, we can examine the evaluation results for each time series or by all the time series:

Evaluation Results For Each Time Series
Evaluation Results For All Time Series

The result is a comprehensive set of forecasts, one for each time series, generated with a single query:

Multi-Time Series Forecasting Output

Multi-time series forecasting offers efficiency and scalability. However, to take your forecasting to the next level and unlock insights from hierarchical structures in your data, you can leverage hierarchical forecasting with automatic reconciliation.

2. Hierarchical Forecasting

What it is: A forecasting approach where time series are organized into a hierarchical structure (like a tree). The hierarchy can reflect relationships based on product categories, geographical regions, or other relevant groupings. Forecasts are generated at different levels of the hierarchy, allowing for a balance between granular and aggregate predictions.

Why it’s useful:

  • Reconciliation: Ensures that forecasts at different levels of the hierarchy are consistent (e.g., the sum of product-level forecasts should match the category-level forecast).
  • Flexibility: BQ incorporates the bottom-up reconciliation approach in univariate time series forecasting (ARIMA_PLUS). Multivariate forecasting (ARIMA_PLUS_XREG) is coming soon.
  • Improved accuracy: Often leads to more accurate forecasts than treating each series independently.

Example:

BigQuery ML’s ARIMA_PLUS allows us to build on our previous exercise and create hierarchical forecasts. By treating deviceSegment, accountType, and account as a hierarchical structure, we can generate forecasts at each level and seamlessly aggregate them:

CREATE OR REPLACE MODEL
`mydataset.mymodel` OPTIONS (model_type = 'ARIMA_PLUS',
time_series_timestamp_col = 'date',
time_series_data_col = 'total_impressions',
time_series_id_col = ['deviceSegment',
'accountType',
'account'],
hierarchical_time_series_cols = [ 'accountType',
'deviceSegment'],
auto_arima = TRUE,
clean_spikes_and_dips = FALSE,
adjust_step_changes = FALSE,
data_frequency = 'AUTO_FREQUENCY',
auto_arima_max_order = 2,
decompose_time_series = TRUE) AS
SELECT
deviceSegment,
accountType,
account,
date AS date,
SUM(impr_to_forecast) AS total_impressions
FROM
`myproject.mydataset.mytable`
WHERE
date BETWEEN DATE('2023-06-07')
AND DATE('2024-02-29')
GROUP BY
deviceSegment,
accountType,
account,
date;

The hierarchical_time_series_cols parameter indicates that we are creating a hierarchical forecast based on a set of columns. Each of these columns are rolled up and aggregated. From our previous query, deviceSegment is rolled up to show forecasts for each accountType and account. Separately, both accountType and deviceSegment are also rolled up to show forecasts for each account.The column order is important because it defines the structure of the hierarchy (columns specified in the time_series_id_col option), in our case, account -> accountType -> deviceSegment.

SELECT
*
FROM
ML.FORECAST( MODEL `mydataset.mymodel`,
STRUCT(7 AS horizon,
0.8 AS confidence_level))
ORDER BY
account,
accountType,
deviceSegment,
forecast_timestamp;

Notice how the aggregated forecast is displayed at an account level first (deviceSegment=NULL, accountType=NULL, account=AAP Funded Shopping) followed by accountType level (deviceSegment=NULL, accountType=Bing Ads, account=AAP Funded Shopping) and finally by deviceSegment level (deviceSegment=Desktop, accountType=Bing Ads, account=AAP Funded Shopping):

Hierarchical Forecasting Output in BigQuery

The bottom-up reconciliation will allow us to have all the levels correctly rolled up to their parent levels (deviceSegment rolls up to the accountType level and accountType rolls up to account level)

Hierarchical Forecasting Output in Google Sheets

Which One to Choose

  • Multi-Time Series Forecasting: Ideal when you have a large number of time series and want an efficient and scalable solution.
  • Hierarchical Forecasting: Suitable when your time series have a natural hierarchical structure, and you need to ensure consistency across different levels of the hierarchy.

In Conclusion

BigQuery ML empowers you to efficiently forecast numerous time series, whether they are independent or hierarchically structured. By leveraging the tools and techniques discussed in this post, you can unlock valuable insights from your data, enhance your forecasting accuracy, and ultimately make more informed decisions for your business.

Further Readings

[1] Google, Multiple time-series forecasting with a single query for NYC Citi Bike trips

[2] Google, Scalable forecasting with millions of time-series in BigQuery

[3] Google, Hierarchical time series forecasting

[4] Rob J Hyndman and George Athanasopoulos, Hierarchical time series

--

--