Hierarchical Time Series 101

by Chinmay Palande and Javier Recasens

Published in The Opex Analytics Blog

9 min read · Oct 11, 2019

Imagine that you’re the new VP of supply chain operations at Scoopex Ice Cream Company (congratulations!). Your main responsibility is to ensure that the Scoopex supply chain operates smoothly and efficiently.

To get familiar with your new company, you start your first week by reviewing several reports with forecasts from different departments. The procurement department’s report states that they plan to buy enough raw material to manufacture a million gallons of ice cream next year. However, you notice that the store operations report estimates a demand of only half a million gallons. That’s not good!

There are three distinct possibilities at play here.

First, the procurement department’s forecasts could be off. The ingredients for a million gallons would create excess inventory, locking up capital that could be used in other areas of the business.
Second, the store operations department’s forecasts could be inaccurate. In this case, the distribution network will suffer from inefficiencies, and stores might not be staffed appropriately to handle the customer load that a million gallons would necessitate.
Third, both forecasts could be wrong. Forecasting is akin to throwing darts, but the two departments should at least throw the same dart.

Bottom line: separate forecasts for the same quantity, whether it’s gallons of ice cream or sales dollars, need to add up to ensure consistency in planning, budgeting, and execution. The solution? Hierarchical time series (HTS) forecasting, which ensures that forecasts at all different levels and parts of the business match up.

Let’s start with some common definitions

Before we get into the nitty-gritty of HTS forecasting, let’s break down some important terminology.

A time series is an ordered set of values of a quantity obtained at successive times, often with equal intervals between them. For example, the number of gallons of ice cream sold by a single store each month (or week, or day) is a time series. Stock prices, daily temperature measurements, and weekly box office figures are all time series.

Forecasting is the process of making predictions of the future based on past and present data. For example, estimating the number of gallons of ice cream sold by a single ice cream store per month for the next twelve months definitely counts as forecasting.

A hierarchical time series (HTS) is a collection of time series that follows a hierarchical aggregation structure. As an example, assume that there are three stores (Buckhead, Midtown, and Downtown) that sell Scoopex ice cream in Atlanta. A given store’s monthly ice cream sales is, itself, a time series. The total gallons of ice cream sold across all three stores is also a time series. Crucially, this collection of four time series has a hierarchical aggregation structure (specifically, a two-level geographical hierarchy).
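To make the structure concrete, here is a minimal Python sketch that builds the Atlanta total from the three store-level series. The monthly sales figures are invented for illustration and are not real Scoopex data.

```python
# Hypothetical monthly sales (gallons) for the three Atlanta stores.
# These numbers are made up purely for illustration.
store_sales = {
    "Buckhead": [120, 135, 150, 170],
    "Midtown": [90, 95, 110, 130],
    "Downtown": [60, 70, 75, 95],
}


def aggregate(series_by_node):
    """Sum the child series month by month to get the parent series."""
    return [sum(values) for values in zip(*series_by_node.values())]


# The fourth series in the hierarchy: total Atlanta sales per month.
atlanta_sales = aggregate(store_sales)
print(atlanta_sales)
```

The three store series plus the aggregated Atlanta series make up the four-series, two-level hierarchy described above.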

Other common hierarchies include product hierarchies (e.g., SKU sales aggregate up to product subcategory sales, which further aggregate to product categories, and so on), temporal hierarchies (e.g., forecasts for the next seven days must add up to the week’s individual forecast), and more.

Hierarchical time series forecasting is the process of generating coherent forecasts (or reconciling incoherent forecasts), allowing individual time series to be forecast individually, but preserving the relationships within the hierarchy.

In our Scoopex example, the reports show that Scoopex has inadvertently created incoherent forecasts. However, if we ensure that each department’s sales estimates are ultimately in agreement, the forecasts will become coherent, and consistency will be restored.
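One way to think about coherence is as a simple check: do the child forecasts add up to the parent forecast? The sketch below assumes hypothetical next-month forecasts for Atlanta and its stores; the function name and the tolerance are our own choices, not part of any library.

```python
def is_coherent(parent_forecast, child_forecasts, tolerance=1e-6):
    """Return True when the child forecasts add up to the parent forecast."""
    return abs(parent_forecast - sum(child_forecasts)) <= tolerance


# Hypothetical next-month forecasts (gallons), produced independently.
atlanta_forecast = 1000.0
store_forecasts = {"Buckhead": 450.0, "Midtown": 350.0, "Downtown": 150.0}

# The stores sum to 950, not 1000, so these forecasts are incoherent.
coherent = is_coherent(atlanta_forecast, store_forecasts.values())
print(coherent)
```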

HTS forecasting is not a time series forecasting methodology per se, like exponential smoothing or ARIMA. It’s actually a collection of techniques that make forecasts coherent across a time series hierarchy.

When can/should I use HTS forecasting?

You can use HTS forecasting whenever a time series hierarchy is involved. Such cases are nearly omnipresent in industry. Our Scoopex example is a good one for any product company — oftentimes you’ll want to match up forecasts for individual stores with the estimates for the greater region, or square up individual product forecasts with projections for the whole product category.

As another example, in the freight rail industry, crew employees might be permanently assigned to an area of coverage, but could rely on demand forecasts to decide which of the coverage area’s individual terminals to be at on a given day. This resource match could be aided by hierarchical time series forecasting.

These methods could also be used to align short- and long-term forecasts for consistency of financial planning and budgeting. For example, in retail, you may need a weekly sales forecast for the next month to make tactical inventory decisions, but also a monthly sales forecast for the next year for long-term procurement planning.

In practice, the best choice is oftentimes a combination of hierarchies, as forecast accuracy tends to improve if the model can learn from multiple relationships. For example, weekly sales for a SKU at a store can roll up into both product and geographical hierarchies. This is called a grouped time series (GTS), which is an extension of a hierarchical time series. (We won’t discuss grouped time series in this post, but read more about it here.)
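As a rough illustration of how one bottom-level series can roll up in two directions, the sketch below aggregates the same data by product and by store. The flavors and sales figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical weekly sales (gallons) keyed by (flavor, store).
bottom_level = {
    ("Vanilla", "Buckhead"): 40,
    ("Vanilla", "Midtown"): 30,
    ("Chocolate", "Buckhead"): 25,
    ("Chocolate", "Midtown"): 35,
}


def roll_up(sales, key_index):
    """Aggregate bottom-level sales along one part of the key."""
    totals = defaultdict(int)
    for key, gallons in sales.items():
        totals[key[key_index]] += gallons
    return dict(totals)


by_flavor = roll_up(bottom_level, 0)  # product grouping
by_store = roll_up(bottom_level, 1)   # geographical grouping
print(by_flavor, by_store)
```

Both groupings share the same bottom level and the same grand total, which is what distinguishes a grouped structure from a single hierarchy.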

There are four common HTS forecasting methods

1) The bottom-up approach

As the name suggests, bottom-up forecasting involves forecasting the most granular level of the hierarchy, then aggregating up to create estimates for the higher levels.

Applying bottom-up forecasting to our ice cream example requires forecasting the individual store sales for the Buckhead, Midtown, and Downtown locations first. Total Atlanta ice cream sales can then be calculated by simply adding up these store-level forecasts. Rinse and repeat for the Chicagoland locations, and then national sales will follow.
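Here is a minimal sketch of those steps. It assumes a placeholder forecasting model (the mean of the last three months) and invented sales histories; the Chicago store names are hypothetical. In practice you would swap in whatever model suits your store-level data.

```python
def naive_forecast(history, window=3):
    """Placeholder model: next period = mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


# Hypothetical monthly sales history (gallons) by city and store.
history = {
    "Atlanta": {
        "Buckhead": [120, 135, 150],
        "Midtown": [90, 95, 115],
        "Downtown": [60, 70, 80],
    },
    "Chicago": {
        "Loop": [200, 210, 220],
        "Evanston": [150, 160, 170],
    },
}


def bottom_up(history):
    """Forecast every store, then sum upward to city and national levels."""
    store_fc = {
        city: {store: naive_forecast(series) for store, series in stores.items()}
        for city, stores in history.items()
    }
    city_fc = {city: sum(stores.values()) for city, stores in store_fc.items()}
    national_fc = sum(city_fc.values())
    return store_fc, city_fc, national_fc


store_fc, city_fc, national_fc = bottom_up(history)
print(city_fc, national_fc)
```

Because the higher levels are computed as sums, the resulting forecasts are coherent by construction.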

The main advantage of this method is that, because forecasts are obtained at the lowest level of the hierarchy, no information is lost due to aggregation. However, it ignores the relationships between the series (e.g., it doesn’t take any Atlanta-area forecasts into account when forecasting Chicagoland locations, and vice versa), and it usually produces poorer forecasts for the highly aggregated levels. It’s also computationally intensive: forecasting the most granular level means fitting the largest number of individual series, which requires more horsepower, more runtime, or both. Furthermore, data at lower levels of a hierarchy tends to be noisier, which can reduce overall forecast accuracy.

2) The top-down approach

In the top-down approach, you first forecast the highest level of the hierarchy, then split up the forecasts to get estimates for the lower levels (typically using historical proportions).

For example, assume the forecast for next month’s nationwide ice cream sales is 30,000 gallons. Let’s also say that the historical proportion of monthly ice cream sales is about ⅓ for Atlanta and ⅔ for Chicago. The top-down approach tells us that Atlanta is projected to sell about 10,000 gallons of ice cream in the next month. Rinse and repeat for the lower levels of the hierarchy.
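The arithmetic above can be sketched in a few lines of Python. The city-level history is invented so that the shares come out to ⅓ and ⅔, and the proportions are computed from historical totals, which is only one of several ways to derive them.

```python
def historical_proportions(history_by_child):
    """Share of each child in the combined historical total."""
    totals = {child: sum(series) for child, series in history_by_child.items()}
    grand_total = sum(totals.values())
    return {child: total / grand_total for child, total in totals.items()}


def top_down(parent_forecast, proportions):
    """Split a parent forecast across children using fixed proportions."""
    return {child: parent_forecast * share for child, share in proportions.items()}


# Hypothetical monthly sales history (gallons) for each city.
city_history = {
    "Atlanta": [9000, 10000, 11000],
    "Chicago": [18000, 20000, 22000],
}

shares = historical_proportions(city_history)
city_forecasts = top_down(30000, shares)  # nationwide forecast of 30,000 gallons
print(city_forecasts)
```

Applying `top_down` again with store-level shares would push each city forecast down to its stores.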

Due to its simplicity, this is one of the most commonly used methods for HTS forecasting. It provides reliable forecasts for higher levels in the hierarchy, and only a single true forecasting model is required.

However, the top-down approach tends to produce less accurate forecasts at lower levels of the hierarchy due to a loss of information (historical proportions don’t always fully capture the true behavior of the lower levels), especially for individual series with difficult distributions.

3) The middle-out approach

The middle-out approach is a combination of the bottom-up and top-down approaches, and can be used on hierarchies with at least three levels.

In this approach, you start by forecasting the middle level (neither the most granular nor the most aggregated). After these numbers are calculated, you can forecast the higher levels in the hierarchy using the bottom-up approach and the lower levels with the top-down approach.

For example, in the hierarchy shown below (national, then city, then store), forecasts are first generated for the two cities, Atlanta and Chicago. The national sales forecast is then calculated by adding up the two city forecasts, and the store-level forecasts are estimated by splitting each city forecast using historical proportions.

The middle-out approach is a healthy compromise between the bottom-up and top-down approach. Resulting forecasts don’t lose too much information, yet computational time does not explode.
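A minimal sketch of middle-out, assuming the city-level forecasts already exist and that each store’s historical share of its city is known. All numbers, and the Chicago store names, are hypothetical.

```python
def middle_out(city_forecasts, store_shares):
    """Aggregate city forecasts upward and split them downward."""
    # Bottom-up step: the national forecast is the sum of the city forecasts.
    national = sum(city_forecasts.values())
    # Top-down step: each store gets its historical share of the city forecast.
    stores = {
        city: {store: city_forecasts[city] * share for store, share in shares.items()}
        for city, shares in store_shares.items()
    }
    return national, stores


# Hypothetical city forecasts (gallons) and store-level historical shares.
city_forecasts = {"Atlanta": 10000.0, "Chicago": 20000.0}
store_shares = {
    "Atlanta": {"Buckhead": 0.5, "Midtown": 0.3, "Downtown": 0.2},
    "Chicago": {"Loop": 0.6, "Evanston": 0.4},
}

national, stores = middle_out(city_forecasts, store_shares)
print(national, stores["Atlanta"])
```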

4) The optimal combination/reconciliation approach

All the approaches mentioned above only really forecast one level of the hierarchy, but the optimal combination approach (proposed by Hyndman et al. in 2011) forecasts independently at all levels, using all the information and relationships a hierarchy can offer.

Assuming the base forecasts approximately satisfy the hierarchical aggregation structure (i.e., they’re not off by a half a million gallons of ice cream), the individual forecasts are then reconciled using a linear regression model. The newly coherent forecasts are a weighted sum of the forecasts from all levels, with the weights found by solving a system of equations that ensure the natural relationships between the different levels of the hierarchy are satisfied.
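To give a feel for the regression step, here is a rough sketch of the simplest (ordinary least squares) variant on the two-level Atlanta hierarchy. It is not the full method from the paper, which also covers other weighting schemes. The base forecasts are hypothetical, and the small linear-algebra helpers are written out by hand only to keep the example dependency-free.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, a[i])) + list(map(float, b[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def reconcile_ols(summing_matrix, base_forecasts):
    """Regress the base forecasts on the summing matrix, then re-aggregate."""
    s = summing_matrix
    st = transpose(s)
    y = [[v] for v in base_forecasts]
    # Least-squares estimate of the bottom level: (S'S)^-1 S'y
    bottom = solve(matmul(st, s), matmul(st, y))
    # Coherent forecasts for every node: S times the bottom-level estimate
    return [row[0] for row in matmul(s, bottom)]


# Summing matrix: each row says which stores add up to that node.
# Row order: Atlanta total, Buckhead, Midtown, Downtown.
S = [
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

# Hypothetical, incoherent base forecasts (the stores sum to 950, not 1000).
base = [1000.0, 450.0, 350.0, 150.0]
reconciled = reconcile_ols(S, base)
print(reconciled)
```

In this small case the 50-gallon disagreement is spread evenly: every store forecast moves up a little and the total moves down a little, so the reconciled numbers add up.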

The optimal reconciliation approach can give more accurate forecasts than the other methods we’ve covered so far. It provides unbiased forecasts at all levels (as long as the base forecasts are themselves unbiased) with minimal loss of information, and it takes advantage of the relationships between time series to find patterns (e.g., seasonal variations at higher levels can better inform forecasts at lower levels of the hierarchy). In addition, each forecast is created independently, meaning different forecasting methods (e.g., ARIMA, ETS, naive methods, etc.) can be used at each level. Though it’s potentially the most accurate option, this method is also the most complex and computationally intensive, which means that it doesn’t scale well to a large number of time series. (For the mathematical details of this approach, read more here.)

How do I decide which approach to use?

The best approach typically depends on your goals and constraints. Our general advice would be to start with simpler approaches, then move to more complex and computationally intensive methods if you’re unsatisfied with your results. In short, if a faster and simpler approach provides a forecast that suits your needs, you can start with a top-down approach and skip the more complex stuff. To decide what’s good enough for you, consider whether you need a certain degree of accuracy at certain levels of the hierarchy, whether you’re limited by available computing power or time, and whether you’re constrained in any other way (e.g., your stakeholders must be able to interpret the model instead of just using the results).

The bottom-up, top-down, and middle-out approaches are typically biased toward the level they’re forecasting. If getting an accurate forecast for that level is the main goal, then using one of these three might be best. If not, the optimal reconciliation method provides unbiased forecasts that typically work at all levels of the hierarchy, but it may be computationally prohibitive when working with a very large number of time series. If there are no constraints on computational time (we can all dream), the best strategy is to evaluate each method using back-testing, also known as time series cross-validation.
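A bare-bones version of back-testing looks like the sketch below: repeatedly forecast one step ahead from a growing training window and average the errors. The two forecasters are simple placeholders and the series is invented; in a real comparison you would run the same loop for each HTS method at each level of the hierarchy.

```python
def last_value_forecast(history):
    """Placeholder model: next period equals the most recent observation."""
    return history[-1]


def mean_forecast(history, window=3):
    """Placeholder model: next period equals the mean of recent values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def backtest(series, forecaster, min_train=4):
    """Rolling-origin evaluation: mean absolute error, one step ahead."""
    errors = []
    for origin in range(min_train, len(series)):
        # Train only on data before the origin, then score the next point.
        prediction = forecaster(series[:origin])
        errors.append(abs(series[origin] - prediction))
    return sum(errors) / len(errors)


# Hypothetical monthly sales (gallons) with a steady upward trend.
series = [100, 110, 120, 130, 140, 150, 160, 170]

mae_last = backtest(series, last_value_forecast)
mae_mean = backtest(series, mean_forecast)
print(mae_last, mae_mean)
```

Whichever method has the lowest error at the levels you care about is the one to keep.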

Conclusion

Hierarchical time series forecasting is a collection of techniques that makes forecasts coherent across a time series hierarchy. It is most valuable when different parts of the business use interconnected forecasts that need to add up. It ensures that a hierarchy’s forecasts are coherent, resulting in better planning, budgeting, and execution across the business.

Note: A lot of the material in this post is inspired by the work of Rob J. Hyndman, who wrote a really practical (and totally free) book on time series forecasting, and is also the author of the R statistical package forecast.
