Hierarchical Time Series Forecasting at ING

Mehmet Kutluay
Published in ING Blog
Jan 29, 2020 · 11 min read

In this article, I’ll be showing how hierarchical financial time series are forecasted within a soon-to-be open-sourced R package!

Introduction

Banks have been around since time immemorial[1], and collecting and analyzing data has been one of their core activities. After all, without data how can a bank decide how much money to lend to which project? Or how much money it should keep in reserve for hard times? The data that a bank has roughly falls into two categories: financial (e.g. profit) and non-financial (e.g. customer interaction with online banking). This post is about financial data.

A bank’s financial data is usually time series data — for instance, how much lending volume has been recorded every month. Assessing the bank’s health and preparing it for the future can require forecasting these time series. After all, which bank is not interested in knowing the best prediction for their lending volumes? This can be quite straightforward, since time series forecasting is a well-established discipline and plenty of high-quality R packages are available.

Technicalities and academic curiosities aside, a bank stands to gain a lot from having healthy forecasts at hand. For instance, if deposits are forecasted to increase in the coming months then the bank can prepare itself to better make use of future available capital for lending. This lets it better optimize the amount of capital it needs to put aside and thus provide improved services to its customers.

Forecasting time series data is straightforward and the benefits are evident. However, what happens when we need to forecast time series that come in an aggregated format? What if, in our example, we have forecasted total lending volume for all of ING, but want to see how this forecast is driven by lending volume in the Netherlands versus that of all the other ING countries?

This kind of data in aggregated format is referred to as hierarchical data, since there are lower levels of data (e.g. individual countries’ total lending) that add up to give the higher levels (e.g. total lending of all countries). We have decades of research and algorithms at our disposal to forecast for each individual level. But how can we combine these forecasts so that they reflect the hierarchical nature of the data? In other words:

How can we forecast hierarchical time series data?

This is the question that made the members of Finance Analytics roll up their sleeves and get to work. Finance Analytics is a team within Group Finance in ING that focuses on delivering financial data science products. These products usually consist of interactive dashboards that allow the end-user to interact with time series forecasts. This post will go through one of the models the team developed to tackle an important business question. At the time this work was done, the team consisted of Dr. Mehmet Kutluay and Gertjan van den Bos as full-time data scientists and Olle Dahlen as an IT trainee. They all contributed equally to the work described below.

This post will first describe the solution to this analytical problem within the context of our lending example. It will then discuss the implementation of the algorithm within an internal R package.

Analytical Solution

Before describing the analytical solution itself, which is from publicly available academic work[2], it is best to get a feel for what hierarchical data might look like. Here is a nice example:
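
```
            Total
           /     \
          A       B
        / | \    / \
      AA  AB  AC BA  BB
```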

The groups AA, AB, AC, BA and BB are “hierarchical leaves” — these are the groups that are at the lowest level of our hierarchy. The summation of the AA, AB and AC groups results in the A group, whereas the summation of the BA and BB groups results in the B group. The groups A and B are not leaves, since they are created from lower groups, but they can be referred to as the “children” of the group “Total”. Speaking of which, the A and B groups are summed to form this group, “Total”.

If you look at the figure above one more time, you might see that we only need the hierarchical leaves to re-create the data in the entire hierarchy. This relationship can be summed up in a matrix of 1’s and 0’s, which we call “S”. The matrix S has dimension g x k, where g is the number of hierarchical groups (in our example, g = 8) and k is the number of leaves (in our example, k = 5). It shows the relationship between the leaves and all of the other groups:
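
Written out for this example, with the rows ordered as Total, A, B, AA, AB, AC, BA, BB and the columns as the leaves AA, AB, AC, BA, BB (an ordering chosen here purely for illustration), the relationship is:

$$
\begin{pmatrix}
y_{\text{Total},t} \\ y_{A,t} \\ y_{B,t} \\ y_{AA,t} \\ y_{AB,t} \\ y_{AC,t} \\ y_{BA,t} \\ y_{BB,t}
\end{pmatrix}
=
\underbrace{\begin{pmatrix}
1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}}_{S}
\begin{pmatrix}
y_{AA,t} \\ y_{AB,t} \\ y_{AC,t} \\ y_{BA,t} \\ y_{BB,t}
\end{pmatrix}
$$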

Taking a step back, it is clear that we can forecast all 8 groups individually. We have decades of research and lots of high-quality statistical models at our disposal. The main issue is how to combine these forecasts so that they reflect the hierarchical structure of the data.

If we just forecast all groups independently and put them in the equation above, it will not necessarily hold. This is because the forecasts for groups (like group A above) would be created independently of the hierarchical data that creates them (summation of groups AA, AB and AC). For groups that are leaves, this is not a problem. But data in the upper groups are, by definition, formed by the summation of their lower groups. This being said, if we only forecasted the leaves and added up all of those forecasts for the upper levels, then we would be incorporating information about the hierarchy into our forecast. In other words, we would have a hierarchically sound forecast.

This is called the Bottom-up approach. In equation form:
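
$$\tilde{\mathbf{y}}_h = S\,\hat{\mathbf{y}}_{K,h}$$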

Let’s quickly unpack the symbols in this equation. S is the hierarchy matrix we saw a couple of paragraphs earlier. The y-hat on the right hand side (with K and h as subscripts) is a vector of forecasts from all the hierarchical leaves, denoted by K, for the forecasted period h. In our example it has dimension 5 x 1, since we have 5 hierarchical leaves. Recalling that our S matrix has dimension 8 x 5, the multiplication on the right-hand-side gives us the 8 x 1 vector, y-tilde (with h subscript), on the left hand side. This is the vector of our bottom-up forecasts for the forecasted period h, for all 8 hierarchical groups.
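
As a minimal sketch in R (the leaf forecast numbers are made up for illustration, and this is plain matrix algebra rather than the ingtsforecast API):

```r
# Hierarchy matrix S: rows are Total, A, B, AA, AB, AC, BA, BB;
# columns are the leaves AA, AB, AC, BA, BB
S <- rbind(
  Total = c(1, 1, 1, 1, 1),
  A     = c(1, 1, 1, 0, 0),
  B     = c(0, 0, 0, 1, 1),
  AA    = c(1, 0, 0, 0, 0),
  AB    = c(0, 1, 0, 0, 0),
  AC    = c(0, 0, 1, 0, 0),
  BA    = c(0, 0, 0, 1, 0),
  BB    = c(0, 0, 0, 0, 1)
)

# Stand-alone forecasts for the five leaves at horizon h (illustrative numbers)
y_hat_leaves <- c(AA = 40, AB = 35, AC = 25, BA = 30, BB = 20)

# Bottom-up forecasts for all 8 groups: y_tilde = S %*% y_hat_leaves
y_tilde_bu <- drop(S %*% y_hat_leaves)
y_tilde_bu
```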

The Bottom-up approach is easy to execute. It is also easy to see how movements in the lower level forecasts influence the upper groups. For instance, if the forecast for group A is 100 million euros, then it is easy to trace which lower group (AA, AB or AC) contributed the most to this 100 million euros!

However, we have not made use of the data in the upper groups. Moreover, we are heavily dependent on the data quality of the lower groups; in hierarchical data, lower level data can be more erroneous than higher level data. And lastly, when we add forecasts together, their forecast errors accumulate as well. So even if we do get a point forecast for each upper level, it is accompanied by an ever-increasing standard error.

It is clear that using all of the data makes sense. One potential solution is to run independent forecasts for all groups, but then reconcile them in an optimal way that is motivated by the least squares estimator.

This is called the Consistent approach. When we write down the reconciliation and do some mathematics, we end up with this equation for getting hierarchical forecasts:
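
$$\tilde{\mathbf{y}}_h = S\,(S'S)^{-1}S'\,\hat{\mathbf{y}}_h$$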

Let’s also quickly break down this formula. While S refers to our hierarchy matrix with dimension 8 x 5, S’ refers to the transpose of S, which (in our example) has dimension 5 x 8. The multiplication S’S will thus give a square matrix of dimension 5 x 5, which can be inverted[3]. This inversion is denoted by the “-1”. The y-hat vector (with h as subscript) on the right hand side consists of the stand-alone forecasts made for all of the 8 groups for forecast period h. The matrix multiplication on the right hand side then gives us the consistent forecast for all 8 groups, shown as y-tilde (with h subscript) on the left hand side.
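
Again as a minimal sketch in R, with made-up stand-alone forecasts for all 8 groups that deliberately do not add up across the hierarchy:

```r
# The same 8 x 5 hierarchy matrix S as in the bottom-up sketch
S <- rbind(
  Total = c(1, 1, 1, 1, 1),
  A     = c(1, 1, 1, 0, 0),
  B     = c(0, 0, 0, 1, 1),
  AA    = c(1, 0, 0, 0, 0),
  AB    = c(0, 1, 0, 0, 0),
  AC    = c(0, 0, 1, 0, 0),
  BA    = c(0, 0, 0, 1, 0),
  BB    = c(0, 0, 0, 0, 1)
)

# Independent stand-alone forecasts for all 8 groups (illustrative numbers)
y_hat_all <- c(Total = 155, A = 98, B = 52,
               AA = 40, AB = 35, AC = 25, BA = 30, BB = 20)

# Consistent (OLS-reconciled) forecasts: y_tilde = S (S'S)^{-1} S' y_hat
y_tilde_cons <- drop(S %*% solve(t(S) %*% S, t(S) %*% y_hat_all))
y_tilde_cons
```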

The Consistent approach is also easy to execute and is statistically more efficient. It is more robust to potentially erroneous data at the lower levels, which means better forecasts at the upper levels. If you have information on forecast performance for all groups, then it is also possible to include a weighting matrix in the equation above and use the weighted least squares estimator to get forecasts, as sketched below.
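
One common form of this weighted variant, with a diagonal weight matrix Λ (for instance the inverse forecast-error variances per group; the exact weighting scheme is a modelling choice and is only sketched here), is:

$$\tilde{\mathbf{y}}_h = S\,(S'\Lambda S)^{-1}S'\,\Lambda\,\hat{\mathbf{y}}_h$$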

These two approaches are useful for different business contexts. If the forecast accuracy of the lower levels is crucial, even at the expense of having accurate forecasts at the upper levels, then the Bottom-up approach makes sense.

If, however, the forecast accuracy of the upper levels is relatively more important and the lower level data is problematic, then the Consistent approach is more desirable.

Implementation and Discussion

Implementation

While it is straightforward to write the above equations into code for any single project, we aimed to put them in an already-existing, internally used ING package. This way the code can easily be re-used for any project across analytics teams at ING.

This internal R package, called ingtsforecast, provides a pipeline for data scientists to initialize their data, run their forecasts and visualize them via a dashboard[4]. This is very similar to the fable R package that came out this year.

The Bottom-up and Consistent hierarchical forecasts were added into this pipeline. The functions are executed once forecasts have been run on all groups. On top of this, one is also able to explore the hierarchical structure visually in the resulting dashboard. Now, anyone at ING who wants to see whether the forecast for overall customer deposits is mostly driven by trends in the Netherlands or in Belgium can get a quantified answer.

Let’s look at this via a constructed example ourselves. A dataset of the 5 hierarchical leaves (AA, AB, AC, BA, BB) is created by taking random samples from a uniform distribution between 10,000 and 60,000, across 60 months (January 2012 to December 2016). The groups A, B and Total are calculated by summing the leaves.
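
A minimal sketch in R of how such a dataset can be generated (object and column names are chosen here purely for illustration):

```r
set.seed(42)

# 60 monthly observations, January 2012 to December 2016
months <- seq(as.Date("2012-01-01"), as.Date("2016-12-01"), by = "month")

# Five hierarchical leaves, sampled uniformly between 10,000 and 60,000
leaves <- data.frame(
  AA = runif(60, 10000, 60000),
  AB = runif(60, 10000, 60000),
  AC = runif(60, 10000, 60000),
  BA = runif(60, 10000, 60000),
  BB = runif(60, 10000, 60000)
)

# Upper groups are simply the sums of their children
hierarchy_data <- data.frame(
  month = months,
  leaves,
  A     = leaves$AA + leaves$AB + leaves$AC,
  B     = leaves$BA + leaves$BB,
  Total = rowSums(leaves)
)

head(hierarchy_data)
```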

The data is then taken through the ingtsforecast pipeline for forecasting, and 12-month-ahead forecasts are estimated. The forecasts are tested on their predictive performance[5], to see whether the Bottom-up or the Consistent hierarchical forecasts are better at prediction.
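
The ingtsforecast pipeline itself is not yet public, but a comparable experiment can be sketched with the open-source hts package mentioned in footnote [2]. The setup below (ets base forecasts, OLS weights for the combination) is an assumption about a reasonable configuration, not a description of the internal pipeline:

```r
library(hts)
set.seed(42)

# Bottom-level (leaf) series: 60 months of uniform draws, as in the example above
leaves <- replicate(5, runif(60, 10000, 60000))
colnames(leaves) <- c("AA", "AB", "AC", "BA", "BB")
bts <- ts(leaves, start = c(2012, 1), frequency = 12)

# Hierarchy: the total has 2 children (A and B); A has 3 leaves, B has 2
hierarchy <- hts(bts, nodes = list(2, c(3, 2)))

# 12-month-ahead forecasts: bottom-up versus optimal combination (OLS reconciliation)
fc_bu   <- forecast(hierarchy, h = 12, method = "bu",   fmethod = "ets")
fc_comb <- forecast(hierarchy, h = 12, method = "comb", fmethod = "ets", weights = "ols")

# Forecasts for all 8 groups (Total, A, B and the leaves), one column per group
head(aggts(fc_bu))
head(aggts(fc_comb))
```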

For this comparison, let’s focus on the Total group. The forecast performance can be seen below:

Note that a lower line means less forecast error, which means better forecast performance. For most of the horizon there are hardly any differences between the two lines (orange is the bottom-up forecast, while gray is the consistent forecast). However, once the forecast horizon goes past 10 months, we see that the bottom-up forecast is doing much better.

Let’s see these forecasts, and how the lower groups A and B add up to make them.

The consistent forecasts are smoother than the bottom-up forecasts. For different use cases this can be either an advantage or a disadvantage. Deciding this is up to the end-user of the dashboard, who is always assumed to be an expert in the data being used. As such, we see that a data science product is indeed a tool: a means to an end, rather than an end in itself. The knowledge of the end-user is crucial to this tool being used successfully.

We can do many more things with this dashboard, but in order to keep focus we will only look at the hierarchical forecasts here!

Discussion

Aside from delivering a data science product, making an R package of this work helps make it shareable amongst the ING data science community. Another plus point is that unit tests have to be written for each function in the package. Since the package is open to development by other data scientists at ING, these unit tests make sure that the functions are resilient to future modifications. Thus, the package is not only beneficial for the dashboard end-users, but also for other data scientists.

In short, we have addressed a business problem (working with hierarchical data) in an analytical manner. This was done via material found in academic literature and open-source analytics algorithms. The solution was then made available to all data scientists within ING by incorporating it into an already-existing R package, ensuring its functionality stays robust and enabling others to contribute to and utilize these methods.

While this gives an exciting glimpse into how analytics can successfully work in ING, it should also be considered as a case study of how analytics in general can benefit business. Where business problems are convertible into analytics problems, data science solutions are possible. And these solutions can be implemented as long as data scientists are able to tap into the collective knowledge available via open source software and publications. The ingtsforecast package, for instance, will soon be available on ING’s GitHub repository. It is beneficial to everyone when analytics work is made as shareable as possible.

Then we can all see further by standing on the shoulders of giants.

References

Hyndman, Rob J., et al. “Optimal combination forecasts for hierarchical time series.” Computational Statistics & Data Analysis 55.9 (2011): 2579–2589.

[1] This is not quite true. The first bank, as we know it, came into existence in 1397 in the Republic of Florence. This was the Medici Bank, founded by the famous Medici family. For this post, we will pretend that the late 14th century counts as “time immemorial”.

[2] The analytical solution is from Hyndman et al (2011). Most of the pictures used in this section are from the vignette of the hts R package, which is written by the authors of Hyndman et al (2011). If you are interested in what we have done here, then we highly recommend you read the paper and use the package for your own work!

[3] The more attentive reader might object to this, since inversion is only possible when the determinant of S’S is not zero. The equivalent condition is that all the columns in the S matrix are linearly independent of each other. Since S has, by construction, linearly independent column vectors, S’S is invertible.

[4] For all you R fans out there: “dashboard” refers to an R Shiny dashboard.

[5] This is done via an out-of-time test within the generated data.

Mehmet Kutluay is a data scientist at ING’s Financial Crime and RegTech team.