Default Prediction with Machine Learning

We are proud to announce that our paper "Grabit: Gradient tree-boosted Tobit models for default prediction" was recently published in the prestigious Journal of Banking & Finance. This blog post gives a brief introduction to machine learning for default prediction and summarizes the results and contributions of our paper.

Predicting company defaults

A common problem in default prediction is class imbalance between defaults and non-defaults: typically, only a small fraction of the observations in a dataset are defaults. This poses a problem for machine learning models, because the number of observed defaults may be too small to identify patterns in the data and use them to accurately predict future defaults.

Using auxiliary data for default prediction

To combine the binary default data with continuous auxiliary data, we build on the Tobit model, a commonly used censored regression model. From an economic point of view, the Tobit model learns the default potential of a company, as represented, for example, by the number of delay days. In the model, a default occurs if the default potential exceeds a so-called default threshold. Note that additional information such as delay days is not observed for default cases, so the exact value of the default potential is only known below the default threshold (see Figure 1).

Figure 1: (Left) binary default classification; (right) linear regression on delay days for non-default cases
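
To illustrate the censoring described above, here is a minimal simulation sketch. It is not the setup used in our paper: the single feature, the coefficient 2.0, the unit noise level, the threshold of 2.5, and the function name simulate_censored are all illustrative assumptions.

```python
import random


def simulate_censored(n, threshold, seed=0):
    """Simulate a latent default potential and censor it at the threshold."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)              # one hypothetical feature
        latent = 2.0 * x + rng.gauss(0.0, 1.0)  # assumed linear potential plus noise
        is_default = latent > threshold         # default: potential exceeds threshold
        # For defaults, the exact potential is unknown; only the threshold is recorded.
        observed = threshold if is_default else latent
        rows.append((x, observed, is_default))
    return rows


data = simulate_censored(n=1000, threshold=2.5)
n_defaults = sum(1 for _, _, is_default in data if is_default)
print(f"defaults: {n_defaults} of {len(data)}")
```

With these assumed settings only a small share of the simulated observations end up as defaults, which mirrors the class imbalance discussed earlier, while every non-default still carries a continuous value that a model can learn from.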

Importantly, the Tobit loss is asymmetric for defaulted observations: the loss keeps decreasing as the prediction moves beyond the default threshold. Intuitively, this leads to the desired behavior of predicting a higher default potential for default cases, instead of penalizing predictions above the threshold. However, the Tobit model considers only linear dependencies on the features and cannot learn complex nonlinear relations in the data.
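
The asymmetry can be checked numerically. The sketch below assumes a simplified Tobit loss with censoring only at the upper (default) threshold and a fixed noise scale sigma; the function names tobit_loss and normal_cdf and the threshold value 2.5 are chosen for illustration and are not taken from our paper.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def tobit_loss(pred, y, threshold, sigma=1.0):
    """Negative log-likelihood of one observation, censored from above."""
    if y >= threshold:
        # Default: we only know that the latent potential is at least the threshold.
        prob = normal_cdf((pred - threshold) / sigma)
        return -math.log(max(prob, 1e-300))
    # Non-default: Gaussian negative log-density of the observed value.
    z = (y - pred) / sigma
    return 0.5 * z * z + math.log(sigma * math.sqrt(2.0 * math.pi))


threshold = 2.5
for pred in (1.5, 2.5, 3.5, 4.5):
    default_loss = tobit_loss(pred, threshold, threshold)  # a defaulted observation
    regular_loss = tobit_loss(pred, 1.0, threshold)        # a non-default with y = 1.0
    print(pred, round(default_loss, 4), round(regular_loss, 4))
```

For the defaulted observation, the loss shrinks steadily as the prediction grows past the threshold, whereas for the non-default observation it is smallest when the prediction equals the observed value and grows symmetrically on either side.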

The Grabit model: applying gradient tree boosting to the Tobit model
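
As a rough illustration of what applying gradient tree boosting to a Tobit loss can look like, the sketch below fits depth-one trees (stumps) to the negative gradient of a simplified Tobit loss. It is a toy under several assumptions, not the implementation from our paper: it uses a single feature, censoring only at the upper threshold, a fixed sigma of 1, plain gradient steps, and simulated data. All function names are hypothetical.

```python
import math
import random


def negative_gradient(pred, y, threshold, sigma=1.0):
    """Negative gradient of the one-sided Tobit loss with respect to pred."""
    if y >= threshold:
        # Default: density-to-CDF ratio, always positive, so the prediction
        # for a defaulted observation is always pushed upwards.
        z = max((pred - threshold) / sigma, -30.0)  # clip to avoid underflow
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
        return pdf / (sigma * cdf)
    # Non-default: ordinary residual, scaled by the noise variance.
    return (y - pred) / sigma ** 2


def fit_stump(xs, targets):
    """Least-squares stump on one feature; returns (split, left, right) or None."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    n, total = len(xs), sum(targets)
    best_gain, best_stump = float("-inf"), None
    left_sum = 0.0
    for rank in range(1, n):
        left_sum += targets[order[rank - 1]]
        lo, hi = xs[order[rank - 1]], xs[order[rank]]
        if lo == hi:
            continue  # cannot split between identical feature values
        left_mean = left_sum / rank
        right_mean = (total - left_sum) / (n - rank)
        # Minimising squared error is equivalent to maximising this quantity.
        gain = rank * left_mean ** 2 + (n - rank) * right_mean ** 2
        if gain > best_gain:
            best_gain, best_stump = gain, (0.5 * (lo + hi), left_mean, right_mean)
    return best_stump


def fit_boosted_tobit(xs, ys, threshold, n_rounds=100, learning_rate=0.1):
    """Boost stumps on the negative gradient of the Tobit loss."""
    init = sum(ys) / len(ys)
    preds = [init] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        grads = [negative_gradient(p, y, threshold) for p, y in zip(preds, ys)]
        stump = fit_stump(xs, grads)
        if stump is None:
            break
        stumps.append(stump)
        split, left, right = stump
        preds = [p + learning_rate * (left if x <= split else right)
                 for p, x in zip(preds, xs)]
    return init, stumps, learning_rate


def predict(model, x):
    """Predicted default potential for a single feature value."""
    init, stumps, learning_rate = model
    return init + learning_rate * sum(
        left if x <= split else right for split, left, right in stumps)


# Simulated data with an assumed nonlinear default potential.
rng = random.Random(0)
threshold = 2.5
xs = [rng.uniform(-1.0, 1.0) for _ in range(500)]
latent = [2.0 * math.sin(3.0 * x) + rng.gauss(0.0, 1.0) for x in xs]
ys = [min(value, threshold) for value in latent]  # defaults are censored

model = fit_boosted_tobit(xs, ys, threshold)
print(predict(model, -0.5), predict(model, 0.5))
```

Because the trees are fitted to gradients of the Tobit loss rather than to a linear predictor, the ensemble can follow the nonlinear shape of the simulated default potential while still treating defaulted observations as censored.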

Figure 2: Results from a simulation study on the impact of the class imbalance ratio on the performance of the Grabit model and other approaches.
Figure 3: Performance comparison of different models on Advanon’s loan dataset.

Interestingly, if the auxiliary variable is independent of the decision function, i.e., the auxiliary data contains no additional information, the Grabit model still performs as well as the best competing binary classifier in our simulation study. Furthermore, the Grabit model also outperforms the other models on larger datasets when the decision function is sufficiently complex, for example when it has strong nonlinearities or interactions among predictors.

At Advanon, efficient and accurate default prediction is at the core of our business, and we are committed to further research in this area. With our research efforts, we hope to increase the efficiency and fairness of the loan market for the benefit of both SMEs and investors. Other fields of application for our algorithm are yet to be explored. Potentially, the Grabit model could even help make predictions more reliable in rather unpredictable areas of everyday life, such as rainfall (Sanso and Guenni, 1999; Sigrist et al., 2012).

Predicting defaults — and rainfall

This research project was done jointly by Prof. Fabio Sigrist from the Lucerne University of Applied Sciences and Arts and Christoph Hirnschall from Advanon as part of an Innosuisse project under grant number “25746.1 PFES-ES”.

All About Advanon

This is the blog of the Swiss online platform Advanon. We aim to make businesses thrive. Offering pioneering financing products, credit intelligence and expert advice: success is our drive.

Written by Christoph Hirnschall, Head of R&D at Advanon
