Building Credit Rating Systems With Scarce Data

Darshan Soni
Brex Tech Blog
May 9, 2022

Context

Data scientists at fintechs often face numerous modeling challenges when building a credit rating system. The most common data limitations are:

  • Lack of historical credit data: This occurs when designing new credit products, or expanding existing credit products to new customer segments.
  • Lack of bad credit events: Available credit data may contain very few default events observed over time because of low occurrence within the population, or the borrower may be offboarded before a default event occurs.
  • Irrelevant historical information due to a change in business practices: Data collected in the past becomes less useful when customers transition across different credit products or when revising methods of assessing credit risk.
  • Inconsistent definition and identification of default in historical credit data: This happens when complex engineering systems and credit policies evolve across different credit products.

This article presents a general modeling framework to overcome these data limitations and discusses various modeling approaches for building a credit rating system.

Overview

A credit rating system assesses the creditworthiness of borrowers and the quality of credit transactions between a lender and borrower. A credit rating is an ordinal variable used to group and rank borrowers by credit risk, typically using a discrete scale. One of the ways to statistically quantify credit risk is through a Probability of Default (PD) model.

Figure 1: A simplified example of Credit Rating System
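To make the mapping concrete, here is a minimal sketch of how a continuous PD score might be bucketed into a discrete, ordinal rating scale. The cut-offs and rating labels below are purely illustrative assumptions, not an actual rating scale:

```python
import bisect

# Illustrative PD cut-offs (inclusive upper bounds) and the rating assigned
# to each bucket. These thresholds are hypothetical, chosen only to
# demonstrate the PD-to-rating mapping.
PD_CUTOFFS = [0.001, 0.005, 0.02, 0.08, 1.0]
RATINGS = ["A", "B", "C", "D", "E"]

def pd_to_rating(pd_score: float) -> str:
    """Map a PD in [0, 1] to a discrete, ordinal rating bucket."""
    if not 0.0 <= pd_score <= 1.0:
        raise ValueError("PD must lie in [0, 1]")
    return RATINGS[bisect.bisect_left(PD_CUTOFFS, pd_score)]
```

Because the buckets are ordered by PD, the resulting ratings inherit the rank order of the underlying score.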

Probability of Default (PD) Model

A PD model predicts the likelihood of a borrower defaulting within a certain period in the future and outputs a score between 0 and 1 — which inherently creates a rank order. In a PD model, credit risk experts need to establish a clear definition of default (i.e., the target variable of the PD model) by understanding what specific state of a borrower qualifies as a default (or any bad event). PD model features or risk drivers used to predict default can include:

  • Financial ratio attributes calculated from financial statements of the borrower
  • Past credit behavior attributes calculated from credit bureau data
  • Cash balance position
  • Bank activity behavior attributes calculated from bank transactions
  • Product usage and credit behavioral attributes calculated from internally collected data
  • Any business health attributes derived from alternative data sources

Popular statistical and machine learning methods used to model the probability of default include logit and probit models, classification trees, and survival models.
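As a sketch of the first of these, below is a minimal logit model fitted by batch gradient descent in pure Python. The toy borrowers and the two risk drivers (a leverage ratio and months of cash runway) are invented for illustration; in practice you would use a library such as statsmodels or scikit-learn:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(X, y, lr=0.05, epochs=5000):
    """Fit a logistic (logit) PD model by batch gradient descent.

    X: list of feature rows, y: list of 0/1 default flags.
    Returns (bias, weights)."""
    n, k = len(X), len(X[0])
    b, w = 0.0, [0.0] * k
    for _ in range(epochs):
        gb, gw = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi          # gradient of log-loss w.r.t. the logit
            gb += err
            for j in range(k):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        for j in range(k):
            w[j] -= lr * gw[j] / n
    return b, w

def predict_pd(b, w, x):
    """Predicted probability of default for one borrower."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Toy dataset: [leverage_ratio, months_of_runway] -- hypothetical risk drivers.
X = [[0.2, 18], [0.3, 12], [0.8, 2], [0.9, 1], [0.5, 6], [0.7, 3]]
y = [0, 0, 1, 1, 0, 1]
b, w = fit_logit(X, y)
```

The fitted model outputs a PD in (0, 1) for each borrower, and those scores give the rank order the rating system needs.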

Challenges

Developing and testing any quantitative model to estimate PD in a credit rating system requires a sufficient number of observed default events. If there are not enough of these, we end up with what is called a Low Default Portfolio (LDP). The challenge in building a quantitative model in such a situation can be attributed to two types of problems:

  1. Cold start problem: Default events are entirely absent in the data. As a result, there is not enough information to train a reliable model: with no default events to learn from, no meaningful inferences can be drawn from model predictions with confidence.
  2. Imbalanced data problem: Data is exceptionally skewed to non-default events. As a result, poorly trained machine learning classifiers can become more biased towards non-default events and thus, erroneously classify default events. Classifiers may even predict all data as non-default events.
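One common mitigation for the imbalanced-data problem is to weight the training loss inversely to class frequency — the heuristic behind scikit-learn's `class_weight="balanced"` — implemented here by hand on hypothetical counts:

```python
from collections import Counter

def balanced_class_weights(y):
    """Inverse-frequency class weights: w_c = n_samples / (n_classes * n_c).

    Rare classes (defaults) get proportionally larger weights, so a
    weighted loss no longer rewards a classifier for predicting
    "non-default" for everything."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

# 990 non-defaults (0) vs 10 defaults (1): a heavily imbalanced sample.
y = [0] * 990 + [1] * 10
weights = balanced_class_weights(y)
```

With these weights, each class contributes the same total weight to the loss, so the ten defaults carry as much influence as the 990 non-defaults.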

When the default events in the modeling dataset are rare or non-existent, most of the techniques proposed in the literature to handle both of these problems cannot be applied directly. It is possible we observe no defaults in the modeling dataset, but the true probability of default is always larger than zero. Even when we observe some defaults, there is a high chance of underestimating the true probability of default. In addition, we cannot immediately validate the predictions in practice as the credit card lifecycle data is slow to accumulate. In this case, the feedback loop is much longer because a new observation is recorded potentially only once a month, and default events are sometimes hard to obtain.
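The point that the true PD exceeds zero even when no defaults are observed can be made precise with a one-sided confidence bound — the idea underlying the Pluto–Tasche approach to low-default portfolios. With n independent borrowers and zero defaults, the largest PD still consistent with that observation at confidence level CL is 1 − (1 − CL)^(1/n). A quick sketch:

```python
def zero_default_pd_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on PD when 0 defaults are seen among n borrowers.

    Solves (1 - pd)^n = 1 - confidence for pd: even with no observed
    defaults, PD values up to this bound remain consistent with the data."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

# With 500 borrowers and zero observed defaults, the 95% upper bound is
# roughly 0.6% -- observing no defaults does not mean the true PD is zero.
bound = zero_default_pd_upper_bound(500)
```

The bound shrinks as the portfolio grows, which is exactly the slow feedback loop described above: only more observations can tighten the estimate.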

Our Framework

Figure 2: Low Default Portfolio Modeling Framework

We identify 3 main pillars for our modeling framework to work in a low default portfolio situation when building a credit rating system:

  1. Model methodology selection
  2. Model testing
  3. Margin of conservatism

Model Methodology Selection

For an LDP with a cold-start problem, it is tough to build a PD model. In such situations, we may not be able to estimate PD quantitatively. Still, we can generate a rank ordering for the credit rating system using either of these methods:

  1. Readily available off-the-shelf risk models developed by credit bureaus and rating agencies
  2. Heuristics as a PD proxy based on credit risk experts’ intuition

Table 1: Modeling methodologies when dealing with cold start problem

For an LDP with class-imbalance problems, the PD model can be built using:

  1. Internal data: Credit transactions data accumulated internally
  2. External data: Credit transactions data acquired externally by pooling from other financial institutions
  3. A combination of internal and external data

Depending on the credit product’s maturity and data availability, one can estimate PD for generating rank order in the credit system using one of the methods below.

Table 2: Modeling methodologies when dealing with imbalanced data problem

Regardless of the methodology you choose when designing PD models for LDP, it is beneficial to:

  • Be prepared to combine different approaches: Complement available credit data with the expert judgment of credit risk professionals to cover all risk drivers.
  • Understand the target populations and the significant drivers of risk: Before picking a modeling approach, know your population and intended audience for model use.
  • Plan for future model development: Start collecting data for future work and have a vision for how the model will evolve in the future.
  • Collect opinions of a group of persons: To avoid systematic bias when using expert judgment, don’t gather input from only a single individual.
  • Review outliers at the end and identify an apparent reason for their existence: Address outliers resulting from model limitations and strange but legitimate business scenarios by defining override policies and additional guardrails rather than removing such outliers from the modeling dataset.
  • Devise features that make business sense: First, partner with expert credit risk professionals to brainstorm intuitive features that can discriminate risky from non-risky customers. Next, validate the intuition behind these features with data. Finally, confirm the validation findings with credit risk professionals to avoid selection bias before using these features.

Model Testing

Model testing requires a significant amount of default data to derive valid statements about the model performance. However, to overcome the challenges of LDP when building a credit rating system, PD models can be tested using a combination of externally pooled data, off-the-shelf risk models, and credit risk experts’ heuristics. Use any of the below methods to test the PD model developed for LDP.

  1. Backtesting: The model is tested against the default events from external data to derive statistical conclusions about the model performance. The model evaluation metrics used to measure the discriminatory power of the model include AUROC, the Cumulative Accuracy Profile (CAP) curve, the Kolmogorov–Smirnov (K-S) statistic, etc.
  2. Benchmarking: The model is tested by comparing the rank order derived from the model with that of a “challenger” model. A challenger model, in this case, can be an off-the-shelf risk model or a heuristic developed by credit risk experts. The model evaluation metrics used to measure the effectiveness of the rank ordering generated by the model are Spearman’s rank correlation coefficient, Somers’ D or Kendall’s Tau, Lift/Gain charts, etc.
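As an illustration of benchmarking, here is Spearman's rank correlation computed from first principles using the no-ties formula. The model PDs and challenger bureau scores are made up, and the sign flip accounts for bureau scores running in the opposite direction to PD (lower score = riskier):

```python
def spearman_rho(a, b):
    """Spearman's rank correlation between two score lists (no tied values).

    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the
    difference between the ranks assigned to observation i by each scorer."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0] * len(xs)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical PD scores from our model vs. a challenger bureau score.
model_pd = [0.01, 0.03, 0.08, 0.15, 0.40]
challenger = [610, 580, 560, 540, 480]  # lower bureau score = riskier
rho = spearman_rho(model_pd, [-s for s in challenger])
```

A rho near 1 (after aligning score directions) means the model and the challenger rank borrowers nearly identically; values near 0 suggest the two disagree and warrant investigation.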

Margin of Conservatism

When dealing with LDP while building a credit rating system, the PD model may suffer from deficiencies, no matter which model methodology you choose. Such deficiencies can be due to the data representation used for modeling and the intrinsic model risk associated with a particular methodology. They can cause overestimation or underestimation of true PD for the population.

Figure 3: General Margin of Conservatism Control Process

To mitigate the impact of these deficiencies, first identify the source of the model deficiencies, measure their impact, and apply a conservative margin to the PD estimate to correct for the inaccuracies in the estimates. Deciding on a conservative margin often requires input from credit risk experts. After the model is deployed in production, you’ll need to establish a model-monitoring mechanism to track the changes in population representation and any errors due to data or methodological deficiencies. The general expectation is that these errors should reduce over time or margins should be adjusted to reflect new business realities.
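One way to operationalize such a margin — sketched here under the simplifying assumption of a binomial default count — is to replace the raw default rate with a one-sided upper confidence bound, for example a Wilson score bound at 95% one-sided confidence (z = 1.645):

```python
import math

def conservative_pd(defaults: int, n: int, z: float = 1.645) -> float:
    """One-sided Wilson score upper bound on PD as a margin of conservatism.

    Replaces the raw default rate with the upper end of its confidence
    interval (z = 1.645 -> 95% one-sided), so sparse data yields a
    deliberately pessimistic PD rather than an understated one."""
    p = defaults / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return (centre + spread) / (1 + z2 / n)

# 3 defaults among 1,000 borrowers: raw PD = 0.30%, conservative PD ~ 0.75%.
raw = 3 / 1000
moc_pd = conservative_pd(3, 1000)
```

As more defaults accumulate, the bound converges to the raw rate, matching the expectation above that the margin should shrink over time.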

Final Thoughts

Low default portfolios are common in the financial industry. Data scarcity challenges can impede the development of a reliable PD estimate, leading to a substantial underestimation or overestimation of credit risk. The general operating framework above can equip data scientists to overcome data scarcity challenges and quickly build and scale credit rating systems to power underwriting and credit risk management.

Are you interested in solving exciting data science challenges that enable Brex to help every growing company realize its full potential? Join us!

We’d like to acknowledge the reliable support and guidance of our accomplished data scientists on the Credit Science team, talented engineers from the Data Platforms and Underwriting teams, and business-savvy individuals from the Credit Strategy team in enabling credit rating systems at Brex.
