Credit Risk Modelling (Part I)

Nov 24, 2022

A walkthrough of statistical credit risk modeling, probability of default prediction, and credit scorecard development with Python

Most people who work in the financial services industry, or who are otherwise exposed to credit risk as internal or external stakeholders, have most likely come across credit risk modeling in one way or another.

Credit risk arises when the borrower in a debt contract defaults on, or delays, repaying the debt in whole or in part. Thus, credit risk involves the unwillingness or inability of a borrower to meet financial obligations relating to lending, hedging, trading, or any other financial transaction. Credit risk modeling estimates the probability that the client will fail to pay you back (Santo Mero, 1997).

Whether you are an individual, a startup, a small business owner, or the treasury head of a large entity, chances are that you would have heard of these concepts while applying for credit, equity fundraising, or some sort of internal risk assessment.

Credit risk is an important topic in finance because banks and other financial institutions invest heavily in reducing their credit risk. A major factor behind the 2008 global financial crisis was that mortgage loans were given to customers with poor credit scores.

Expectedly, it is of utmost importance for lenders to protect themselves against default and limit the risk they are exposed to. Credit risk modeling is the way to empirically ascertain how much risk they will be exposed to if a loan request is granted to a particular borrower. Once this risk is determined, the lenders can then decide to give out the loan if the credit risk does not exceed the acceptable thresholds.

A default can occur when a borrower is unable to make timely payments, misses payments, or avoids or stops making payments (James Chen, 2021). The probability that a client will fail to pay back a loan is called their default risk. Hence, a loan is in default when the lending agency is reasonably certain that the loan will not be repaid, which results in a loss for the agency.

Financial organizations are concerned about reducing the risk of default. As a result, commercial and investment banks, venture capital funds, asset management organizations, and insurance corporations, to mention a few, are increasingly depending on technology to anticipate which customers are most likely to default on their obligations.

Approving loans without proper scientific evaluation increases the risk of default. This can lead to the bankruptcy of lending agencies and, consequently, the destabilization of the banking system. This is what happened in the 2008 financial crisis, which adversely affected the world economy.

The 5 Cs of Credit

Credit risk can be assessed using the “5 Cs of Credit”, a common phrase for the five major judgmental factors used to determine a potential borrower’s creditworthiness. The 5 Cs of Credit incorporate both qualitative and quantitative financial measures, and the lender may analyze different documents, such as the borrower’s income statement, balance sheet, credit reports, and other documents that reveal the financial situation of the borrower (Thomas Brock, 2021).

The 5 Cs of Credit refer to Character (credit history), Capacity to repay, Capital, the loan’s Conditions, and associated Collateral.

a) Character: the most comprehensive aspect of the evaluation of creditworthiness. It is the propensity of a borrower to repay a loan on time. Past defaults imply negligence or irresponsibility, which are undesirable character traits.

b) Capacity: A borrower’s capacity to repay the loan is necessary for determining the lender’s risk exposure. One’s income amount, history of employment, and current job stability indicate the ability to repay outstanding debt. Other responsibilities, such as college-bound children or terminally ill family members, are also factored in to evaluate one’s future payment obligations.

c) Capital: represents the overall pool of assets under the name of the borrower. It represents one’s investments, savings, and assets such as land, jewelry, etc. Loans are primarily repaid using overall household income; capital is additional security in case of unforeseen circumstances or setbacks such as unemployment.

d) Conditions: refers to the specifics of any credit transaction, such as the principal amount or interest rate. Lenders assess risk based on how the borrower plans to use the money, should they receive it.

e) Collateral: When assessed for a secured product such as a car or a home loan, borrowers must pledge certain assets under their name as collateral. This may include fixed assets such as the title of a parcel of land or financial assets. These features are not individualistic as they cannot be influenced by the borrower. Nevertheless, they indicate the level of risk associated with a certain investment.

Expected Loss and Its Components

The expected loss is the amount a firm expects to lose, on average, as a result of defaults on its loans. It is composed of three components:

a) Probability of default (PD): the probability that a borrower will fail to make full repayment of the loan over a specified period, usually one year. (This is our main interest in this project.)

b) Exposure at default (EAD): the outstanding amount that a lender is exposed to at the time of default.

c) Loss given default (LGD): the amount of money a lender stands to lose when a borrower defaults on the debt obligations. It is usually expressed as a percentage of the EAD.

The expected loss is simply the product of these three components:

$$EL = PD \times EAD \times LGD$$
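As a rough illustration of the product of the three components, here is a minimal Python sketch. The function name `expected_loss` and the loan figures are made up for this example, and LGD is assumed to be expressed as a fraction of EAD.

```python
def expected_loss(pd_: float, ead: float, lgd: float) -> float:
    """Expected loss as the product PD x EAD x LGD.

    pd_ : probability of default over the chosen horizon (0 to 1)
    ead : exposure at default, in currency units
    lgd : loss given default, as a fraction of EAD (0 to 1)
    """
    if not 0.0 <= pd_ <= 1.0 or not 0.0 <= lgd <= 1.0:
        raise ValueError("PD and LGD must lie between 0 and 1")
    if ead < 0:
        raise ValueError("EAD must be non-negative")
    return pd_ * ead * lgd


# Hypothetical loan: 2% PD, 10,000 outstanding, 45% LGD
el = expected_loss(0.02, 10_000, 0.45)
print(round(el, 2))  # 90.0
```

On these made-up numbers, the lender would expect to lose about 90 currency units on average for a loan of this profile.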

Probability Of Default (PD)

The probability of default (PD) is an essential credit risk measure in finance. It provides an estimate of the possibility that a borrower will be unable to meet their debt obligations. In other words, it is the risk that the borrower will be unable or unwilling to repay its debt in full or in part. The PD model must be easy to understand and interpret. Hence, it is often converted into a simpler form called a scorecard.

Under the Basel II Accord (2004), a default event on a debt obligation is said to have occurred if either of the following holds:

i. It is unlikely that the obligor will be able to repay its debt to the bank without giving up any pledged collateral.

ii. The obligor is more than 90 days past due on a material credit obligation.

Scorecards and Credit Scores

A scorecard produces an individual creditworthiness assessment that directly corresponds to a specific probability of default. Because these assessments are produced by scorecards, they are called credit scores.
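One common way to make a score correspond to a probability of default is log-odds scaling, sketched below. This is not necessarily the scaling used later in this series; the function name `pd_to_score` and the parameters `base_score`, `base_odds`, and `pdo` (points to double the odds) are assumptions chosen only for illustration.

```python
import math


def pd_to_score(pd_: float, base_score: float = 600.0,
                base_odds: float = 50.0, pdo: float = 20.0) -> float:
    """Map a probability of default to a score via log-odds scaling.

    base_score : score given to good:bad odds of `base_odds` (assumed value)
    pdo        : points that double the good:bad odds (assumed value)
    """
    if not 0.0 < pd_ < 1.0:
        raise ValueError("PD must be strictly between 0 and 1")
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds = (1.0 - pd_) / pd_  # good:bad odds implied by the PD
    return offset + factor * math.log(odds)


# A PD of 1/51 implies odds of 50:1, which maps to the base score.
print(round(pd_to_score(1 / 51), 1))   # 600.0
# Halving the bad odds (100:1) adds `pdo` points.
print(round(pd_to_score(1 / 101), 1))  # 620.0
```

With this kind of scaling, a lower PD always yields a higher score, so each score can be read back as one specific probability of default.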

A credit score is significant because it considers how often credit was used and how efficiently it was repaid. Credit scores are important for lenders because they reveal how capable an applicant is of taking on debt and repaying it in an efficient and timely way; this, in turn, reveals how risky it is for the lender to extend the applicant a loan or line of credit.

Companies that assemble and issue credit reports use different types of credit scores. The most common are FICO scores. Many lending institutions, particularly in the United States, use FICO scores to help determine how creditworthy an applicant is. FICO scores are also used to help determine the interest rate on any credit extended to an individual. FICO scores range from 300 (worst) to 850 (best).

What Is Credit Risk Modelling?

A person’s credit risk is influenced by many factors, which makes determining a borrower’s credit risk a difficult task. Because so much money depends on predicting a borrower’s credit risk accurately, credit risk modeling has become essential.

Credit risk modeling is the practice of applying data models to determine two key factors. The first is the likelihood that the borrower will default on the loan. The second factor is the lender’s financial impact if the default occurs.

Credit risk models are used by financial organizations to assess the credit risk of potential borrowers.

For companies involved in the financial system, preserving the financial health of clients is critical. You might wonder how to protect each client’s financial well-being. The answer entails assessing each client’s likelihood of payment based on a set of criteria and devising strategies that anticipate customer needs.

As a result, the goal of this work is to forecast the likelihood of default on a specific obligation, in this example, credit cards. This will enable the creation of solutions that reduce the risk of the client’s financial health deteriorating. Furthermore, clustering techniques can be used to locate homogeneous segments within the population, so that each client receives differentiated treatment that supports the design of collection strategies.

Modeling Probability Of Default

We are interested in whether borrowers have defaulted on their respective loans. Typically, a borrower is said to have defaulted if they are more than 90 days past due on a loan. A distinctive feature of the PD model is that all its independent (predictor) variables must be categorical. The reason is that this makes it much easier to present the model in a simplified form and turn its features into a scorecard. The dependent variable should also be made binary: one class for a good loan and the other for a bad loan.
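The binary dependent variable can be sketched as follows. The field name `days_past_due`, the helper `to_target`, and the coding 1 = bad loan, 0 = good loan are assumptions for illustration; the 90-day threshold is read here as "more than 90 days past due".

```python
DEFAULT_DAYS_PAST_DUE = 90  # threshold quoted in the text


def to_target(days_past_due: int) -> int:
    """Return 1 for a bad (defaulted) loan and 0 for a good loan."""
    if days_past_due < 0:
        raise ValueError("days_past_due cannot be negative")
    return 1 if days_past_due > DEFAULT_DAYS_PAST_DUE else 0


# Hypothetical loan records; the field names are illustrative only.
loans = [
    {"loan_id": "A1", "days_past_due": 0},
    {"loan_id": "B2", "days_past_due": 45},
    {"loan_id": "C3", "days_past_due": 120},
]
targets = [to_target(loan["days_past_due"]) for loan in loans]
print(targets)  # [0, 0, 1]
```

Real loan books often carry a status field rather than a day count, in which case the mapping would list the statuses treated as bad instead of comparing against a threshold.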

Methodology

Preprocessing consists of converting all variables into dummy variables. Categorical variables can be converted directly. Continuous variables need an extra step: they are first split into a number of small intervals, and each interval is then treated as a category and converted into a dummy variable.

In practice, there is a well-established methodology for turning continuous variables into categories. Once the variables are categorical, we group the categories according to their properties. These two steps are called fine classing and coarse classing.
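To make the idea of turning a continuous variable into dummy variables concrete, here is a small standard-library sketch. The helper names (`bin_labels`, `assign_bin`, `to_dummies`) and the income bin edges are hypothetical; in practice a data-frame library would usually handle this step.

```python
def bin_labels(edges):
    """Labels for the half-open intervals [lo, hi) defined by the edges."""
    return [f"[{lo}, {hi})" for lo, hi in zip(edges[:-1], edges[1:])]


def assign_bin(value, edges):
    """Return the label of the interval that contains `value`."""
    for lo, hi in zip(edges[:-1], edges[1:]):
        if lo <= value < hi:
            return f"[{lo}, {hi})"
    raise ValueError(f"{value} falls outside the bin edges")


def to_dummies(value, edges):
    """One dummy (0/1) variable per interval; exactly one of them is 1."""
    label = assign_bin(value, edges)
    return {name: int(name == label) for name in bin_labels(edges)}


# Hypothetical annual income bins
income_edges = [0, 20_000, 40_000, 60_000, 100_000]
row = to_dummies(35_000, income_edges)
print(row)
```

Each borrower ends up with a row of zeros and a single one, which marks the interval their income falls into.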

a) Fine classing: a technique that groups a variable’s values into several fine (equal-sized) bins. Through fine classing, both discrete and continuous variables can be represented with categories. We start by measuring the ability of each category to predict the dependent variable, i.e., by checking how the rate of good and bad loans varies across the categories. The established metric for this is the Weight of Evidence (WoE).

Weight of Evidence (WoE): shows the extent to which each category of an independent variable explains the dependent variable. In other words, it tells how much evidence a category provides about differences in the dependent variable.

The further the WoE is from 0, the better the category differentiates between the two classes of the response variable.
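The sketch below puts fine classing and WoE together on toy data. It assumes the commonly used convention WoE = ln(share of good loans in the bin / share of bad loans in the bin), with the target coded 1 = bad and 0 = good; the function names and the age data are invented for illustration.

```python
import math


def fine_class(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (0-based index)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot bin a constant variable")
    width = (hi - lo) / n_bins
    # The maximum value is placed in the last bin rather than a new one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def woe_by_bin(bins, targets):
    """WoE per bin, with targets coded 1 = bad loan, 0 = good loan."""
    total_good = targets.count(0)
    total_bad = targets.count(1)
    result = {}
    for b in sorted(set(bins)):
        good = sum(1 for x, t in zip(bins, targets) if x == b and t == 0)
        bad = sum(1 for x, t in zip(bins, targets) if x == b and t == 1)
        # WoE is undefined for a bin with no goods or no bads; such bins
        # are usually merged with a neighbour. Here we simply raise.
        if good == 0 or bad == 0:
            raise ValueError(f"bin {b} has no good or no bad loans")
        result[b] = math.log((good / total_good) / (bad / total_bad))
    return result


# Toy data: borrower ages and loan outcomes (1 = bad, 0 = good)
ages = [21, 23, 25, 27, 35, 37, 39, 41, 50, 55, 58, 61]
outcomes = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0]

age_bins = fine_class(ages, n_bins=3)
woe = woe_by_bin(age_bins, outcomes)
for b, w in woe.items():
    print(b, round(w, 3))
```

Under this convention, a negative WoE marks a bin with a higher share of bad loans than the population as a whole, and a positive WoE marks a bin with a higher share of good loans.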

b) Coarse classing: a technique that merges fine bins with similar risk to create fewer bins, usually up to 10. It is the process of constructing new categories based on the initial ones.
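One simple way to merge bins with similar risk is a greedy pass over adjacent fine bins, sketched below. Coarse classing is often done by inspecting WoE plots manually, so treat this as just one possible approach; the tolerance `tol`, the function names, and the counts are assumptions, and WoE is taken as ln(share of goods / share of bads).

```python
import math


def woe(good, bad, total_good, total_bad):
    """WoE of one bin from its good/bad counts and the population totals."""
    return math.log((good / total_good) / (bad / total_bad))


def coarse_class(counts, tol=0.15):
    """Greedily merge adjacent fine bins whose WoE differs by less than tol.

    counts : ordered list of (good, bad) tuples, one per fine bin
    Returns the merged list of (good, bad) tuples.
    """
    total_good = sum(g for g, _ in counts)
    total_bad = sum(b for _, b in counts)
    merged = [counts[0]]
    for good, bad in counts[1:]:
        prev_good, prev_bad = merged[-1]
        prev_woe = woe(prev_good, prev_bad, total_good, total_bad)
        this_woe = woe(good, bad, total_good, total_bad)
        if abs(this_woe - prev_woe) < tol:
            # Similar risk: fold this fine bin into the previous coarse bin.
            merged[-1] = (prev_good + good, prev_bad + bad)
        else:
            merged.append((good, bad))
    return merged


# Hypothetical (good, bad) counts for five ordered fine bins
fine_counts = [(10, 30), (12, 33), (40, 20), (42, 22), (80, 10)]
coarse_counts = coarse_class(fine_counts)
print(coarse_counts)  # [(22, 63), (82, 42), (80, 10)]
```

Here the first two and the middle two fine bins have nearly identical WoE values, so five fine bins collapse into three coarse ones.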

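Suppose an independent variable has k categories. We can calculate the WoE for each category and then weight it by the difference between that category’s share of all good loans and its share of all bad loans, summing over the categories:

$$IV = \sum_{i=1}^{k} \left(\%\text{good}_i - \%\text{bad}_i\right) \times WoE_i, \qquad WoE_i = \ln\!\left(\frac{\%\text{good}_i}{\%\text{bad}_i}\right)$$

This metric is called the Information Value (IV). The WoE sign convention shown here is the commonly used one; the IV is the same under the opposite convention as long as the weight is flipped accordingly.

The Information Value (IV) shows how much information the original independent variable brings to explaining the dependent variable. It identifies the independent variables that explain the response variable best. That is, we can use the IV for the preselection of predictors. The IV is always non-negative and has no strict upper bound, although the values seen in practice usually fall between 0 and 1.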
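The IV calculation can be sketched in a few lines. The example assumes the usual form, in which each category's WoE, ln(share of goods / share of bads), is weighted by the difference between those two shares; the function name and the counts are hypothetical.

```python
import math


def information_value(counts):
    """Information Value from ordered (good, bad) counts per category.

    Every category is assumed to contain at least one good and one bad
    loan; otherwise the logarithm is undefined.
    """
    total_good = sum(g for g, _ in counts)
    total_bad = sum(b for _, b in counts)
    iv = 0.0
    for good, bad in counts:
        pct_good = good / total_good
        pct_bad = bad / total_bad
        iv += (pct_good - pct_bad) * math.log(pct_good / pct_bad)
    return iv


# Hypothetical (good, bad) counts for three categories of one variable
category_counts = [(22, 63), (82, 42), (80, 10)]
print(round(information_value(category_counts), 3))

# A variable whose categories all share the population's good/bad mix
# carries no information.
print(information_value([(50, 50), (50, 50)]))  # 0.0
```

Each term in the sum is non-negative, because the weight and the logarithm always share the same sign, which is why the IV never drops below zero.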

Conclusion

We have briefly introduced credit risk modeling and some of the methodologies involved. In the next article, we will get more practical by building a probability of default model for customers. Here’s the link

Stay tuned and happy learning!
