Credit Scoring — Scorecard Development Process

Sandy Liu
8 min read · Apr 6, 2018


Do you know your credit score? Have you ever been denied credit without knowing why? Everyone who has ever applied for a credit card, a car loan, a mortgage, or any other personal loan has a credit file. Lenders use credit scores to determine who qualifies for a loan, at what interest rate, and with what credit limit. The higher the credit score, the more confident a lender can be of the customer's creditworthiness. However, a credit score is not part of a regular credit report. A mathematical formula translates the data in the credit report into a three-digit number that lenders use to make credit decisions, but the exact formula the bureaus use to calculate credit scores is a secret.

The purpose of this project is to use credit scoring techniques to assess the risk of lending to a particular client and to build a scorecard model. Credit scoring means applying a statistical model to assign a risk score to a credit application. It is a form of artificial intelligence, based on predictive modelling, that assesses the likelihood of a customer defaulting on a credit obligation, becoming delinquent, or becoming insolvent.

Over the years, a number of different modelling techniques for implementing credit scoring have evolved. Despite this diversity, the credit scorecard model stands out and is used by nearly 90% of scorecard developers. As a statistical/machine-learning hybrid, its scores can be used directly as probability estimates and hence provide direct input for risk-based pricing.

The rest of this post walks through how to use credit scoring to build a consumer credit scorecard. The analysis includes exploratory data analysis, variable selection, model building, and scoring.

Exploratory data analysis

The credit scoring data set used in this project is from Kaggle. At the beginning of every modelling procedure, the first question to ask is what the model is trying to predict. In credit scoring, this is the target/dependent variable. It has a binary value of either 1 or 0: a value of 1 means that the borrower is delinquent and has defaulted on a loan within the last two years, while a value of 0 means that the borrower is a good customer who has repaid his or her debts on time for the last two years. The dependent variable here is 'SeriousDlqin2yrs', shown in the second column of the table below.

Figure 1. Explore data — missing values and outliers

It is common for financial industry data to contain missing values, or values that do not make sense for a particular characteristic. As seen in the table, this dataset also has missing values and outliers. Since we estimate a client's creditworthiness using a logistic regression model, we impute missing values with the median and drop the illogical values.

For example, the 'age' variable is a continuous variable from 0 to 100. Certain records had a value of zero, which does not make sense: to qualify as a borrower, a person must be an adult of at least 18 years. Therefore, we treat these values as missing and choose to drop the records. In addition, the 'RevolvingUtilizationOfUnsecuredLines' feature is the ratio of total unsecured debt to the total unsecured credit limit, so it should have values between 0 and 1, yet some records have values greater than 1. These are outliers, and we pre-process them with the top-coding method, which means that all values above the upper band are set to the upper band.
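These cleaning steps can be sketched in pandas. The toy frame below merely stands in for the Kaggle data (the values are made up); only the column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Kaggle credit data (values are illustrative).
df = pd.DataFrame({
    "age": [0, 25, 43, 67, 30],
    "MonthlyIncome": [5000, np.nan, 6200, np.nan, 4100],
    "RevolvingUtilizationOfUnsecuredLines": [0.3, 1.8, 0.9, 0.1, 2.5],
})

# Drop records with illogical ages (a borrower must be an adult).
df = df[df["age"] >= 18].copy()

# Impute missing monthly income with the median.
df["MonthlyIncome"] = df["MonthlyIncome"].fillna(df["MonthlyIncome"].median())

# Top-code the utilization ratio: values above the upper band (1.0) are
# arbitrarily set to the upper band.
df["RevolvingUtilizationOfUnsecuredLines"] = (
    df["RevolvingUtilizationOfUnsecuredLines"].clip(upper=1.0)
)

print(df)
```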

Discretizing Predictors/Binning

Binning is the process of transforming a numeric characteristic into a categorical one, as well as re-grouping and consolidating categorical characteristics. Why is binning required when developing a scorecard? The reason is that some characteristic values occur rarely and will lead to instability if they are not grouped together. Grouping attributes with similar predictive strengths therefore increases scorecard accuracy. An example of grouping the 'age' feature is shown below.

Figure 2. Example of grouping ‘age’ feature
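A grouping like the one in the figure can be reproduced with pandas' `cut`. The bin edges and labels below are hypothetical; in practice they come out of fine and coarse classing on the actual data.

```python
import pandas as pd

ages = pd.Series([19, 23, 31, 38, 45, 52, 61, 70, 84])

# Hypothetical bin edges; real edges come from fine/coarse classing.
bins = [18, 25, 35, 45, 55, 65, 120]
labels = ["18-25", "25-35", "35-45", "45-55", "55-65", "65+"]

# Each numeric age is mapped to a categorical group.
age_groups = pd.cut(ages, bins=bins, labels=labels)

print(age_groups.value_counts().sort_index())
```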

Scorecard — Model Building

Before building the scorecard model, two additional steps are needed: one is to calculate the Weight of Evidence (WoE), and the other is to calculate the Information Value (IV) based on the WoE values.

We use WoE values to verify the binning results. After splitting a continuous variable into a few categories, or grouping a discrete variable into a few categories, for every feature, we calculate the Weight of Evidence (WoE) value and then replace the categorical values with the WoE values, which are used later for building the model. Details about the calculation of WoE follow in the next section.

Weight of Evidence (WoE)

WoE measures the strength of an attribute of a characteristic in differentiating good and bad accounts, and it is based on the proportion of good applicants to bad applicants at each group level. It is a measure of the difference between the proportions of goods and bads in each attribute, i.e., the odds of a person with that attribute being good or bad. Negative WoE values indicate that a particular grouping isolates a higher proportion of bad applicants than good applicants, so applicants in that group present a greater credit risk. For each group i of a characteristic, WoE is calculated as:

WoE_i = ln( (% of goods in group i) / (% of bads in group i) )

Figure 3. Example of WoE result of ‘age’ feature
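The calculation can be sketched as follows; the good/bad counts per age bin are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy counts of good (non-default) and bad (default) borrowers per age bin.
tab = pd.DataFrame({
    "bin": ["18-25", "25-35", "35-45", "45-55", "55+"],
    "good": [100, 300, 400, 250, 150],
    "bad": [30, 50, 40, 15, 5],
})

# WoE_i = ln( share of all goods in bin i / share of all bads in bin i )
tab["dist_good"] = tab["good"] / tab["good"].sum()
tab["dist_bad"] = tab["bad"] / tab["bad"].sum()
tab["woe"] = np.log(tab["dist_good"] / tab["dist_bad"])

print(tab[["bin", "woe"]])
```

With these counts the youngest bin gets a negative WoE (a higher share of bads, i.e., greater credit risk) and WoE rises monotonically with age, which is the kind of logical trend the next section checks for.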

After calculating the WoE for each group of all characteristics, what needs to be confirmed is that the overall trends are logical and there are no data quirks. Logical relationships ensure that the final weightings after regression make sense, and also that when attributes are allocated points to generate a scorecard, those points are logical.

Figure 4. Logical WoE trend for age

Once we finish grouping the variables and calculating the WoE, we rank-order the variables by Information Value (IV) to screen and select them. Details about the calculation of IV follow in the next section.

Information Value (IV)

Information Value comes from information theory and is used to evaluate a characteristic's overall predictive power. For a characteristic binned into groups, it is calculated as:

IV = Σ_i ( % of goods in group i - % of bads in group i ) × WoE_i

IV provides a convenient rule of thumb for variable selection: values below 0.02 indicate an unpredictive characteristic, 0.02 to 0.1 a weak one, 0.1 to 0.3 a medium one, and above 0.3 a strong one.
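Continuing the toy age example from above, the IV of a characteristic is one number summarizing all of its bins (the counts are again made up):

```python
import numpy as np

# Toy good/bad counts per bin for a single characteristic.
good = np.array([100, 300, 400, 250, 150])
bad = np.array([30, 50, 40, 15, 5])

dist_good = good / good.sum()
dist_bad = bad / bad.sum()
woe = np.log(dist_good / dist_bad)

# IV = sum over bins of (dist_good - dist_bad) * WoE
iv = float(((dist_good - dist_bad) * woe).sum())

print(round(iv, 3))  # ~0.348, a "strong" characteristic by the rule of thumb
```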

Notice that the information value for NumberRealEstateLoansOrLines is 0.116, which only barely reaches the medium-predictor range, so we treat it as too weakly predictive to keep. Typically, variables with medium and strong predictive power are selected for model development. Therefore, we perform feature selection and choose 8 of the 9 features according to their IV values, as shown in the red highlight box below.

Figure 5. Information value of features

Scorecard Development

We model the scoring function and estimate a client's creditworthiness using a logistic regression model. The regression coefficients are then used to scale the scorecard; scaling a scorecard refers to making it conform to a particular range of scores. The big picture of scorecard development is shown below.

Figure 6. Big picture of scorecard development
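Fitting the regression itself is straightforward once every characteristic has been WoE-coded, since all inputs are then numeric. The sketch below uses scikit-learn on synthetic WoE-style features; all numbers here are illustrative, not the project's actual data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic WoE-transformed features: after binning, each raw value is
# replaced by its bin's WoE, so every column is numeric.
n = 500
X = rng.normal(size=(n, 3))

# Simulate defaults (y=1 = bad): bads are more likely when WoE is low,
# since low WoE marks riskier bins.
logits = -1.5 - 1.0 * X[:, 0] - 0.7 * X[:, 1] - 0.4 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

model = LogisticRegression()
model.fit(X, y)

# The fitted intercept and coefficients are what get scaled into points.
print(model.intercept_, model.coef_)
```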

Score-points scaling

The regression coefficients are used to scale the scorecard, i.e., to make it conform to a particular range of scores. Logistic regression models are linear models, in the sense that the logit-transformed predicted probability is a linear function of the predictor values. A scorecard derived in this manner therefore has the desirable property that the final credit score (credit risk) is a linear function of the predictors and, after some additional transformations of the model parameters, a simple linear function of the score points associated with each predictor class after coarse coding. The final credit score is then a simple sum of the individual score points taken from the scorecard.

For each attribute, its Weight of Evidence (WoE) and the regression coefficient of its characteristic are multiplied to give the attribute's score points. An applicant's total score is then proportional to the logarithm of the predicted bad/good odds for that applicant.

Figure 7. Logistic regression coefficients
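One common way to write this allocation (following Siddiqi's formulation, and assuming the regression predicts the log-odds of being bad, hence the leading minus sign) multiplies each attribute's WoE by its characteristic's coefficient and spreads the intercept and Offset evenly across the n characteristics. Every coefficient and WoE value below is made up for illustration.

```python
import math

# Hypothetical fitted logistic regression for a 2-characteristic card.
intercept = -3.0
coefs = {"age": 0.8, "utilization": 1.1}            # per-characteristic betas
woe = {"age": {"18-25": -0.94, "55+": 1.25},         # per-attribute WoE values
       "utilization": {"low": 0.60, "high": -0.85}}

pdo, base_score, base_odds = 20, 600, 50
factor = pdo / math.log(2)
offset = base_score - factor * math.log(base_odds)
n = len(coefs)  # number of characteristics in the model

def points(char, attr):
    # Attribute points: convert the WoE contribution into score points,
    # distributing the intercept and Offset across the n characteristics.
    return -(woe[char][attr] * coefs[char] + intercept / n) * factor + offset / n

# Total score = sum of the points of the applicant's attributes.
total = points("age", "55+") + points("utilization", "low")
print(round(total))
```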

Score-points scaling mechanism and calculator

We choose to scale the points so that a total score of 600 corresponds to good/bad odds of 50 to 1, and an increase of 20 points corresponds to a doubling of the good/bad odds.

  • Scaling: the choice of scaling does not affect the predictive strength of the scorecard
  • 'Points to double the odds' (pdo = 20)
  • Factor = pdo / ln(2)
  • Offset = Score - Factor * ln(Odds)
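With these choices, Factor and Offset pin down the whole scale, and any good/bad odds can be converted into a score:

```python
import math

pdo = 20          # points to double the odds
base_score = 600  # score anchored to ...
base_odds = 50    # ... good/bad odds of 50:1

factor = pdo / math.log(2)                       # ~28.85
offset = base_score - factor * math.log(base_odds)

def score(odds):
    """Total score for given good/bad odds."""
    return offset + factor * math.log(odds)

print(round(score(50)))   # 600 by construction
print(round(score(100)))  # 620: doubling the odds adds pdo = 20 points
```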

Decision from Scorecard

Below is one example of a score calculated with the scoring formula.

Generally speaking, the cut-off score differs from one type of loan to another, as well as between lenders. Some loans require a minimum score of 620, while others may accept scores below 620. Therefore, after setting the cut-off score, we can decide whether or not to approve the loan. The scorecard example below, taken from an online source, gives a better sense of how this works.
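The final approve/decline step then reduces to a threshold comparison; the 620 cut-off below is just the example figure mentioned above, not a universal rule.

```python
def decide(score, cutoff=620):
    # Hypothetical cut-off; real cut-offs vary by lender and loan type.
    return "approve" if score >= cutoff else "decline"

print(decide(650))  # approve
print(decide(600))  # decline
```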

Conclusion

Overall, the predictive model learns from a customer's historical data, together with peer-group data and other data, to predict the probability of that customer displaying a defined behaviour in the future. Such models not only identify 'good' and 'bad' applications on an individual basis, but also forecast the probability that an application with any given score will be 'good' or 'bad'. These probabilities or scores, along with other business considerations such as expected approval rates, profit, churn, and losses, are then used as a basis for decision making.

That is all for this machine learning project for now. If you have any questions or comments, feel free to contact me or leave a comment below. If you are interested in data science, feel free to check this link (WeCloudData). Thank you so much for taking the time to read this blog.
