Transparent ML for Enterprise Decisions — Score Cards
Note: this is Part IV in a series of articles on transparent machine learning models. See also Part I (Introduction), Part II (Linear Models) and Part III (Rule Sets).
Introduction
Scorecards have long been used as a valuable tool for making predictions in a variety of industries, in particular healthcare and finance. In healthcare, scorecards are used to diagnose patients and predict the likelihood that they have a particular medical condition, alongside other diagnostic tools such as lab tests or imaging studies. In finance, scorecards are often used to evaluate the risk of a mortgage applicant by predicting the likelihood that the applicant will default on their mortgage.
Here’s a very simplified example of what a risk assessment scorecard for lending might look like (higher score is better):
This scorecard considers the value of three variables (Income, DTI and Late Payments) and computes a subscore for each. The ranges, also known as bins, can be generalised to any subsets of the variable's domain. As indicated in bold, an example customer (with income of 30,000, DTI of 16% and 2 late payments) would score a total of 30 points.
Visually, each subscore can be illustrated as an x-y chart like this, showing the relationship between the input variable (X axis) and the subscore (Y axis). As is visually clear, the relationship can be non-linear and more or less elaborate, limited only by the number of bins used.
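The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the table from this article: the bin edges and point values are invented, chosen only so that the example customer totals 30 points as in the text, and the names `SCORECARD`, `subscore` and `total_score` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical scorecard (illustrative values only): for each variable,
# ascending bin edges and one point value per bin, so that
# len(points) == len(edges) + 1.
SCORECARD = {
    "income": {"edges": [20_000, 50_000, 100_000], "points": [0, 10, 20, 25]},
    "dti": {"edges": [0.15, 0.35], "points": [15, 10, 0]},
    "late_payments": {"edges": [1, 3], "points": [20, 10, 0]},
}


def subscore(variable, value):
    """Return the points of the bin that contains value."""
    card = SCORECARD[variable]
    # bisect_right counts the edges <= value, which is the bin index,
    # so each bin covers [lower edge, upper edge).
    return card["points"][bisect_right(card["edges"], value)]


def total_score(applicant):
    """Sum the subscores, each computed independently of the others."""
    return sum(subscore(name, value) for name, value in applicant.items())


customer = {"income": 30_000, "dti": 0.16, "late_payments": 2}
breakdown = {name: subscore(name, value) for name, value in customer.items()}
print(breakdown, total_score(customer))
```

Because each subscore is a lookup into a list of bins, plotting `subscore` against the input value gives exactly the kind of step-shaped chart described above.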
In this article, we will delve deeper into the use of scorecards for predictions in healthcare and finance. We will explore the benefits and limitations of these tools and consider the potential for future developments in scorecard technology.
Score Cards in Banking
Scorecards are an important tool for banks and other financial institutions to assess risk and make informed decisions about lending and other financial transactions. They can be based on historical data and used to predict the likelihood that a borrower will default on a loan or other financial obligation.
To create a scorecard, banks typically collect a range of data about the borrower, including personal information, financial history, and internal or external credit scores. This data is then used to evaluate the borrower’s risk profile and predict the likelihood that they will default on their loan.
There are a variety of data fields that banks might consider when creating a scorecard, including:
- Credit score: A credit score is a numerical representation of a borrower’s creditworthiness, based on their credit history. A high credit score indicates a lower risk of default, while a low credit score indicates a higher risk.
- Employment history: Banks may consider a borrower’s employment history when evaluating their risk profile. A borrower with a stable job and a consistent work history may be considered a lower risk than a borrower with a history of job instability or frequent job changes.
- Income: Banks will typically consider a borrower’s income when evaluating their risk profile. Borrowers with higher incomes may be considered a lower risk than borrowers with lower incomes, as they are more likely to be able to afford their loan payments.
- Debt-to-income ratio: This ratio compares a borrower’s recurring debt payments to their income (typically monthly debt payments divided by gross monthly income). A high debt-to-income ratio indicates a higher risk of default, as the borrower may have difficulty paying their debts.
- Payment history: Banks may consider a borrower’s payment history when evaluating their risk profile. A borrower with a history of making on-time payments may be considered a lower risk than a borrower with a history of late or missed payments.
Overall, banks use scorecards to assess the risk of a borrower and make informed decisions about lending and other financial transactions. By considering a range of data fields, banks can more accurately predict the likelihood of default and manage their risk.
Score Cards in Medical Diagnosis
In addition to their use in finance, scorecards are also used in the healthcare industry, for example in medical diagnosis. These tools are based on historical data and are used to predict the likelihood that a patient has a particular medical condition.
To create a scorecard for medical diagnosis, healthcare professionals typically collect a range of data about the patient, including personal information, medical history, and symptoms. This data is used to evaluate the patient’s risk profile and predict the likelihood that they have a particular medical condition.
There are a variety of data fields that healthcare professionals might consider when creating a scorecard for medical diagnosis, including:
- Age: Age can be a factor in the likelihood of certain medical conditions. For example, older patients may be more at risk for certain conditions, such as heart disease or cancer, while younger patients may be more at risk for other conditions, such as infectious diseases.
- Sex: Some medical conditions are more common in one sex than the other. For example, stroke and diabetes are more common in men, while osteoporosis and Alzheimer’s disease are more common in women.
- Medical history: A patient’s medical history can provide important clues about their risk for certain conditions. For example, a patient with a history of heart disease may be at higher risk for a heart attack, while a patient with a history of allergies may be at higher risk for asthma.
- Symptoms: The symptoms a patient is experiencing can provide important clues about their medical condition. For example, a patient with chest pain and shortness of breath may be at higher risk for a heart attack, while a patient with a fever and cough may be at higher risk for an infectious disease.
Overall, scorecards are an important tool for healthcare professionals to make informed decisions about medical diagnosis. By considering a range of data fields, healthcare professionals can more accurately predict the likelihood of certain medical conditions and guide treatment decisions.
Score Cards in other industries
Scorecards are a valuable tool for making predictions in a variety of industries beyond finance and healthcare. Almost any predictive task for which we want to generate a number or score (a.k.a. “regression” models in Machine Learning) can be handled transparently in the form of a scorecard. Here are a few examples:
- Marketing: Scorecards can be used by marketing professionals to predict the likelihood that a customer will respond to a particular marketing campaign. This can be based on data such as the customer’s previous purchasing history, demographics, and other relevant factors.
- Fraud detection: Scorecards can be used to predict the likelihood that a particular transaction is fraudulent. This can be based on data such as the transaction amount, location, and other relevant factors.
- Risk assessment: Scorecards can be used by insurance companies to predict the likelihood of a particular risk, such as a natural disaster or accident. This can be based on data such as the location, type of property, and other relevant factors.
- Employment: Scorecards can be used by employers to predict the likelihood that a job applicant will be successful in a particular role. This can be based on data such as the applicant’s education, work experience, and other relevant factors.
Overall, scorecards are a versatile tool that can be used for predictions in a variety of industries and contexts. By considering relevant data and applying statistical analysis, scorecards can help organizations make informed decisions and manage risk.
The strong points of scorecards
Scorecards have some nice properties that combine predictive power with interpretability:
- Additive: each subscore is computed from its own variable alone, independently of the others, and the total is simply the sum of the subscores. This makes it easy to understand how the scoring works.
- Non-linearities: the subscore for each bin can effectively capture any non-linear effect for a variable. For example, at lower incomes the risk of a mortgage applicant reduces quickly, whereas at higher income levels the risk reduction tapers off.
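Both properties can be checked on a toy example. The sketch below uses invented bins and points (the names `INCOME_POINTS`, `LATE_POINTS`, `lookup` and `score` are hypothetical): equal income steps earn shrinking point gains, and the gain from a higher income is the same whatever the number of late payments.

```python
from bisect import bisect_right

# Hypothetical income subscore with diminishing gains per bin.
INCOME_EDGES = [20_000, 40_000, 60_000, 80_000]
INCOME_POINTS = [0, 12, 20, 24, 26]

# Hypothetical subscore for the number of late payments.
LATE_EDGES = [1, 3]
LATE_POINTS = [20, 10, 0]


def lookup(edges, points, value):
    """Return the points of the bin [lower edge, upper edge) holding value."""
    return points[bisect_right(edges, value)]


def score(income, late_payments):
    """Additive score: the sum of two independently computed subscores."""
    return (lookup(INCOME_EDGES, INCOME_POINTS, income)
            + lookup(LATE_EDGES, LATE_POINTS, late_payments))


# Non-linearity: each 20,000 step in income adds fewer points than the last.
gains = [b - a for a, b in zip(INCOME_POINTS, INCOME_POINTS[1:])]
print(gains)

# Additivity: raising income from 10,000 to 50,000 adds the same number of
# points regardless of the value of the other variable.
deltas = {late: score(50_000, late) - score(10_000, late) for late in (0, 2, 5)}
print(deltas)
```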
The weaknesses of scorecards
The additive structure of scorecards also means they cannot deliver both high accuracy and good transparency on every predictive task. For example:
- Interacting variables: Just like linear models, scorecards cannot capture interactions between variables without specific feature engineering or the addition of, for example, rules.
- Size: As we push for increased accuracy, scorecards will grow in size, i.e. there will need to be more and more bins to reach higher levels of performance. Without suitable visualization, search and filtering capabilities, more than 10–20 bins for a variable might reduce our ability to understand the relationship between the variable and the subscore.
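The interaction limitation can be made concrete with a toy case. The sketch below assumes two binary flags and an invented target that depends only on their combination. For any additive score s, s(0,0) + s(1,1) equals s(0,1) + s(1,0), so no choice of points can reproduce such a target; the small brute-force search is only a sanity check of that identity. All names and values are hypothetical.

```python
from itertools import product

COMBINATIONS = list(product((0, 1), repeat=2))


def target(a, b):
    """Toy target with a pure interaction: 10 only when exactly one flag is set."""
    return 10 if a != b else 0


def additive_score(a, b, points_a, points_b):
    """A two-variable scorecard with one bin per flag value."""
    return points_a[a] + points_b[b]


def fits(points_a, points_b):
    """True if the scorecard reproduces the target on all four combinations."""
    return all(additive_score(a, b, points_a, points_b) == target(a, b)
               for a, b in COMBINATIONS)


# Try every integer point assignment in [-10, 10] for the four bins.
candidates = range(-10, 11)
any_fit = any(fits((p0, p1), (q0, q1))
              for p0, p1, q0, q1 in product(candidates, repeat=4))
print(any_fit)


def engineered_score(a, b):
    """Scorecard over an engineered variable that encodes the interaction."""
    differs = int(a != b)      # engineered feature: do the two flags differ?
    points_differs = (0, 10)   # one point value per bin of the new variable
    return points_differs[differs]


print(all(engineered_score(a, b) == target(a, b) for a, b in COMBINATIONS))
```

The engineered variable restores the fit, but at the cost of an extra column that the reader of the scorecard has to understand, which is the trade-off between accuracy and transparency noted above.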
Summary
In the pursuit of transparent Machine Learning, scorecards provide a valuable tool for making predictions in a variety of industries, including healthcare and finance, as well as marketing, fraud detection, risk assessment, and employment. They are transparent and easy to understand due to their additive property and table-like structure.
However, scorecards also have some limitations, such as the inability to capture interactions between variables without specific feature engineering or rules. Furthermore, as the size of the scorecard increases, it can become harder to understand the relationship between the variable and the subscore, which can in practice reduce transparency.
All in all, scorecards are a great complement to linear models and rule sets, in particular for regression tasks such as scoring.
For the other articles in this series, see:
- Part I — Introduction
- Part II — Linear Models
- Part III — Rule Sets
- Part IV — Score Cards (this article)
For more information on how ScoreCards (and Rule Sets) are supported in IBM Business Automation, see this other article on Transparent ML in Business Automation.
Greger works for IBM and is based in France. The above article is personal and does not necessarily represent IBM’s positions, strategies or opinions.