Fairness Explained: Definitions and Metrics

Carolyn Saplicki
IBM Data Science in Practice
11 min read · Nov 11, 2022

Co-authored by Carolyn Saplicki and Mitali Bante, Senior Data Scientists with Expert Labs Trustworthy AI Practice.

As more companies turn to Artificial Intelligence (AI) models to generate data-driven results, guardrails must be put in place to ensure trustworthiness across the model lifecycle. AI can only reach its full potential when trust is established at each stage of that lifecycle. Trustworthy AI is lawful, ethical, and robust. In this article, fairness definitions and fairness metrics are explained through a real-world example.

INTRODUCTION

In AI, models use training data to detect trends and make predictions. In some cases, these predictions differentiate between individuals, making AI a form of statistical discrimination. This discrimination becomes objectionable when it places certain groups at a systematic disadvantage. Fairness definitions identify historical systematic disadvantages. Fairness metrics quantify the presence of bias in a model.

Often, systematic bias results from the underlying data. Bias can enter the AI lifecycle in numerous ways: through the data source (social bias), the sampling method (representation bias), the pre-processed data (preparation bias), and/or other channels. To understand the source of bias in your model, check out Kush Varshney’s chapter on Data Sources and Biases in Trustworthy Machine Learning.

To prevent the continuation of unfairness, regulations are applied at the government, industry, and business levels. For instance, many companies have gender-equality policies, such as a requirement that the hiring process treat all genders equitably. However, it can be difficult to understand what makes such a process fair. Take our hiring example: suppose ten women and two men apply. Should we interview two women and two men? This would satisfy gender fairness based on equal numbers. Or should we interview five women and one man? This would satisfy gender fairness based on proportionality. What makes one process more or less fair than another?

Fairness definitions and fairness metrics vary based on the domain, the use case, and the expected end results, and it can be difficult to ascertain which metric is optimal for each model. This blog will explain six fairness metrics: disparate impact, statistical parity difference, equal opportunity difference, average odds difference, Theil index, and consistency. Each metric is illustrated with an interpretable example. Lastly, we will cover how to set metric thresholds for monitoring fairness, which helps the business take action to mitigate unwanted outcomes and failure scenarios.

To understand our fairness metrics, we will follow a single use case: predicting a criminal defendant’s likelihood of reoffending. Assume that you have a binary classifier trained on a dataset that contains information about each defendant, such as gender, race, marital status, and number of juvenile cases.

FAIRNESS DEFINITIONS

In order to utilize these metrics, you should have a fundamental understanding of your business problem and know the following: protected attributes, privileged group, favorable label, and case type. Together, these form the fairness definition for a specific model use case.

Note: These definitions are only being used as an example for our use case. In the real world, they would need to be set by the business that owns the model, in accordance with legal and business standards.

Protected attribute(s): An attribute that partitions a population into groups whose outcomes should have parity. Examples include race, gender, caste, and religion. Protected attributes are not universal, but application-specific.

  • For our use case, we have chosen to look at two protected attributes: sex and race.

Privileged group: A value of a protected attribute indicating a group that has historically been at a systematic advantage. It can be difficult to ascertain which protected individuals belong to each group. Stakeholders should have a deep understanding of their domain to recognize privileged and unprivileged groups within protected categories. Statistical methods can be utilized to understand the division in protected attributes. For instance, continuous variables such as age can be split into buckets, and races can be combined into broader categories such as Caucasian and Not Caucasian (a short sketch of this kind of grouping follows the list below). Intersectionality may also be investigated to determine whether a combination of subgroups is at risk of unfairness.

  • Sex, privileged: Female, unprivileged: Male
  • Note: In this use case, the privileged group is female; however, in other use cases females may be the unprivileged group, highlighting the importance of domain expertise.
  • Race, privileged: Caucasian, unprivileged: Not Caucasian
  • Note: Fairness metrics calculations are only performed for race in this article, but they can be replicated for other protected attributes as well (in this case, sex).
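Below is a minimal pandas sketch of this kind of grouping; the column names, values, and bucket boundaries are hypothetical and would be chosen by the business for a real model.

```python
import pandas as pd

# Hypothetical defendant records; column names and values are illustrative only.
df = pd.DataFrame({
    "age": [19, 34, 52, 27, 63],
    "race": ["Caucasian", "African-American", "Hispanic", "Caucasian", "Asian"],
})

# Split a continuous attribute into buckets.
df["age_group"] = pd.cut(df["age"], bins=[0, 25, 45, 120],
                         labels=["under 25", "25-45", "over 45"])

# Combine race values into a binary privileged/unprivileged grouping.
df["race_group"] = df["race"].where(df["race"] == "Caucasian", "Not Caucasian")

print(df)
```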

Favorable label: A label whose value corresponds to an outcome that provides an advantage to the recipient. The opposite is an unfavorable label.

  • In our case, a criminal defendant predicted to not reoffend is given the favorable label of Not Reoffend (Y=0) and receives lower bail. A criminal defendant predicted to reoffend is given the unfavorable label of Reoffend (Y=1) and receives higher bail.

Case Type: Models can be either punitive or assistive in nature depending on how the predictions are used. If intervening in a situation may harm individuals, the situation is punitive. If failing to intervene in a situation may harm individuals, the situation is assistive.

  • Punitive: An individual with a Not Reoffend label (Y=0) will receive lower bail. An individual with a Reoffend label (Y=1) will receive higher bail.
  • Note: If our model would give individuals with Reoffend (Y=1) label free housing after jail time, the model would be assistive.

BASIC STATISTICS

To understand the fairness metrics, we will first define a confusion matrix. A confusion matrix is a summary of the model’s predictions as compared to the ground truth it was trained on. Here, the number of correct and incorrect predictions can be easily seen and compared, which helps explain and validate the model.

Confusion Matrix
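As a minimal sketch of how such a matrix is produced (assuming scikit-learn and hypothetical label arrays):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and predictions (0 = Not Reoffend, 1 = Reoffend).
y_true = np.array([0, 1, 0, 0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 1, 0, 0, 1, 0, 1])

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
```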

In our example, suppose the predictions from our model can be summarized in the confusion matrix below.

Use Case Confusion Matrix

To explain fairness, we will utilize the protected attribute of race and the favorable label of Not Reoffend in the criminal defendant model.

Note: Metrics need to be calculated for all protected attributes. As our goal is to explain fairness, we are only looking at these values with respect to race. To ensure trustworthiness in this model, we would need to evaluate the metrics for sex.

Confusion Matrix when partitioned based on Race:

Caucasian (privileged):

Confusion Matrix: Privileged

Non-Caucasian (unprivileged):

Confusion Matrix: Unprivileged

GROUP VS INDIVIDUAL FAIRNESS METRICS

Group fairness is the idea that the average classifier behavior should be the same across groups defined by protected attributes. Here, we compare members of the privileged group and members of the unprivileged group. The group fairness metrics we will look at here are: statistical parity difference, disparate impact, average odds difference and equal opportunity difference.

Individual fairness is the idea that all individuals with the same feature values should receive the same predicted label and that individuals with similar features should receive similar predicted labels. Individual fairness includes the special case of two individuals who are the same in every respect except for the value of one protected attribute (known as counterfactual fairness). Individual fairness metrics include Theil index and consistency.

FAIRNESS METRICS

Group Fairness Metrics

Disparate Impact
This metric is computed as the ratio of the rate of favorable outcomes for the unprivileged group to that of the privileged group. The ideal value of this metric is 1.0. A value < 1 implies a higher benefit for the privileged group and a value > 1 implies a higher benefit for the unprivileged group. This is a demographic parity metric.
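A minimal sketch of the calculation, assuming we already know each group's rate of favorable (Not Reoffend) predictions; the rates below are hypothetical but chosen to match the 0.75 disparate impact used later in this post:

```python
# Rate of favorable (Not Reoffend) predictions per group; values are hypothetical.
p_unprivileged = 0.45  # P(favorable prediction | Not Caucasian)
p_privileged = 0.60    # P(favorable prediction | Caucasian)

disparate_impact = p_unprivileged / p_privileged
print(round(disparate_impact, 2))  # 0.75 -> the privileged group is favored (ideal value is 1.0)
```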

Statistical Parity Difference
This metric is computed as the difference between the rate of favorable outcomes received by the unprivileged group and that of the privileged group. The ideal value of this metric is 0. A value < 0 implies a higher benefit for the privileged group and a value > 0 implies a higher benefit for the unprivileged group. This is a demographic parity metric.
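Using the same hypothetical group rates as in the disparate impact sketch above:

```python
# Rate of favorable (Not Reoffend) predictions per group; values are hypothetical.
p_unprivileged = 0.45
p_privileged = 0.60

statistical_parity_difference = p_unprivileged - p_privileged
print(round(statistical_parity_difference, 2))  # -0.15 -> the privileged group is favored (ideal value is 0)
```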

Equal Opportunity Difference
This metric is computed as the difference of true positive rates between the unprivileged and the privileged groups. The true positive rate is the ratio of true positives to the total number of actual positives for a given group. The ideal value is 0. A value of < 0 implies higher benefit for the privileged group and a value > 0 implies higher benefit for the unprivileged group.
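A minimal sketch with hypothetical per-group counts, treating the favorable label (Not Reoffend) as the positive class:

```python
# Hypothetical per-group confusion-matrix counts (favorable label = positive class).
tp_unpriv, fn_unpriv = 40, 20   # unprivileged group (Not Caucasian)
tp_priv, fn_priv = 45, 15       # privileged group (Caucasian)

tpr_unpriv = tp_unpriv / (tp_unpriv + fn_unpriv)  # true positive rate, unprivileged
tpr_priv = tp_priv / (tp_priv + fn_priv)          # true positive rate, privileged

equal_opportunity_difference = tpr_unpriv - tpr_priv
print(round(equal_opportunity_difference, 3))  # negative -> the privileged group is favored (ideal value is 0)
```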

Average Odds Difference
This metric is computed as the average of the difference in false positive rates (false positives / actual negatives) and the difference in true positive rates (true positives / actual positives) between the unprivileged and privileged groups. The ideal value of this metric is 0. A value < 0 implies a higher benefit for the privileged group and a value > 0 implies a higher benefit for the unprivileged group.
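Extending the same hypothetical counts to include false positives and true negatives:

```python
# Hypothetical per-group counts (favorable label treated as the positive class).
tp_u, fp_u, fn_u, tn_u = 40, 10, 20, 30   # unprivileged group (Not Caucasian)
tp_p, fp_p, fn_p, tn_p = 45, 15, 15, 25   # privileged group (Caucasian)

def rates(tp, fp, fn, tn):
    tpr = tp / (tp + fn)  # true positive rate
    fpr = fp / (fp + tn)  # false positive rate
    return tpr, fpr

tpr_u, fpr_u = rates(tp_u, fp_u, fn_u, tn_u)
tpr_p, fpr_p = rates(tp_p, fp_p, fn_p, tn_p)

average_odds_difference = ((fpr_u - fpr_p) + (tpr_u - tpr_p)) / 2
print(round(average_odds_difference, 3))  # negative -> the privileged group is favored (ideal value is 0)
```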

Individual Fairness Metrics

Sample Table of Data and Predictions

Theil Index
This metric is computed as the generalized entropy of benefit for all individuals in the dataset, with alpha = 1. It measures the inequality in benefit allocation for individuals. A value of 0 implies perfect fairness. Fairness is indicated by lower scores; higher scores are problematic.
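A minimal sketch under one common formulation (the one used by AIF360's generalized entropy index), which defines each individual's benefit as b = ŷ − y + 1 with labels coded so that 1 is the favorable outcome; the label arrays are hypothetical:

```python
import numpy as np

def theil_index(y_true, y_pred):
    """Generalized entropy index with alpha = 1 over per-individual benefits."""
    b = y_pred - y_true + 1.0   # benefit: 0 (denied favorable), 1 (correct), 2 (undeserved favorable)
    mu = b.mean()
    r = b / mu
    terms = np.zeros_like(r)
    pos = r > 0
    terms[pos] = r[pos] * np.log(r[pos])   # 0 * ln(0) is treated as 0
    return terms.mean()

# Hypothetical labels and predictions (1 = favorable outcome).
y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])
print(round(theil_index(y_true, y_pred), 3))  # ~0.231; 0 would indicate perfect fairness
```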

Consistency
This metric measures how similar the predictions are for similar instances. It is calculated as one minus the average absolute difference between each individual’s prediction and the predictions of its k-nearest neighbors. This value ranges from 0 to 1, with 1 being the ideal value. You can select which attributes should be considered for the distance computation. For simplicity, let us consider k=2 and assume that the first 3 rows are very similar to each other (forming one cluster) and the last 3 rows form another cluster. In a real implementation, you would determine this by calculating the distance between each pair of observations, and whether the protected attribute is included in those distances depends on the use case.
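A minimal sketch using scikit-learn's nearest-neighbor search; the feature matrix and predictions are hypothetical and mirror the two-cluster setup described above:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def consistency(X, y_pred, k=2):
    """1 minus the mean absolute difference between each prediction
    and the predictions of its k nearest neighbors."""
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1 because each point is its own nearest neighbor
    _, idx = nbrs.kneighbors(X)
    neighbor_preds = y_pred[idx[:, 1:]]                # drop the point itself
    return 1.0 - np.mean(np.abs(y_pred[:, None] - neighbor_preds))

# Two tight clusters of three rows each; the third prediction breaks consistency.
X = np.array([[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
              [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]])
y_pred = np.array([0, 0, 1, 1, 1, 1])
print(round(consistency(X, y_pred, k=2), 2))  # ~0.67; 1 would be perfectly consistent
```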

SO, WHICH FAIRNESS METRIC IS RIGHT FOR YOU?

Our examples highlight how the fairness metrics can be utilized; however, they are not all appropriate for this model. Stakeholders must understand their use case to choose the essential fairness definitions and fairness metrics. Often, legal requirements and ethical standards highlight which fairness definitions and fairness metrics are critical. When it is unclear which fairness metrics to use, resources such as the Fairness Compass can help identify which fairness metrics are necessary for the problem at hand.

Our use case needs disparate impact in order to monitor the rate of favorable outcomes for non-Caucasians relative to that for Caucasians.

Note: For the purpose of this post, we are only monitoring race using disparate impact. Our use case may need other fairness metrics to monitor other fairness definitions.

METRIC THRESHOLDS

Now that we have a good understanding of fairness definitions and fairness metrics, we can decide which definition and metric are right based on our business understanding and historical data. So what’s next?

After obtaining our fairness metric, we must compare it to a metric threshold. A metric threshold is a numeric value that sets a definitive limit on how much bias is acceptable for your model, confirming whether or not unfairness is present in the data. Thresholds depend on the fairness definition chosen and must be selected before the fairness metric is calculated on your data, so that thresholds are not adjusted to fit the results.

Thresholds differ based on each use case. These can be due to government, industry or business regulations. It is important to talk to stakeholders to create a fundamental business understanding of the task at hand. If a business is unsure of thresholds for their particular use case, there are organizations that help understand government regulations and best practices to set control frameworks for AI solutions.

Thresholds can be either uni-directional or bi-directional. Uni-directional thresholds are often used when focusing on the unprivileged group. A bi-directional threshold ensures that the model’s predictions stay within a range that does not create unfairness for either group. Bi-directional thresholds can even highlight faulty domain knowledge or a change in human behavior. To highlight the difference, we will expand upon disparate impact.

The thresholds we selected for our sample use case were based on the 80% rule; a simple check against them is sketched after the list below.

  • Uni-directional: Metric threshold is at 0.8. Here, we are only concerned if our metric falls below 0.8. A value of the disparate impact ratio less than 0.8 is considered unfair and values greater than 0.8 are considered fair.
  • Bi-directional: Metric threshold is between 0.8 and 1.25. Here, we are concerned if our metric falls below 0.8 or above 1.25. The 80% rule can be symmetrized by considering disparate impact ratios between 0.8 and 1.25 to be fair.
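A minimal sketch of such a check, using the disparate impact values from this post (the function name and defaults are illustrative):

```python
def check_fairness(disparate_impact, lower=0.8, upper=1.25, bidirectional=True):
    """Flag a disparate impact value against uni- or bi-directional thresholds."""
    if disparate_impact < lower:
        return "biased against the unprivileged group"
    if bidirectional and disparate_impact > upper:
        return "biased against the privileged group"
    return "within the acceptable range"

print(check_fairness(0.75))   # biased against the unprivileged group
print(check_fairness(1.334))  # biased against the privileged group (bi-directional only)
print(check_fairness(1.0))    # within the acceptable range
```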

Uni-Directional

Recall from the calculation in the example above that we found our disparate impact to be 0.75. As 0.75 < 0.8, bias is present. This means that the rate of Not Reoffend predictions for non-Caucasians is only 75% of the rate for Caucasians. Actions need to be taken to mitigate this bias before the model is put into production.

Bi-Directional

As calculated above, our disparate impact of 0.75 highlights unfairness toward the unprivileged group (0.75 < 0.8).

However, suppose the rates were switched for the privileged and unprivileged groups:

Here, the unprivileged group is at a greater advantage than the privileged group. The upper threshold informs us that the privileged group may be at a disadvantage (1.334 > 1.25).

This example highlights bias against the privileged group. This may highlight a change in human behavior or faulty domain knowledge.

Ultimately, the best fairness threshold for our use case is bi-directional as we do not want either the privileged or unprivileged groups to be significantly different from one another.

CLOSE

As more businesses utilize AI, it is essential to implement fairness monitoring. AI models affect real people and have real consequences. Fairness monitoring alerts key stakeholders to violations of fairness standards, preventing unethical AI and business conflicts. It is up to business stakeholders to ensure that their models are performing fairly for their given use cases.

Fairness in AI requires thought and conscious effort; fairness definitions and fairness metrics must be chosen purposefully for model monitoring. Only through monitoring AI can we circumvent future issues relating to unfairness. By identifying unfairness early, any bias found can be mitigated before it causes problems for the business. This unfairness can be combated with bias mitigation algorithms.

IBM offers Watson OpenScale to monitor the trustworthiness of models through foundational pillars. These pillars include fairness along with explainability, robustness, privacy, and transparency. IBM’s AI Fairness 360 open-source toolkit examines bias in machine learning models throughout the AI application lifecycle. To see an example of a fairness metric in practice, check out Practitioner’s guide to Trustworthy AI.

#MachineLearning #Preprocessing #BiasInAI #WatsonOpenScale #CloudPakforData
