Counterfactual Fairness and IBM Watson OpenScale

Manish Bhide
Published in Trusted AI
Jan 8, 2021

(Co-authored by Manish Bhide and Ravi Chamarthy)

There are multiple metrics and associated techniques to measure fairness in AI models. One such metric is counterfactual fairness. In this blog post, we explain how the fairness detection technique in IBM Watson OpenScale also helps detect counterfactual fairness in AI models.

Issue with Fairness Metrics

Consider a scenario where a bank is using an AI model to decide whether a person applying for a loan should be granted the loan. The bank would like to ensure that its AI model acts fairly towards people of all ethnicities, genders and age groups. Metrics such as the disparate impact ratio can be used for this purpose. The disparate impact ratio compares the percentage of loan approved outcomes for (say) different age groups and signals bias if there is a big difference between them. E.g., if people in the age group [26, 90] get a loan approved outcome 60% of the time whereas people in the age group [18, 25] get a loan approved outcome only 20% of the time, then such a model will be flagged as acting in a biased manner.
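As a concrete illustration, here is a minimal sketch of how a disparate impact ratio could be computed for the age attribute. The column names and data are made up for this example and are not specific to OpenScale.

```python
import pandas as pd

def disparate_impact_ratio(df, group_col, favorable_col):
    """Ratio of favorable-outcome rates: monitored group vs. reference group."""
    rates = df.groupby(group_col)[favorable_col].mean()
    return rates[True] / rates[False]

# Hypothetical loan decisions: 'approved' is 1 for a loan approved outcome.
loans = pd.DataFrame({
    "age":      [22, 24, 19, 45, 61, 33, 52, 70, 29, 41],
    "approved": [0,  1,  0,  1,  1,  0,  1,  1,  1,  1],
})
# Monitored group is [18, 25]; everyone else acts as the reference group.
loans["monitored_group"] = loans["age"].between(18, 25)

ratio = disparate_impact_ratio(loans, "monitored_group", "approved")
print(f"Disparate impact ratio: {ratio:.2f}")  # ~0.39 here; values well below 1.0 signal possible bias
```

A common rule of thumb (the "80% rule") flags ratios below 0.8 as potentially biased, though the exact threshold is ultimately a policy choice.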

However, there is a problem with the above approach. Consider a scenario where all the people in the age group [18, 25] have a poor credit rating. The AI model is then bound to give a loan denied outcome to such applicants. Hence the loan approved rate for people in the age group [18, 25] will be 0%, and the model will be flagged as acting in a highly biased manner as per the disparate impact ratio. In reality, however, the model is making the right decision and is not acting in a biased manner. This problem is addressed by counterfactual fairness.

Counterfactual Fairness

The AI model mentioned earlier is said to be counterfactually fair if it would give the same prediction had the person been of a different race, gender or age group. Often, model developers do not use attributes such as race or gender when building an AI model. So if such attributes are not used as features of the model, how do we figure out the model's behavior when the gender or race is changed? Counterfactual fairness addresses this problem as follows:

- Step 1: The first thing needed for computing counterfactual fairness is to identify relationships between fairness attributes (such as race, gender, age, etc.) and one or more features of the model. E.g., race is correlated with zip code and income. This is a manual step and needs to be done by domain experts.

- Step 2: Consider that an African American person applied for a loan and the model predicted that the loan should be denied. To check counterfactual fairness, we assume that this person was of a different race.

- Step 3: We then find the new values of all the features that are correlated with race, i.e., the values these features would take if the race were changed from African American to, say, Caucasian. This new record is sent to the model, and if the model's prediction changes to, say, loan approved, then the model is said to be counterfactually unfair.

There are two key things we would like to highlight here: (1) there is manual effort involved in finding the relationships between fairness attributes and one or more features (Step 1 above), and (2) it computes individual bias. In other words, if the model exhibits bias for even a single person, the model is said to be biased.
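To make Steps 2 and 3 concrete, here is a minimal sketch of the counterfactual check for a single applicant. The `model` object, the feature names and the mapping from race to correlated feature values are hypothetical placeholders; in practice that mapping is exactly what the domain experts provide in Step 1.

```python
# Hypothetical Step 1 output: typical values of the features correlated with race,
# as identified by domain experts. The numbers here are made up.
CORRELATED_FEATURE_VALUES = {
    "African American": {"zip_code": "10460", "income": 42000},
    "Caucasian":        {"zip_code": "10583", "income": 95000},
}

def is_counterfactually_fair(model, applicant, protected_attr="race",
                             counterfactual_value="Caucasian"):
    """Return True if the prediction is unchanged when the applicant's race
    (and the features correlated with it) are flipped to the counterfactual value."""
    original_prediction = model.predict(applicant)

    # Step 2: assume the applicant belongs to a different race.
    # If race is not a model feature, flipping it alone has no direct effect;
    # only the correlated features below can change the prediction.
    counterfactual = dict(applicant)
    counterfactual[protected_attr] = counterfactual_value

    # Step 3: update the features correlated with race and re-score the record.
    counterfactual.update(CORRELATED_FEATURE_VALUES[counterfactual_value])
    counterfactual_prediction = model.predict(counterfactual)

    return original_prediction == counterfactual_prediction
```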

Fairness in Watson OpenScale

IBM Watson OpenScale makes use of a data perturbation based fairness detection technique which is similar to what is done in Steps 2 and 3 of the counterfactual fairness method mentioned above. The details of this technique are given in our earlier blog posts on fairness and indirect bias. There are two major improvements that OpenScale supports:

- OpenScale helps to automatically identify the relationships between the fairness attributes and one or more features of the model. This avoids the need to do it manually, which is error-prone. (A simple illustration of the idea follows this list.)

- OpenScale makes use of a technique which is a hybrid of individual + group fairness. The details of this technique are given below.
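OpenScale's actual algorithm for discovering these relationships is not described in this post, so purely as an illustration of the idea, a naive correlation-based screen over the training data might look like the sketch below (the column names and the threshold are hypothetical).

```python
import numpy as np
import pandas as pd

def find_correlated_features(df, protected_attr, threshold=0.3):
    """Naive illustration: flag features whose values co-vary with a (binary)
    protected attribute. This is NOT OpenScale's actual algorithm."""
    protected = pd.factorize(df[protected_attr])[0]
    candidates = []
    for col in df.columns.drop(protected_attr):
        # Encode categorical columns as integer codes so one correlation measure works everywhere.
        values = pd.factorize(df[col])[0] if df[col].dtype == object else df[col].to_numpy()
        corr = np.corrcoef(protected, values)[0, 1]
        if abs(corr) >= threshold:
            candidates.append(col)
    return candidates

# e.g. find_correlated_features(training_df, "race") might return ["zip_code", "income"]
```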

Group + Individual Fairness

Consider a scenario where an AI model has made predictions on 100 records. In the hybrid technique, OpenScale first computes individual fairness by perturbing multiple records (from those 100) that were sent to the model. In other words, if an African American person has applied for a loan, OpenScale will flip the race to, say, Caucasian and also change the correlated features (such as zip code, income level, etc.). This perturbed record is sent to the model and the model output is stored. Such perturbation is done for multiple records.

In the second step, we compute group fairness over the combination of the original records received by the model (100 in the above example) and the perturbed records generated in the previous step (which could be close to 100 or more). This step helps us understand the overall behavior of the model, as opposed to flagging unfairness as soon as a single record is not counterfactually fair. This is especially useful in an enterprise setting, where it is difficult to build a model that is 100% counterfactually fair and what enterprises really need is a way to understand the overall behavior of the model. Having said that, if an enterprise wants to ensure that its AI model is fair on each and every record (in other words, that the model is counterfactually fair), OpenScale supports that as well.
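Putting the two steps together, the sketch below shows the general shape of this hybrid computation, reusing the hypothetical `model` and `CORRELATED_FEATURE_VALUES` placeholders from the earlier sketch. It only perturbs records of the monitored group for brevity and is meant to illustrate the idea, not OpenScale's internal implementation.

```python
def hybrid_fairness_ratio(model, records, protected_attr="race",
                          monitored_value="African American",
                          reference_value="Caucasian",
                          favorable_label="Loan Approved"):
    """Individual step: perturb monitored-group records. Group step: compute a
    disparate-impact-style ratio over the original + perturbed records combined."""
    combined = []  # list of (group, prediction) pairs
    for record in records:
        combined.append((record[protected_attr], model.predict(record)))

        # Individual step: flip the protected attribute and its correlated features.
        if record[protected_attr] == monitored_value:
            perturbed = dict(record)
            perturbed[protected_attr] = reference_value
            perturbed.update(CORRELATED_FEATURE_VALUES[reference_value])
            combined.append((reference_value, model.predict(perturbed)))

    # Group step: favorable-outcome rate per group over the combined set.
    def favorable_rate(group):
        outcomes = [pred for grp, pred in combined if grp == group]
        return sum(pred == favorable_label for pred in outcomes) / len(outcomes)

    return favorable_rate(monitored_value) / favorable_rate(reference_value)
```

A ratio close to 1.0 over the combined set indicates that, in aggregate, flipping the protected attribute (and its correlated features) does not change the model's favorable-outcome rate.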

Thus, the fairness detection technique in IBM Watson OpenScale builds on top of counterfactual fairness and is especially suited for use by enterprises.
