Unmasking Bias —Assessing Fairness in Large Language Models

Arpit Narain, CFA, FRM, CQF
7 min readJun 7, 2023

--

Photo by Jonathan Kemper on Unsplash

Introduction

As AI and Machine Learning continues to reshape society, their influence is becoming widespread, ranging from social media interactions to critical processes like recruitment and lending decisions. With this influence comes a significant responsibility to ensure fairness.

Ensuring fairness is particularly pertinent in regulated industries such as financial services, where regulations such as the ECOA and FHA protect consumers from unfair and discriminatory practices.

Large language models (LLMs) are trained on enormous datasets and thus implicitly reflect the biases present in their training data. Thus, ensuring these models operate without undue bias is a vital issue.

The bias assessment in LLMs is a gray and evolving area, and by no means this blog intends to cover the topic in exhaustiveness. The aim here is to provide a foundational understanding of three methodologies for testing fairness in LLMs with easy to follow examples — Bias Audit, Counterfactual Testing, and Adversarial Testing.

Overview of Methodologies

Testing the fairness of LLMs like ChatGPT, BARD, et. al., is a complex task due to the multifaceted nature of bias. It involves various strategies, often used in combination, some of which are mentioned below:

1. Bias Audit: This method involves systematically testing the LLM across different demographic groups. It can be done by creating a dataset of text inputs intended to probe for fairness and examining the output of the LLM for each group. This dataset can include potential demographic identifiers, like names associated with various genders, ethnicities, or religions, and sentences containing stereotypes. Modelers can calculate metrics like false positive rates, false negative rates, and prediction errors across these groups.

2. Counterfactual Testing: This method involves changing the demographic information in a sentence while keeping the rest of the sentence the same (e.g., changing the gender or race of a person referred to in a sentence). The outputs are then compared. If they significantly differ, it suggests the LLM may have bias.

3. Adversarial Testing: In this method, adversarial examples are created that are designed to trick the model into revealing its biases. For example, a sentence could be crafted that is designed to appear neutral but will reveal bias if the LLM has any bias.

Let’s go one notch deeper..

1. Bias Audit

A Bias Audit for an LLM is a comprehensive evaluation of the model’s outputs across a range of specific inputs designed to reveal potential bias. Here are the steps on how to conduct a Bias Audit:

a. Define Potential Areas of Bias: Start by identifying the potential areas where bias might arise. These could be based on gender, race, age, religion, sexual orientation, nationality, socioeconomic status, or any other protected attributes. These are areas where we need to ensure the model does not unfairly favor or disadvantage certain groups. As an example, the LLM might exhibit gender bias resulting from the model associating men with career-related words and women with home-related words.

b. Develop Test Cases: Create a set of inputs to test the LLM. These should be carefully designed to probe for bias. For instance, use names that are strongly associated with certain demographic groups, or sentences that explicitly mention certain demographic characteristics.

c. Generate Outputs and Analyze: Generate outputs from LLM for the defined test cases and analyze the results. For example, the modeler could investigate how the model responds to different groups by spotting instances where the LLM seems to favor certain groups or show prejudice against others.

d. Quantitative Analysis: At this stage, the modeler might want to quantify the degree of bias. This can be done by calculating error rates or performance metrics across different groups. For instance, the modeler can calculate false positive and false negative rates for each group. These can be further summarized into fairness metrics, such as demographic parity, equal opportunity, or equality of odds.

e. Qualitative Analysis: Besides the quantitative analysis, a qualitative review can also provide insights. This can involve having human evaluators review and rate outputs for potential bias. It’s essential that these evaluators represent a diverse cross-section of demographics themselves and are trained to identify subtle forms of bias.

f. Document and Share Findings: Documenting any findings in a clear, transparent way is critically important. Be honest about where the LLM falls short and share this information with all relevant stakeholders. This fosters a culture of accountability and continuous improvement.

g. Iterate: Bias assessment should be an ongoing process. As the LLM evolves, the audit should be repeated, and newly identified forms of bias should be incorporated into the testing protocol.

The Bias Audit is a powerful tool, but it’s not perfect. For example, it may miss more subtle forms of bias, or bias that manifests only in response to very specific inputs. Additionally, it’s important to note that reducing bias in one area might increase it in another. So, it’s always a challenge to strike the right balance.

2. Counterfactual Testing

a. Define Counterfactual Variables: The decision on what element in the data needs to be changed to create a counterfactual scenario is made here. For example, in a sentence about a person, the modeler might decide to change their name.

b. Create Original and Counterfactual Examples: Based on the counterfactual variable, create pairs of original and counterfactual examples. For instance, if the modeler is testing for gender bias, he/she might create a pair of sentences. An example of such a pair could be “Mark has been working in the tech industry for five years” as the original and “Lucy has been working in the tech industry for five years” as the counterfactual example.

c. Generate Outputs and Compare: Use the LLM to generate outputs for the original and counterfactual examples, and compare these outputs. Assess if they are consistent, or the model treats them differently.

d. Quantitative Analysis: Use appropriate metrics to measure any differences between the original and counterfactual outputs. For example, the modeler might use similarity measures to quantify the difference between two text outputs.

e. Qualitative Analysis: This involves taking a more subjective look at the outputs. Here, the human reviewers can evaluate and rate the outputs for bias.

f. Document and Share Findings: It’s important to document the findings, including the methodology, examples used, quantitative results, and qualitative feedback. Sharing this information with stakeholders ensures transparency and can help inform future bias mitigation strategies.

g. Iterate: Bias testing should be an ongoing process. As the model evolves or is retrained, the testing process should be repeated. Over time, new forms of bias may also be identified, and they should be incorporated into the testing protocol.

This is a more of a high level framework for counterfactual testing, but in practice, the process can be quite complex. Crafting good counterfactual examples can be challenging, as it involves subtly changing an input while keeping its overall meaning and context the same. Make sure to involve business and compliance experts in the process from the beginning.

3. Adversarial Testing

a. Identify Potential Weak Points: Begin by pinpointing the potential areas where the LLM might show biases or vulnerabilities. An example of this is the model unfairly correlating certain racial identities with negative connotations.

b. Craft Adversarial Examples: Adversarial examples are specifically designed to exploit potential LLM weaknesses. In this case, the modeler might create sentences that should be neutral but include names typically associated with a certain race. An adversarial example could be “Jamal is attending a job interview for the position of a software engineer”, where the task is to test bias related to Muslims.

c. Generate Outputs and Analyze: Generate outputs from LLM for the adversarial examples and analyze the results. Assess if the outputs are reflective of the neutral tone of the input, or they veer towards negativity.

d. Quantitative Analysis: Measure the degree of bias in the generated outputs. For instance, the modeler might use sentiment analysis tools to calculate the sentiment polarity of the text produced by the model in response to the adversarial examples.

e. Qualitative Analysis: As with the previous methods, a qualitative review of the LLM outputs can provide deeper insights into the biases. The modeler might engage human reviewers to evaluate and score the outputs for potential bias.

f. Document and Share Findings: Make a record of findings, noting both the quantitative and qualitative results. Sharing these results with relevant stakeholders promotes transparency and can inform future attempts to mitigate bias.

g. Iterate: As with all fairness testing, adversarial testing should be an ongoing process. As the LLM evolves and learns, and as potential new forms of bias are identified, the testing process should be repeated and updated accordingly.

Adversarial Testing is an essential tool in probing the robustness of a model and uncovering subtle forms of bias. It requires a deep understanding of potential model weaknesses and the creativity to design test cases that can expose these vulnerabilities.

Conclusion

Auditing large language models for fairness isn’t just an intellectual exercise — it’s an ethical imperative for everyone in the AI and machine learning sphere. By exploring Bias Audit, Counterfactual Testing, and Adversarial Testing, one can take an important stride towards understanding and confronting these biases. Remember, this field is very vast and complex, and the journey doesn’t end here !!

While the tools mentioned above are powerful, they’re not exhaustive. We must couple them with a conscious effort to reduce bias during the model development phase and continually review the ethical implications of AI outputs. Thus, it’s important to involve business and compliance teams right at the onset, and work collaboratively with them. Also, it’s noteworthy that no single technique can capture every bias, so a blended, iterative approach is usually most effective.

--

--

Arpit Narain, CFA, FRM, CQF

Global Head of Financial Solutions - Artificial Intelligence & Quant Modeling @MathWorks (makers of MATLAB) https://www.linkedin.com/in/arpit-narain-cfa-frm-cqf