
Developing Maturity Model Assessment Metrics for the Application of AI Ethical Principles (Part 1)

Ashley Moore
10 min read · Aug 15, 2023

--

Derived from IEEE Trustworthy AI — Part 1 Article (Ref 1)

Recent news (the July 2023 headlines) reported that President Joe Biden met with several top AI companies (Amazon, Anthropic, Google, Inflection, Meta, Microsoft and OpenAI) to discuss third-party testing and product watermarking. Keep in mind that a few of these companies have been active members of the National Artificial Intelligence Advisory Committee (NAIAC, see www.ai.gov). We also have to assume that a few of these companies are skilled at adopting frameworks and consensus-based standards and at developing AI software that is unbiased. A few of these member companies have actually published IT and software auditing playbooks, so the questions are: do they actually apply them (do they walk what they talk?), and are they capable of auditing themselves?

There are many pitfalls to marketplace self-regulation and internal auditing: 1) it is nearly impossible to remain unbiased; 2) communication between internal auditors and the business unit being audited is often poor; 3) internal auditors become opinionated in how they apply various standards of care, which biases their results; and 4) self-regulation invariably leads to conflict-of-interest issues.

Introduction and Context

We need to put a few things in context before we dive into the pitfalls of AI bias by design and how to assess whether a system does or does not exhibit the characteristic attributes of bias.

  • The OECD AI Policy Observatory (OECD.AI) builds on the momentum of the OECD’s Recommendation on Artificial Intelligence (“OECD AI Principles”), the first intergovernmental standard on AI, adopted in May 2019 by OECD countries (which include the U.S.) and adhered to by a range of partner economies. The OECD AI Principles provided the basis for the G20 AI Principles endorsed by Leaders in June 2019. (Ref 2)
Image from the oecd.ai website
  • American Bar Association: “At present, the regulation of AI in the United States is still in its early stages, and there is no comprehensive federal legislation dedicated solely to AI regulation. However, there are existing laws and regulations that touch upon certain aspects of AI, such as privacy, security and anti-discrimination”. ~ American Bar Association (Ref 3)
  • Federal Trade Commission (FTC), bias and discrimination: In addition to inherent design flaws, AI tools can reflect the biases of their developers, which lead to faulty and potentially illegal outcomes. The report provides analysis as to why AI tools produce unfair or biased results. It also includes examples of instances in which AI tools resulted in discrimination against protected classes of people or over-blocked content in ways that can serve to reduce freedom of expression. (Ref 4)

A Path Forward

In this article I will cover AI bias metrics based on the use of IEEE Std P7003, ISO/IEC TR 24027 and NIST Special Publication 1270, Towards a Standard for Identifying and Managing Bias in Artificial Intelligence.

First, let’s define what an algorithmic system is: an algorithmic system refers to the combination of algorithms, data and the output deployment process that together determine the outcomes affecting end users. Second, there is the construct for the conversation on AI bias (i.e., unjustified and inappropriate bias). Unjustified bias refers to differential treatment of individuals based on criteria for which no operational justification is given. Inappropriate bias refers to bias that is legally or morally unacceptable within the social context where the system is used, e.g. algorithmic systems that produce outcomes with differential impact strongly correlated with protected characteristics (such as race, gender, sexuality, etc.) (Ref 5).
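
To make “differential impact” concrete, here is a minimal sketch (with a made-up dataset and hypothetical column names) that compares favorable-outcome rates across two groups; a strongly skewed ratio correlated with a protected characteristic is the kind of signal that calls for closer review.

```python
# Minimal sketch: checking an algorithmic system's outcomes for differential
# impact. The dataset and column names ("group", "outcome") are hypothetical.
import pandas as pd

# Toy outcomes: 1 = favorable decision (e.g., loan approved), 0 = unfavorable
df = pd.DataFrame({
    "group":   ["A", "A", "A", "A", "B", "B", "B", "B"],
    "outcome": [1,   1,   1,   0,   1,   0,   0,   0],
})

# Favorable-outcome (selection) rate per group
rates = df.groupby("group")["outcome"].mean()
print(rates)  # A: 0.75, B: 0.25

# Disparate-impact ratio: unprivileged rate divided by privileged rate.
# A common rule of thumb (the "four-fifths rule") flags ratios below 0.8.
ratio = rates["B"] / rates["A"]
print(f"Disparate impact ratio: {ratio:.2f}")  # 0.33 -> warrants investigation
```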

ISO/IEC 24027 views bias as something that can manifest from a large variety of sources. Some people believe it is as simple as ensuring datasets are diversely sourced, but there are many nuances to this. Bias can originate from human sources, such as labelling decisions made by crowd-sourced workers. It can also result from engineering decisions about how multiple components interact with each other, or from decisions about how data is prepared.

Another issue is that some data may be legally protected from being collected for privacy or other reasons. This can make it difficult to see whether people with such protected characteristics are being treated fairly or not. Bias resulting from AI systems affects people differently, depending on the context. These effects are described in the ISO/IEC Technical Report as (Ref 6):

  • Positive: For instance, an AI system for hiring can introduce a bias towards one gender over another in the decision phase to compensate for societal bias inherited from the data, which reflects certain historical underrepresentation in a profession.
  • Neutral: For example, the AI system for processing images for a self-driving car system can systematically misclassify mailboxes as fire hydrants. This statistical bias will only have neutral impact if the system has an equally strong preference for avoiding each type of obstacle.
  • Negative: For instance, AI hiring systems favoring candidates of one gender over another and voice-based digital assistants failing to recognize people with speech impairments can have unintended consequences of limiting the opportunities of those affected. Such examples can be categorized as unethical and compromise the trustworthiness of the AI-based system. (Ref 7)

Steps to Fixing Bias in AI Systems:

  1. You should fully understand the algorithm and data to assess where the risk of unfairness is high.
  2. You should establish a debiasing strategy that contains a portfolio of technical, operational and organizational actions:
  • Technical strategy involves tools that can help you identify potential sources of bias and reveal the traits in the data that affect the accuracy of the model (a minimal sketch follows this list).
  • Operational strategies include improving data collection processes using internal “red teams” and third party auditors. You can find more practices from Google AI’s research on fairness.
  • Organizational strategy includes establishing a workplace where metrics and processes are transparently presented.
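
As a sketch of the technical strategy above, the following illustrative example (synthetic arrays, not a real evaluation set) breaks a model’s accuracy and false-positive rate down by group; large gaps indicate where the data or the model deserves scrutiny.

```python
# Illustrative sketch: breaking model error down by group to spot where a
# model underperforms. The arrays are synthetic; in practice they would come
# from a held-out evaluation set and the model under review.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])   # ground-truth labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 1])   # model predictions
group  = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

for g in np.unique(group):
    mask = group == g
    acc = (y_true[mask] == y_pred[mask]).mean()
    # False-positive rate within the group: predicted 1 when the label was 0
    negatives = mask & (y_true == 0)
    fpr = (y_pred[negatives] == 1).mean() if negatives.any() else float("nan")
    print(f"group {g}: accuracy={acc:.2f}, false-positive rate={fpr:.2f}")
```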

3. As you identify biases in training data, you should consider how human-driven processes might be improved. Model building and evaluation can highlight biases that have gone unnoticed for a long time. In the process of building AI models, companies can identify these biases and use this knowledge to understand the reasons for bias. Through training, process design and cultural changes, companies can improve the actual process to reduce bias.

4. Decide on use cases where automated decision making should be preferred and when humans should be involved.

5. Research and development are key to minimizing the bias in data sets and algorithms. Eliminating bias requires a multidisciplinary strategy involving ethicists, social scientists, and the experts who best understand the nuances of each application area. Therefore, companies should seek to include such experts in their AI projects.

6. Diversity in the AI community eases the identification of biases. The people who first notice bias issues are most often users from the affected minority community. Therefore, maintaining a diverse AI team can help you mitigate unwanted AI biases.

Copied from the document by Murat Durmus

To capture the full depth of the issue, Murat Durmus has put together a brief on the 160 cognitive biases we need to be aware of when screening and labeling data for training and testing any AI system. In one frame of reference, if an organization’s current IT/IS systems are already structured in a way that reflects its operational bias, then converting to AI and drawing on the data from those existing systems means “the system is only going to operate with the same level of bias,” except faster. There are other factors to consider as well, such as the Human in the Loop (HITL): during the quality management process, the HITL can introduce their own bias while labeling and processing data (as illustrated below).

Graphic Created by Author

The vast majority of entities creating, selling and using AI capabilities have not considered the environmental, social and governance (ESG) impact of their actions. As we see every day in the news and in “global litigation” cases, AI has far-reaching impact: on employment and content development, system security and data privacy, and diversity and inclusion. The emergence of generative AI (e.g. ChatGPT) has struck a chord, raising concerns over jobs and workforce displacement and the potential for adversarial subversion and misuse (e.g. political campaigning), among other national security concerns.

Manual Approach: Developing an AI Bias Checklist leading to Maturity Model Metrics

There are three steps in the process to consider:

(1) Understand the background of the predictive task, which defines the disadvantaged groups and the types of biases and disparities of concern,

(2) Identify the algorithm and validation evidence, and

(3) Use checklist questions to identify potential biases.
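
One hypothetical way such a checklist could later feed maturity-model metrics (the questions, categories and yes/no scoring below are illustrative only, not drawn from the cited standards) is to record each answer alongside its evidence and roll the answers up by category:

```python
# Hypothetical sketch of a bias checklist that could later feed maturity-model
# metrics. Questions, categories and the yes/no scoring are illustrative only.
from dataclasses import dataclass

@dataclass
class ChecklistItem:
    category: str        # e.g., "task background", "validation evidence"
    question: str
    answered_yes: bool   # simplistic yes/no; real assessments need more nuance
    evidence: str        # where the answer is documented

items = [
    ChecklistItem("task background", "Are disadvantaged groups identified?",
                  True, "design doc, section 2"),
    ChecklistItem("validation evidence", "Is performance reported per group?",
                  False, "not yet documented"),
    ChecklistItem("bias identification", "Were label sources reviewed for bias?",
                  True, "labeling audit memo"),
]

# Roll up to a per-category completion rate as a crude readiness signal
by_category = {}
for item in items:
    done, total = by_category.get(item.category, (0, 0))
    by_category[item.category] = (done + int(item.answered_yes), total + 1)

for cat, (done, total) in by_category.items():
    print(f"{cat}: {done}/{total} items evidenced")
```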

In response to these developing concerns, several reporting guidelines have been published to help researchers uncover potential issues in studies using prediction models (Ref 9,10). Researchers have also proposed mathematical definitions of bias (Ref 11–13), described methods for measuring bias (Ref 14–17), and offered approaches for mitigating bias (Ref 15,18,19). While the development of these resources has been undoubtedly useful, they are limited in their comprehensiveness. For example, some frameworks assess only one element of algorithmic bias (e.g., model training or optimization) (Ref 20–22), while others only assess specific types of biases (Ref 17,23,24).
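
For readers who want the formal versions, two of the most commonly cited fairness criteria from this literature (see Refs 11–13) can be written as follows, where Ŷ is the model’s decision, Y the true outcome and A the protected attribute:

```latex
% Demographic (statistical) parity: equal favorable-decision rates across groups
P(\hat{Y} = 1 \mid A = a) = P(\hat{Y} = 1 \mid A = b) \quad \text{for all groups } a, b

% Equalized odds: equal true-positive and false-positive rates across groups
P(\hat{Y} = 1 \mid Y = y, A = a) = P(\hat{Y} = 1 \mid Y = y, A = b), \quad y \in \{0, 1\}
```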

Manual vs. Automated Tools to Reduce Bias
AI Fairness 360
IBM released an open-source library to detect and mitigate bias in machine learning models and datasets, which had 34 contributors on GitHub as of September 2020. The library is called AI Fairness 360 and it enables AI programmers to:

  • test biases in models and datasets with a comprehensive set of metrics.
  • mitigate biases with the help of 12 packaged algorithms such as Learning Fair Representations, Reject Option Classification, Disparate Impact Remover.

However, AI Fairness 360’s bias detection and mitigation algorithms are designed for binary classification problems, so they need to be extended to multiclass and regression problems if your problem is more complex.
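
Below is a minimal sketch of AI Fairness 360’s pre-processing workflow (a dataset-level fairness metric plus the Reweighing mitigator). The toy DataFrame and the privileged/unprivileged group encoding are invented for illustration; consult the aif360 documentation for the exact API of your installed version.

```python
# Minimal sketch of the AI Fairness 360 pre-processing workflow. The toy data
# and the group encoding (sex: 1 = privileged, 0 = unprivileged) are invented
# for illustration; API details may vary across aif360 versions.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

df = pd.DataFrame({
    "sex":     [1, 1, 1, 1, 0, 0, 0, 0],
    "feature": [5, 3, 4, 2, 5, 3, 4, 2],
    "label":   [1, 1, 1, 0, 1, 0, 0, 0],   # 1 = favorable outcome
})

dataset = BinaryLabelDataset(
    df=df, label_names=["label"], protected_attribute_names=["sex"]
)
priv, unpriv = [{"sex": 1}], [{"sex": 0}]

metric = BinaryLabelDatasetMetric(
    dataset, privileged_groups=priv, unprivileged_groups=unpriv
)
print("Disparate impact before:", metric.disparate_impact())

# Reweighing adjusts instance weights to balance group/label combinations
rw = Reweighing(unprivileged_groups=unpriv, privileged_groups=priv)
transformed = rw.fit_transform(dataset)

metric_after = BinaryLabelDatasetMetric(
    transformed, privileged_groups=priv, unprivileged_groups=unpriv
)
print("Disparate impact after: ", metric_after.disparate_impact())
```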

IBM Watson OpenScale
IBM’s Watson OpenScale performs bias checking and mitigation in real time when AI is making its decisions.

Conclusion:

Maturity model (MM) metrics will be added in Part 3 of this series. Suggestions and recommendations are welcome.

References:

  1. IEEE Trustworthy AI — Part 1 (Link: IEEE Xplore Full-Text PDF)
  2. OECD AI Policy Observatory (OECD.AI) www.oecd.ai
  3. https://www.americanbar.org/groups/journal/podcast/what-could-ai-regulation-in-the-us-look-like/#:~:text=%E2%80%9CAt%20present%2C%20the%20regulation%20of,%2C%20security%20and%20anti%2Ddiscrimination.
  4. IEEE Std P7003 Algorithmic Bias Considerations (Link: IEEE SA — P7003 )
  5. NIST Special Publication 1270 Towards a Standard for Identifying and Managing Bias in Artificial Intelligence (Link: Towards a Standard for Identifying and Managing Bias in Artificial Intelligence (nist.gov) )
  6. Work in progress paper: IEEE P7003TM Standard for Algorithmic Bias Considerations (Link: IEEE P7003TM Standard for Algorithmic Bias Considerations (umass.edu) )
  7. ISO/IEC TR 24027:2021 | IEC Webstore
  8. Standards help address bias in artificial intelligence technologies | IEC e-tech
  9. Wolff RF, Moons KGM, Riley RD, et al.; PROBAST Group. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med 2019; 170 (1): 51–8.
  10. Liu X, Cruz Rivera S, Moher D, et al.; SPIRIT-AI and CONSORT-AI Working Group. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat Med 2020; 26 (9): 1364–74.
  11. Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A. A survey on bias and fairness in machine learning. ACM Comput Surv 2021; 54 (6): 1–35. doi:10.1145/3457607.
  12. Verma S, Rubin J. Fairness definitions explained. In: Proceedings of the International Workshop on Software Fairness. Gothenburg, Sweden, May 29, 2018:1–7.
  13. Chouldechova A, Roth A. A snapshot of the frontiers of fairness in machine learning. Communications of the ACM 2020; 63 (5): 82–89.
  14. Berk R, Heidari H, Jabbari S, et al. A convex framework for fair regression. arXiv:1706.02409. 2017.
  15. Zafar M, Valera I, Gomez Rodriguez M, et al. Fairness constraints: mechanisms for fair classification. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS). Ft. Lauderdale, FL, USA, 2017.
  16. Komiyama J, Takeda A, Honda J, et al. Nonconvex optimization for regression with fairness constraints. In: Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018.
  17. Corbett-Davies S, Goel S. The measure and mismeasure of fairness: a critical review of fair machine learning. arXiv:1808.00023. 2018.
  18. Zhang B, Lemoine B, Mitchell M. Mitigating unwanted biases with adversarial learning. In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society. New Orleans LA USA, December 27, 2018:335–340.
  19. Kamishima T, Akaho S, Sakuma J. Fairness-aware learning through regularization approach In: 2011 IEEE 11th International Conference on Data Mining Workshops. Vancouver, Canada, 2011:643–650.
  20. Bellamy RKE, Dey K, Hind M, et al. AI fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias. arXiv:1810.01943. 2018.
  21. Agarwal A, Beygelzimer A, Dudík M, et al. A reductions approach to fair classification. In Proceedings of FAT ML, Halifax, Nova Scotia, Canada, 2017.
  22. Barda N, Yona G, Rothblum GN, et al. Addressing bias in prediction models by improving subpopulation calibration. J Am Med Inform Assoc 2021; 28 (3): 549–58.
  23. Hutchinson B, Mitchell M. 50 years of test (un)fairness. In: Proceedings of the Conference on Fairness, Accountability, and Transparency. Atlanta, GA, USA, January 29, 2019:49–58.
  24. Glymour B, Herington J. Measuring the biases that matter. In: Proceedings of the Conference on Fairness, Accountability, and Transparency. Atlanta, GA, USA, January 29, 2019:269–278.

--


Ashley Moore

Retired Executive Branch Department & Agency Director (Risk Management, Prevention/Preparedness, Contingency Planning, Anti/Counter Ops, Influence Ops, & AI)