How to Improve AI Fairness in an Unfair World

AI-human collaboration needs fine-tuning — and it’s not all about the machines. Will humans learn to accept AI decisions?

MIT IDE
MIT Initiative on the Digital Economy

--

By Peter Krass

Designers of artificial intelligence (AI) systems are still figuring out how to emulate — and improve on — human decision-making. One big challenge: how to support human decision-making while also maintaining fairness and objectivity.

That was the topic of a recent MIT Initiative on the Digital Economy (IDE) seminar presented by Hamsa Bastani.

Bastani is an associate professor of operations, information and decisions at The Wharton School, where she also co-directs the Wharton Healthcare Analytics Lab. In addition, Bastani is co-author of a recent research paper on rethinking fairness for human-AI collaboration.

“At the end of the day,” Bastani told IDE attendees, “an algorithm is only as good as the data you feed it.”

But, she added, humans also play a big role in whether they accept AI recommendations.

AI Assistance in Medical, Legal Matters

AI can clearly be used to help humans make decisions involving complex sets of data. That might include helping a judge set a defendant’s bail, suggesting an interpretation of an X-ray, MRI or other imaging test to a radiologist, or supporting a physician in predicting a patient’s health risk.

Fairness, however, is something of a moving target. As Bastani explained, research has identified several areas where human decisions are commonly unfair and even discriminatory. These include credit scores, health risk predictions, even criminal recidivism. While AI algorithms can help, they’re not uniformly fair, either — especially if the data they use is biased. Some AI tools offer suggestions that are more balanced, but others do not, leading to further disparities.

For example, Bastani cited a resource-allocation problem the Greek government faced at the start of the COVID-19 pandemic in 2020. At the time, the country was receiving up to 100,000 tourists a day, but due to supply-chain constraints, it had only enough COVID tests for about 8,000 people.

To determine the most effective way of allocating this limited number of tests, the government used an AI algorithm. However, the AI system proved fallible. When one of the COVID testing labs mistakenly submitted eight copies of the same positive test, the AI system interpreted this as a new risk and suggested further testing. The error was discovered only after human experts intervened.

Fighting ‘Selective Compliance’

When humans and AI collaborate, a further layer of complexity arises: an algorithm can make a decision that’s fair, yet humans ignore or override its advice.

This phenomenon, known as “selective compliance,” can result in end-to-end outcomes and policies that are unfair, even though the AI’s suggestion was fair.

Typically, there is no reliable way to predict human compliance, though researchers have tried to develop algorithms that are both compliance-aware and high performing; the tension between those two goals is clear. Bastani’s paper illustrates the perils of selective compliance for equitable outcomes in human-AI collaboration.
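To make the risk concrete, here is a minimal, hypothetical simulation (not drawn from Bastani’s paper): the AI’s recommendations are group-blind by construction, but a human decision-maker overrides them at different rates for two groups, and the end-to-end outcomes diverge.

```python
import random

random.seed(0)

def simulate(override_rate_a, override_rate_b, n=100_000):
    """Simulate end-to-end approval rates when a human selectively
    overrides a group-blind AI recommendation.

    The AI recommends 'approve' for 50% of cases in both groups,
    so its recommendations are fair by construction. The human
    overrides those approvals at a different rate for each group."""
    approvals = {"A": 0, "B": 0}
    for group, override_rate in (("A", override_rate_a), ("B", override_rate_b)):
        for _ in range(n):
            ai_says_approve = random.random() < 0.5      # group-blind AI
            human_overrides = random.random() < override_rate
            approvals[group] += ai_says_approve and not human_overrides
    return {g: count / n for g, count in approvals.items()}

# The human overrides AI approvals 10% of the time for group A
# but 40% of the time for group B.
rates = simulate(override_rate_a=0.10, override_rate_b=0.40)
print(rates)  # roughly {'A': 0.45, 'B': 0.30}: unequal outcomes from a fair AI
```

Even though the algorithm treats both groups identically, the selective overrides produce an approval gap of roughly 15 percentage points.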

For example, while many AI medical models are more accurate than human radiologists, as Bastani pointed out, AI assistance has failed to improve average diagnostic quality in the field. The cause? Selective compliance. If a radiologist’s “gut” tells them the AI is wrong, they can override or ignore the suggested diagnosis, even though the AI’s suggestion is both accurate and fair.

A similar situation can be found in the field of law, where the early use of AI analytics has delivered no benefit to sentencing and bail decisions by judges.

Research conducted in Virginia found that even when AI suggestions were impartial, judges could and did override the AI guidance in ways that were prejudiced.

Specifically, judges were more likely to be lenient with white defendants and less lenient with Black defendants, even when their risk scores were the same, Bastani said.

In a related study, MIT researchers described a phenomenon known as “human favoritism.” The researchers found that people generally rate AI-generated content as higher quality than human-generated content, yet when given a choice, they still favor content created by other humans. Other MIT research has also focused on making a machine-learning model’s predictions more accurate and fair.

Fostering Fairness

So what can be done to help AI lead to fairer decisions? Part of the solution, Bastani suggested, could come from AI developers. They can tune their AI algorithms to account for human policy — that is, the actions a human would take in the absence of an AI tool. This data exists in the form of historical records, such as past court cases and medical diagnoses, and it would add context and nuance to the raw AI analysis.
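One way to read this suggestion is to first estimate the human baseline policy from historical decisions, so the AI’s recommendations can be compared against what people would have done anyway. The sketch below uses made-up record names and a deliberately coarse feature; it is an illustration of the idea, not Bastani’s actual method.

```python
from collections import defaultdict

# Hypothetical historical records: (case_features, human_decision).
# In practice these would come from past court cases or medical diagnoses.
historical_records = [
    ({"risk_band": "low"}, "release"),
    ({"risk_band": "low"}, "release"),
    ({"risk_band": "high"}, "detain"),
    ({"risk_band": "high"}, "release"),
]

def estimate_human_policy(records):
    """Estimate the probability of each human action, conditioned on a
    coarse case feature. This stands in for 'the actions a human would
    take in the absence of an AI tool.'"""
    counts = defaultdict(lambda: defaultdict(int))
    for features, decision in records:
        counts[features["risk_band"]][decision] += 1
    return {
        band: {action: n / sum(actions.values()) for action, n in actions.items()}
        for band, actions in counts.items()
    }

human_policy = estimate_human_policy(historical_records)
print(human_policy)
# {'low': {'release': 1.0}, 'high': {'detain': 0.5, 'release': 0.5}}
# An AI recommendation can then be sanity-checked against this baseline.
```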

Equality of opportunity needs to be factored into decisions, too, Bastani said. In other words, if a person is qualified for an opportunity, they should receive it.
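In the fairness literature, equality of opportunity is often formalized as requiring that qualified people in every group have the same chance of receiving the opportunity, i.e., equal true positive rates. Here is a minimal check of that property on hypothetical data; the group names and numbers are invented for illustration.

```python
def true_positive_rate(decisions, qualified):
    """Share of qualified people who actually received the opportunity."""
    received = sum(1 for d, q in zip(decisions, qualified) if q and d)
    total_qualified = sum(qualified)
    return received / total_qualified if total_qualified else 0.0

# Hypothetical decisions (1 = opportunity granted) and ground-truth qualifications.
group_a = {"decisions": [1, 1, 0, 1], "qualified": [1, 1, 0, 1]}
group_b = {"decisions": [0, 1, 0, 0], "qualified": [1, 1, 0, 1]}

tpr_a = true_positive_rate(group_a["decisions"], group_a["qualified"])
tpr_b = true_positive_rate(group_b["decisions"], group_b["qualified"])

# Equality of opportunity asks for this gap to be (near) zero.
print(f"TPR gap between groups: {abs(tpr_a - tpr_b):.2f}")
```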

Bastani also called for “compliance-robust fairness.” Essentially, this means that an AI decision-support algorithm should never lead to a less fair outcome regardless of how the human user chooses to comply. “What we really want to do,” Bastani said, “is to ensure that [AI] is not more unfair than the human policy.”
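One way to picture that requirement is a toy check, sketched below with assumed group names and illustrative numbers rather than the formal criterion from the paper: under every possible pattern of selective compliance, the resulting outcome gap between groups should be no worse than the gap produced by the human policy alone.

```python
from itertools import product

# Hypothetical approval probabilities for two groups (illustrative numbers only).
human_policy = {"A": 0.60, "B": 0.40}   # gap of 0.20 without any AI
ai_policy    = {"A": 0.75, "B": 0.65}   # gap of only 0.10 if everyone complies

def outcome_gap(policy_by_group):
    """Difference between the best- and worst-treated group."""
    rates = list(policy_by_group.values())
    return max(rates) - min(rates)

def is_compliance_robust(human_policy, ai_policy, groups=("A", "B")):
    """True if no pattern of selective compliance yields a larger group
    outcome gap than the human policy alone would."""
    human_gap = outcome_gap(human_policy)
    # Each group's cases either follow the AI (True) or fall back to the human policy (False).
    for pattern in product([True, False], repeat=len(groups)):
        blended = {
            g: ai_policy[g] if follows_ai else human_policy[g]
            for g, follows_ai in zip(groups, pattern)
        }
        if outcome_gap(blended) > human_gap:
            return False
    return True

print(is_compliance_robust(human_policy, ai_policy))
# False: if humans comply only for group A (0.75) but override for group B (0.40),
# the gap widens to 0.35, worse than the 0.20 gap of the human policy alone.
```

The example shows why the property matters: an AI that looks fairer than the human baseline in isolation can still widen disparities once selective compliance enters the picture.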

Ultimately, fairness may look very different in human-AI collaboration than it does in traditional human decision-making, Bastani said, and tradeoffs may be needed: “We may need to forgo [concepts about] traditional fairness to improve end-to-end outcomes.”

Do more:

· Learn more about the MIT Initiative on the Digital Economy.

Peter Krass is a contributing writer and editor to the MIT IDE.
