From left: Triveni Gandhi (Dataiku), Jason Kodish (Capgemini) and William Thompson (Forbes Insights), with panel moderator Peter Maloof at the recent Future Labs Focus | AI event, supported by Capgemini Applied Innovation Exchange

AI bias: The crisis & solutions

What do we do to build fair, trustworthy technology when our tools are based on data from a biased world? The experts at the recent Future Labs Focus | AI event, supported by Capgemini AIE, have some ideas

Aug 14, 2019 · 7 min read


Algorithms powering machine learning tools are based on cold, hard data. This objective data is used to excavate patterns and create statistical models for predictions, brooking no interference from our messy human emotions. Such predictive tools might set prices, diagnose illnesses, hire employees, run the world — prejudice is conquered once and for all.

Great in theory, but something’s going wrong. The data, it transpires, is biased. And scaling out biased data — applying it to those prices, and diagnoses, and employees — can perpetuate biases that we’ve been struggling for decades to overcome.

So how to deal with data bias, and what’s it going to take to fix it? At Focus | AI: Breaking Bias on August 6, the NYU Tandon Future Labs, supported by Capgemini’s Applied Innovation Exchange, brought together data experts from different backgrounds to find out: NYU Tandon professor Julia Stoyanovich; Microsoft Customer Success Director John de Havilland; data scientists Jason Kodish (Capgemini) and Triveni Gandhi (Dataiku); Forbes Insights MD William Thompson; Accrete founder and CEO Prashant Bhuyan; and Dataiku Sales Engineer Jesse Bishop.

The consequences of data bias

“I’m going to start with the bad news,” said Julia Stoyanovich, Assistant Professor at NYU Tandon School of Engineering and cofounder of Data, Responsibly, as she kicked off the event with an industry overview. “And then I’ll give you some more bad news as we progress.”

Professor Julia Stoyanovich presents recent examples of the life-changing consequences of data bias

Julia outlined how data is shaping our world. “We are using data to capture what has happened in the past to become smart in how we move the world forward.” But the past (like the present) has been fraught with bias, and data follows that lead. In 2012, for example, Staples was found to be offering higher discounts to shoppers in affluent neighborhoods. The office supplies chain said it adjusted pricing based on shoppers’ proximity to competing stores (among other factors), with no intention of discriminating against lower-income customers. Nevertheless, the model had the power to reinforce patterns of bias and discrimination: affluent customers save money, while lower-income customers must travel farther or spend more for the same product.
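To make the mechanism concrete, here is a minimal sketch (the pricing rule, distances, and numbers are all hypothetical, not Staples’ actual model) of how a rule that reads only competitor proximity can still end up sorting customers by income:

```python
# Hypothetical pricing rule: it never reads income, only the distance
# to the nearest competing store.
def quoted_price(base_price: float, miles_to_competitor: float) -> float:
    """Discount 10% when a competitor is within 5 miles; full price otherwise."""
    return base_price * (0.90 if miles_to_competitor < 5.0 else 1.00)

# Toy shoppers: competitors tend to cluster in affluent areas, so
# distance quietly becomes a proxy for neighborhood income.
shoppers = [
    {"neighborhood": "affluent", "miles_to_competitor": 1.2},
    {"neighborhood": "lower-income", "miles_to_competitor": 9.8},
]
for s in shoppers:
    print(s["neighborhood"], quoted_price(100.0, s["miles_to_competitor"]))
# affluent 90.0
# lower-income 100.0
# The "neutral" rule reproduces the income divide without an income field.
```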

The Staples incident is one of a number of recent revelations about historical and geographical data unintentionally perpetuating social bias. Amazon was late to roll out same-day delivery in Boston’s centrally located, historically black and Latino Roxbury neighborhood. And COMPAS, a widely used tool for assessing a defendant’s risk of committing further crimes, wrongly identified black defendants as future lawbreakers at almost twice the rate of white defendants, while white defendants were mislabeled as being at low risk of committing future crimes more often than black defendants.

As she pointed out, the tools were attempting predictions and actions based on data that was simply reflecting the biases of the real world — of inherent relationships between race, geography, economics, and other factors.
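The kind of disparity found in COMPAS can be surfaced with a simple group-level audit. Here is a minimal sketch (the records are made up for illustration; COMPAS itself is proprietary) that compares false positive rates, meaning people labeled high risk who did not reoffend, across groups:

```python
def false_positive_rate(records, group):
    """Among people in `group` who did NOT reoffend, the share labeled high risk."""
    fp = tn = 0
    for r in records:
        if r["group"] == group and not r["reoffended"]:
            if r["high_risk"]:
                fp += 1
            else:
                tn += 1
    return fp / (fp + tn)

# Made-up records, for illustration only.
records = [
    {"group": "black", "high_risk": True,  "reoffended": False},
    {"group": "black", "high_risk": True,  "reoffended": False},
    {"group": "black", "high_risk": False, "reoffended": False},
    {"group": "white", "high_risk": True,  "reoffended": False},
    {"group": "white", "high_risk": False, "reoffended": False},
    {"group": "white", "high_risk": False, "reoffended": False},
]
for g in ("black", "white"):
    print(g, round(false_positive_rate(records, g), 2))
# black 0.67
# white 0.33
# A gap like this is invisible in overall accuracy but obvious once
# error rates are broken out by group.
```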

Corporate responsibility to counter bias

When data can lead to devastation, frameworks for fair, trustworthy models must be set in place. John de Havilland, Microsoft’s Cloud Customer Success Director, highlighted the responsibility of corporations to approach the data they collect critically and with transparency. “Companies like Microsoft built the technology that enabled this,” John said, referring to AI technology that makes deep fake videos possible, “so we also have a responsibility to make sure we govern it, or to make a framework to model how we use it.”

John de Havilland outlines Microsoft’s practices for ethical AI systems

The twin pillars of using data responsibly, according to John, are transparency (making the models and systems implementing historical data accessible and understandable) and accountability (accepting responsibility for undesirable consequences).

Those who collect or analyze data, or create tools for it, must operate in a framework supported by these pillars, which are themselves based on users’ rights to fairness, reliability, security and inclusivity in their technology.

Mitigating bias

Data experts were on hand to discuss how transparency and accountability might be successfully implemented in their fields. Triveni Gandhi, Data Scientist at Dataiku; William Thompson, MD at Forbes Insights; and Jason Kodish, Group Global Data Lead at Capgemini, discussed possible solutions to data bias in a panel moderated by Peter Maloof, Director at Capgemini AIE.

The panel agreed that creating a framework to “fix” models built on biased data is essential; patch fixes will yield nothing. But how to incentivize companies to change their policies? Will government regulation and monetary incentives be enough to rein in technology based on biased data? And if they are, will change happen soon enough? Peter Maloof pointed out that policy and regulation often lag behind emerging tech. “When policy starts to catch up, one of the things that’s the hallmark of the disruptive companies in the era in which we live is finding the gaps in policies.”

Jason Kodish also noted that lawmaking may not be as strong a force as one might like to believe. “Policy makers will always be behind,” he said. “Internal organization policy will be much stronger than government regulation. And don’t underestimate market forces.”

The panel of data experts fields a question from the audience

William Thompson agreed. Answering an audience question about whether the errors caused by data bias opened up commercial opportunities for new companies to root out misleading data, he said, “Absolutely. And ironically, the places where unbiased AI is going to create the most value are the places where human bias today is more rampant.” In corporations, for example, that might mean redistributing resources to the teams that use them best rather than to those that have historically been awarded them.

Ultimately, we need to ensure that the data, and the community that collects and interprets it, is inclusive, said Triveni Gandhi. “There is this cult of personality around data science,” she said. “We think, ‘We’re using data, and we’re using math, so it’s by nature aloof and rational.’ When in fact, everyone has assumptions they’re bringing to the table. Data science has that hard science vibe. But this is a much more touchy-feely subject than we’d like to think.”

Triveni works for Dataiku, a data science startup that is democratizing access to data. Data scientists may create biased models because they are unaware of existing biases, having never encountered them. And data scientists themselves are a rather non-diverse bunch, in both gender and race. In her overview, Julia said, “The data set doesn’t know what it doesn’t know,” and the same is true of humans. In such a homogeneous field, Triveni pointed out, inclusivity is essential to avoiding biased models. That means training and hiring non-male, non-white data scientists, and also training non-data scientists to understand how data is collected and analyzed, so that citizen scientists can more effectively combat discriminatory modeling.

Jesse Bishop presents Dataiku’s holistic approach to data products

Her colleague Jesse Bishop, Sales Engineer at Dataiku, touched on the company’s focus on inclusivity during his startup presentation earlier in the evening. “Everyone has blind spots,” he said. “Data science now is an expert-driven approach, where a few people actually access the technology and deliver a result.” But, Jesse noted, it is often only that small group that has a say in how the data is analyzed and interpreted.

Prashant Bhuyan explains Accrete’s focus on the contexts from which data is collected

The technology itself will also have to improve. Presenting his startup, Accrete, Prashant Bhuyan argued that AI needs the ability to scale human expertise. “At Accrete, we’re teaching machines to read so they can generate useful insights about the world,” he said, adding that generating insights is complicated: machines have to understand the context of the information they collect. At Accrete, this manifests as ranking sources of information by reliability, awarding more weight to sources that make fewer mistakes.
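The talk didn’t go into implementation details, but the general idea of reliability weighting is easy to sketch. In the toy version below (all names and numbers are hypothetical, not Accrete’s actual system), each source’s track record sets the weight its claims carry when signals conflict:

```python
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    verified: int  # past claims that later proved correct
    checked: int   # past claims that could be checked at all

    @property
    def reliability(self) -> float:
        # Laplace smoothing: a brand-new source starts near 0.5
        # rather than at an extreme of 0 or 1.
        return (self.verified + 1) / (self.checked + 2)

def weighted_consensus(claims: list[tuple[Source, float]]) -> float:
    """Average conflicting numeric claims, weighting each by source reliability."""
    total_weight = sum(src.reliability for src, _ in claims)
    return sum(src.reliability * value for src, value in claims) / total_weight

# Two sources disagree; the one with the better track record dominates.
careful = Source("careful-wire", verified=95, checked=100)
sloppy = Source("rumor-mill", verified=40, checked=100)
print(weighted_consensus([(careful, 0.9), (sloppy, 0.1)]))
# ≈ 0.66, pulled toward the more reliable source's claim of 0.9
```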

But even with a better framework and better technology, bias can still exist. There will always be room for improvement. “We should never be so complacent as to say we’ve ‘solved’ bias,” Triveni said. “It is a continual process.”

Words and photos: Annie Brinich (anniebrinich@nyu.edu)


The Future Labs at NYU Tandon offer the businesses of tomorrow a network of innovation spaces and programs that support early stage startups in New York City.