When machine learning acts in bad faith

Are we making the world a better place or a more racist one?

Parker T.
The Mixpanel Blog
4 min read · Jun 1, 2016


Last week, ProPublica published an extensive article on the growing use of machine learning in prison sentencing. The report, written by Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner, explores the adverse effects of entrusting our judgment to an algorithm.

Why is such a sci-fi twist even on the table? Machine learning promises to capture more information with greater neutrality: it can predict behavior and outcomes without the fog of human judgment.

Or so goes the underlying assumption of a company like Northpointe. Northpointe sells algorithms that aid judges in sentencing decisions, and while it isn’t the only player in tech to take this approach, it may be the most worrisome. The algorithm’s risk score is based on factors such as education level, employment, and parents’ criminal record.
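To make the mechanics concrete, scores like this are usually built as a weighted combination of questionnaire answers mapped onto a risk scale. The sketch below is a hypothetical Python illustration, not Northpointe’s actual model (which is proprietary); the feature names and weights are invented for the example.

```python
# Hypothetical illustration only: Northpointe's model is proprietary, so the
# features and weights below are invented. The point is the shape of the
# approach: a weighted sum of questionnaire answers mapped onto a risk scale.

def risk_score(defendant: dict) -> float:
    """Return a toy 1-10 'recidivism risk' score from a few invented features."""
    weights = {
        "prior_arrests": 0.6,            # criminal history
        "unemployed": 1.5,               # employment status (1 = unemployed)
        "no_high_school_diploma": 1.2,   # education level
        "parent_ever_arrested": 0.9,     # family criminal record
    }
    raw = sum(weights[k] * defendant.get(k, 0) for k in weights)
    return min(10.0, 1.0 + raw)          # clamp to a 1-10 decile-style scale

# Note how answers tied to poverty push the number up by construction.
print(risk_score({"prior_arrests": 1, "unemployed": 1, "no_high_school_diploma": 1}))
```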

Judging from ProPublica’s report, this type of risk-scoring doesn’t remove the element of human error; it exacerbates it. The traits it associates with recidivism target a specific group: Northpointe’s algorithm penalizes poor people of color for being poor people of color.

Furthermore, the risk factors Northpointe treats as determinative don’t hold up:

Source: ProPublica analysis of data from Broward County, Fla.

And yet Northpointe’s founder, Tim Brennan, denies this, saying the algorithm, in spite of its baggage, works:

Brennan said it is difficult to construct a score that doesn’t include items that can be correlated with race — such as poverty, joblessness and social marginalization. “If those are omitted from your risk assessment, accuracy goes down,” he said.

Accuracy is a separate question. Northpointe’s own reports grade its algorithm as having a 68% accuracy rate. That hit rate feels pretty low as a basis for changing the course of someone’s life.
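A headline accuracy number also hides which way the errors fall, which is the heart of ProPublica’s analysis: two groups can see roughly the same overall accuracy while one is far more likely to be wrongly labeled high risk. The sketch below uses invented confusion-matrix counts, not ProPublica’s actual Broward County figures, to show how that can happen.

```python
# Invented counts, not ProPublica's actual figures. Both groups get the same
# overall accuracy, but the false positive rate (wrongly labeled high risk)
# differs sharply.

def rates(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    false_positive_rate = fp / (fp + tn)   # labeled high risk, didn't reoffend
    return accuracy, false_positive_rate

group_a = rates(tp=300, fp=200, tn=400, fn=100)  # accuracy 0.70, FPR ~0.33
group_b = rates(tp=100, fp=100, tn=600, fn=200)  # accuracy 0.70, FPR ~0.14

print(group_a)
print(group_b)
```

Same 70% accuracy in both hypothetical groups; a false positive rate more than twice as high in one of them.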

If this is the kind of progress we can expect from computers, is it even worth introducing the power of machine learning to our toughest social problems?

The answer is: Maybe, actually.

With the right oversight, it’s possible to designate certain issues as responsive to machine learning, and others as off-limits.

A nonprofit called Bayes Impact also tried to engineer algorithms for prison sentencing. Based in San Francisco, Bayes has built machine-learning software and predictive models for tackling some of the world’s thorniest problems: fraud detection, police transparency, unemployment, and more.

Everett Wetchler (right) works alongside a colleague at Bayes Impact. Image Source: Bayes Impact

But Bayes has learned some crucial lessons. There’s more to choosing a good candidate for algorithmic rescue than simply picking an issue where human judgment has faltered. Data science has to complement human ingenuity to produce ethical solutions; it can’t simply become a blameless substitute that perpetuates bad ethics.

Still, it’s not so strange that Bayes and Northpointe, from opposite sides of the profit line, would chase the same solution.

There’s clearly something wrong with prison sentencing, a dysfunction brokered by years of mishandled legislation. Although the U.S. has 5% of the world’s population, it houses 25% of the world’s prisoners. There’s a racial bias, with African-Americans incarcerated at nearly six times the rate of whites. It’s ineffective, too: within three years of release, 76.6% of former convicts are rearrested.

Unlike Northpointe, Bayes wasn’t motivated by profit. The folks at Bayes believed that algorithmic sentencing could potentially save years of prisoners’ lives and conserve taxpayer money. But, ultimately, it was too ethically fraught: the risk of algorithmic injustice was too great.

Here’s Everett Wetchler, CTO of Bayes Impact:

“The idea that one person would get more time in jail than another, based on something other than their own criminal history, feels wrong to me. I don’t want to contribute to whatever that is, even if it objectively decreases some number of crimes committed.”

ProPublica’s data suggests the number of crimes committed doesn’t decrease when algorithms are involved in sentencing. And even if it did, that would be too simple a metric for the scope of this problem. Assuming risk-scoring brought recidivism down, as Brennan says, Bayes’ rejoinder might be: at what cost?

Part of data science is product design, and the question behind Bayes’ product design is, Are people better off? The nonprofit was built around the idea that impact should come before functionality, lest software colonize the places it wants to disrupt.

Data-for-good firms shouldn’t shy away from hard problems merely because they’re hard. But they also shouldn’t build solutions that treat a metric while inadvertently worsening the underlying problem.

Computers might fare better than people’s icky prejudices, but not without serious oversight. Northpointe gives the illusion of neutrality, but its developers haven’t questioned the integrity of their data sets. With stakes this high, developers and data scientists should empathize and think ahead to the ultimate cost.

I’m no futurist, but I think the sweet spot for decision-making is a balanced computer-human synthesis. Our best ethics probably lie at the intersection of data impartiality and human conscientiousness. If we put our faith in a metric moving up or down, let us also keep sight of the people behind the data points.

If you’re interested in do-gooderism and data, you can read more about the conscience of Bayes Impact here.
