Ethics for powerful algorithms (2 of 4)

This is the second of four articles on the ethics of powerful algorithms, taking COMPAS as a case study. Our story so far:

  • COMPAS is an algorithm used widely today to predict which criminals are most likely to commit future crimes.
  • Investigative journalists at ProPublica recently published a study claiming that COMPAS is “biased against blacks,” and shared their data as evidence.
  • When I replicated ProPublica’s analysis, I discovered that their own data proves that COMPAS isn’t biased — at least not in the statistical sense.

It’s time to take up the question of how powerful algorithms can be harmful or unfair, even if they’re not statistically biased. I’m going to toss three big problems on the table — distorted ground truth, differential cheating, and punishment for the actions of others — then come back to the ProPublica vs. COMPAS controversy around algorithmic bias.

1. What if ground truth is distorted?

In machine learning and statistics, we often talk about “ground truth”: how is the dependent variable defined and measured — exactly? Defining ground truth is an important validity check, because any bias or distortion in the underlying data will be baked into the algorithm itself.

In this case, COMPAS is intended to predict recidivism: the likelihood that a current offender will commit a future crime. Northpointe defines recidivism as “a finger-printable arrest involving a charge and a filing for any uniform crime reporting (UCR) code.” ProPublica interprets this criterion as any “new arrest within two years.”

Did you catch that? Strictly speaking, COMPAS does not predict future crimes; it predicts arrest for future crimes.

  • What if police officers are more likely to pursue, search, and arrest black suspects than white suspects?
  • What if law enforcement deploys a disproportionate amount of force or uses more aggressive policing tactics in black neighborhoods?
  • What if neighbors, store owners, or bystanders are more likely to report black suspects than white suspects?

There’s plenty of anecdotal evidence for all of these trends. (For one example, see the history and rulings behind “stop and frisk” policies.) They all directly affect ground truth for COMPAS. Should we trust an algorithm when we don’t believe that ground truth itself is fair?
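To make the gap between crime and arrest concrete, here is a minimal sketch in Python. Every number in it is a hypothetical assumption chosen for illustration, not an estimate from ProPublica's data or from any real policing statistics. It supposes two groups that reoffend at the same true rate but face different chances of being arrested for an offense, then computes the rearrest rate that an algorithm trained on arrests would treat as ground truth.

```python
def observed_rearrest_rate(true_reoffense_rate, arrest_prob_if_reoffend):
    """Share of people labeled 'recidivist' when the label is arrest, not crime.

    Simplifying assumption: only people who actually reoffend can be arrested.
    """
    return true_reoffense_rate * arrest_prob_if_reoffend


# Hypothetical inputs: identical behavior, unequal policing.
TRUE_REOFFENSE_RATE = 0.30
ARREST_PROB = {"group_a": 0.50, "group_b": 0.80}

observed = {
    group: observed_rearrest_rate(TRUE_REOFFENSE_RATE, prob)
    for group, prob in ARREST_PROB.items()
}

for group, rate in observed.items():
    print(f"{group}: true reoffense rate {TRUE_REOFFENSE_RATE:.0%}, "
          f"observed rearrest rate {rate:.0%}")
```

Under these assumptions, a model that predicts the arrest label perfectly would rate group_b as higher risk even though both groups behave identically, and it would still look accurate when checked against arrest records.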

2. What if some criminals can game the algorithm?

The COMPAS algorithm is based on a pre-trial survey of 137 questions. The first few sections are filled out from court records, including questions such as:

  • Was this person on probation or parole at the time of the current offense?
  • Is the current top charge felony property or fraud?

The last third of the survey consists of subjective questions answered by the offenders themselves. There are four categories: Social Isolation, Criminal Personality, Anger, and Criminal Attitudes.

Here’s a sample of statements from “Criminal Personality.” For each statement, the offender is instructed to give an answer between Strongly Agree and Strongly Disagree:

  • I am seen by others as cold and unfeeling.
  • I always practice what I preach.
  • The trouble with getting close to people is that they start making demands on you.
  • I have the ability to “sweet talk” people to get what I want.

And from “Criminal Attitudes”:

  • A hungry person has a right to steal.
  • If someone insults my friends, family or group they are asking for trouble.
  • The law doesn’t help average people.
  • Many people get into trouble or use drugs because society has given them no education, jobs or future.
  • Some people just don’t deserve any respect and should be treated like animals.

I don’t know the legal implications of lying on the COMPAS survey, but it’s hard to imagine a way to fully enforce honesty on these questions. In that case, it’s easy to see how lawyers might coach their clients (or gangs might coach their members) to answer the COMPAS questionnaire to reduce risk scores.

In the world of machine learning and cybersecurity, this kind of problem is known as adversarial learning. The most famous example is spam detection: a filter tries to separate legitimate messages from spam, while spammers keep rewriting their messages to slip past it. Almost all algorithms are at least somewhat susceptible to adversarial manipulation. Of course, judges can be deceived, too, but perhaps not so systematically.

If all offenders learn how to do this, the algorithm loses all validity. If only some offenders learn, then it will be skewed against those without the know-how, connections, or resources to game the test.
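Here is a toy illustration of that skew, written in Python. The scoring rule and its weights are invented for this sketch; they are not COMPAS's actual model, which this article does not describe. The point is only that when part of a score comes from self-reported answers, two people with identical court records can end up with different scores.

```python
# Hypothetical weights for a toy risk score. These are NOT COMPAS's weights.
WEIGHTS = {
    "prior_arrests": 0.6,       # from court records
    "on_probation": 1.0,        # from court records (0 or 1)
    "attitude_agreement": 0.4,  # self-reported: 1 (strongly disagree) to 5 (strongly agree)
}


def toy_risk_score(answers):
    """Weighted sum of answers; higher means higher predicted risk."""
    return sum(WEIGHTS[name] * value for name, value in answers.items())


honest = {"prior_arrests": 2, "on_probation": 1, "attitude_agreement": 4}
# Same court record, but coached to give the lowest-risk self-reported answer.
coached = {**honest, "attitude_agreement": 1}

honest_score = toy_risk_score(honest)
coached_score = toy_risk_score(coached)
print(f"honest: {honest_score:.1f}, coached: {coached_score:.1f}")
```

In this sketch the coached respondent scores lower than the honest one without any difference in their records, which is the advantage that would flow to offenders with the know-how, connections, or resources to prepare.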

3. Should I be punished for the behavior of others?

The COMPAS survey also includes batteries of questions about Family Criminality, Peers, and Social Environment:

  • If you lived with both parents and they later separated, how old were you at the time?
  • Was your father (or father figure who principally raised you) ever arrested, that you know of?
  • How many of your friends/acquaintances have ever been arrested?
  • How many of your friends/acquaintances are taking illegal drugs regularly (more than a couple times a month)?
  • In your neighborhood, have some of your friends or family been crime victims?

It’s safe to assume that these questions increase the predictive power of the model, and I can absolutely see how an accurate picture of an offender’s home, social, and economic circumstances would help judges and law enforcement better predict the risk of future crimes.

But is it just to leave a criminal in prison longer because his parents separated when he was young? Because his friends have been crime victims? Most people would say that it’s unfair to punish someone for the actions of others. To my mind, these questions cross that line.

COMPAS isn’t statistically biased, but it penalizes people for circumstances beyond their control.

So is the ProPublica team right?

These are serious criticisms. We could raise others (“accuracy of source data” and “validity of gender as a predictor” both come to mind) but for now, let’s come back to the main thesis: powerful algorithms can be harmful and unfair, even when they’re unbiased in a strictly technical sense.

Does that mean that the ProPublica team is right after all?

In my view, ProPublica is right to call attention to COMPAS, but they’ve done it in the wrong way. I can’t get on board with their tagline (“There’s software used across the country to predict future criminals. And it’s biased against blacks.”) when their own data shows that it isn’t true.

At the very least, ProPublica is guilty of muddying the water by using the word “bias.” Of course, bias has several meanings. But in the context of statistics it has a very specific definition, and ProPublica has attacked COMPAS almost entirely on statistical grounds using statistical methods. Under the statistical definition, COMPAS isn’t biased.
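One common way to make “statistically biased” concrete is calibration: among people who receive the same score, is the observed rearrest rate the same in every group? The Python sketch below shows that check on made-up rows. It is an assumption about what such a test could look like, not a reproduction of ProPublica's analysis or of the replication described in the previous article, and the records are not real data.

```python
from collections import defaultdict


def rearrest_rate_by_score_and_group(records):
    """Return {(score, group): share rearrested} for (score, group, rearrested) rows."""
    totals = defaultdict(int)
    rearrests = defaultdict(int)
    for score, group, rearrested in records:
        totals[(score, group)] += 1
        rearrests[(score, group)] += int(rearrested)
    return {key: rearrests[key] / totals[key] for key in totals}


# Made-up rows: (risk score, group, rearrested within two years)
toy_records = [
    (3, "a", False), (3, "a", False), (3, "a", True), (3, "a", False),
    (3, "b", False), (3, "b", True), (3, "b", False), (3, "b", False),
    (8, "a", True), (8, "a", True), (8, "a", False), (8, "a", True),
    (8, "b", True), (8, "b", False), (8, "b", True), (8, "b", True),
]

rates = rearrest_rate_by_score_and_group(toy_records)
for (score, group), rate in sorted(rates.items()):
    print(f"score {score}, group {group}: {rate:.0%} rearrested")
```

In this toy data a given score corresponds to the same rearrest rate in both groups, which is what “unbiased” means under this reading. The check is made against arrests, so it says nothing about whether the arrest record itself is a fair measure.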

By misappropriating the term for a catchier headline, ProPublica pinned blame in the wrong place. As a result, most people who read their article probably drew the conclusion that COMPAS is a shoddily designed algorithm: “Those COMPAS people must be corrupt or incompetent. Otherwise they wouldn’t be peddling junk statistics.”

On closer examination, I’ve come to believe that COMPAS is a sophisticated, unbiased, and highly predictive risk assessment. One of Northpointe’s co-founders was a professor of statistics at the University of Colorado. We have no reason to doubt the integrity of the Northpointe team.

In spite of all that, COMPAS is still contributing to a process that could be deeply unfair.

Where next?

Crucifying one company — or even the whole criminal risk assessment industry — won’t solve this problem, because COMPAS’ “bias” has little to do with the algorithm itself. Instead, it’s rooted in a fundamental tension in values.

The core issue here isn’t good statistics versus bad statistics. It’s not even about due process and transparency. It’s about competing concepts of fairness in criminal justice — why we shut people in prisons to begin with.

In my next post, I’ll lay out those concepts and show how algorithms like COMPAS can further or frustrate those values.

PS: I haven’t been able to figure out the sourcing for the COMPAS questionnaire on DocumentCloud. Can anyone fill that in for the community?

PPS: In case I haven’t said it loudly enough, ProPublica has performed a huge public service by getting the data so that COMPAS can be subjected to scrutiny and replication.

PPPS to Julia and the team at ProPublica: If you’re reading this, I’d love to engage. I’m very sympathetic to your argument that algorithms in criminal justice need to be fair, transparent, and respect due process. But you’ve been barking up the wrong statistical tree.