Bias detectives

HB
Nov 6

In my last blog post, I touched briefly on my interest in exploring the biases that can affect the models we create and the results we achieve. I'm also interested in ethics and issues around transparency, and in being able to investigate algorithms to see how they produce their results.

Whilst researching survivorship bias, I came across discussions about what it means for an algorithm to be ‘fair’, which I thought was an interesting concept to explore.

COMPAS is a tool developed by the American company Northpointe. It uses an algorithm to assign defendants scores from 1 to 10 that purportedly indicate the probability of a defendant reoffending within two years. The higher the score, the higher an individual's deemed risk of reoffending. The score is derived from 137 questions that are either answered directly by defendants or filled in using their criminal records. The questions include "Was one of your parents ever sent to prison?", "How many of your friends/acquaintances are taking drugs illegally?", and statements which an individual has to agree or disagree with (for example, "A hungry person has a right to steal").

COMPAS and the scores it produces have been used in different states across the US since 2000 to inform decisions at every stage of the criminal justice system. A score might be used to set a bond amount for a suspect before their trial, or used by a judge to decide how long an individual's prison sentence should be.

In 2016, a team of investigative journalists at ProPublica decided to check how accurate COMPAS is. They used public records requests to obtain the risk scores assigned to more than 7,000 people arrested in Broward County, Florida, in 2013 and 2014, and checked how many of these individuals were actually charged with new crimes over the next two years (the same benchmark used by COMPAS).

ProPublica found that:

1) COMPAS scores were extremely unreliable for predicting violent crime: only 20% of the people predicted to commit violent crimes actually went on to do so

2) COMPAS scores were somewhat more reliable when the full range of crimes was taken into account (including driving without a licence and other misdemeanours): 61% of the people predicted to reoffend were charged with new crimes of some kind within two years

3) the algorithm was biased against African Americans, as can be seen from the table below (figures from ProPublica's analysis):

                                             White   African American
  Labelled higher risk, but didn't reoffend  23.5%        44.9%
  Labelled lower risk, yet did reoffend      47.7%        28.0%

The table above shows that:

  • African American defendants were predicted to be at a higher risk of reoffending than they actually were (in fact, the analysis found that African American defendants who did not reoffend were nearly twice as likely to be misclassified as high risk as their white counterparts)
  • White defendants were often predicted to be less at risk of reoffending than they actually were (the analysis found that white defendants who reoffended within the next two years were mistakenly labelled low risk twice as often as African American reoffenders).

ProPublica went further: their analysis showed that even when controlling for prior crimes, future reoffending, age and gender, African American defendants were still 45% more likely to be assigned higher risk scores than white defendants.
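For anyone unfamiliar with what "controlling for" means in practice: ProPublica's published methodology used logistic regression. Here is a minimal sketch of that general approach, with invented column names and placeholder random data rather than ProPublica's actual records:

```python
# A sketch of "controlling for" other factors with logistic regression.
# Column names and data are placeholders, not ProPublica's actual dataset.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

np.random.seed(0)
n = 500  # placeholder: one row per defendant
df = pd.DataFrame({
    "high_score":   np.random.randint(0, 2, n),  # 1 if COMPAS score was high
    "is_black":     np.random.randint(0, 2, n),
    "priors_count": np.random.poisson(2, n),
    "reoffended":   np.random.randint(0, 2, n),
    "age":          np.random.randint(18, 70, n),
    "is_male":      np.random.randint(0, 2, n),
})

# Including the other factors in the regression means the `is_black`
# coefficient captures only the difference that remains once prior crimes,
# future reoffending, age and gender are accounted for.
model = smf.logit(
    "high_score ~ is_black + priors_count + reoffended + age + is_male",
    data=df,
).fit(disp=0)

# exp(coefficient) is an odds ratio: a value of about 1.45 on `is_black`
# is what "45% more likely to be assigned higher risk scores" refers to.
print(np.exp(model.params))
```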

Since ProPublica’s analysis in 2016, Northpointe and ProPublica have been locked in a debate about whether or not COMPAS is “fair”.

Northpointe contends that COMPAS is fair because it demonstrates predictive parity: a given score corresponds to roughly the same reoffending rate for African American and white defendants. For example, among defendants who scored a 7 on the COMPAS scale, 60% of white defendants reoffended, which is nearly identical to the 61% of African American defendants who reoffended.

ProPublica, however, focuses on its findings about the defendants who did not go on to reoffend (see the table above). It argues that a fair algorithm can't make these serious errors more frequently for one group than for another: in a fair algorithm, the error rates should be the same for both groups.
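To make the disagreement concrete, here is a minimal sketch of how the two competing measures are computed from the same set of predictions. The data below is made up for illustration, not taken from the COMPAS records:

```python
# Computing the two rival fairness measures from the same predictions.
# y_true: 1 if the defendant reoffended within two years, else 0.
# y_pred: 1 if the model labelled the defendant high risk, else 0.
import numpy as np

def fairness_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # correctly labelled high risk
    fp = np.sum((y_pred == 1) & (y_true == 0))  # wrongly labelled high risk
    fn = np.sum((y_pred == 0) & (y_true == 1))  # wrongly labelled low risk
    tn = np.sum((y_pred == 0) & (y_true == 0))  # correctly labelled low risk
    return {
        # Northpointe's measure (predictive parity): of those labelled
        # high risk, what fraction actually reoffended?
        "ppv": tp / (tp + fp),
        # ProPublica's measures (error-rate balance): how often are
        # non-reoffenders wrongly labelled high risk, and reoffenders
        # wrongly labelled low risk?
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }

# Made-up outcomes and predictions for two groups of ten defendants.
group_a = fairness_metrics(y_true=[1, 1, 1, 0, 0, 0, 0, 0, 1, 0],
                           y_pred=[1, 1, 0, 1, 1, 0, 0, 0, 1, 0])
group_b = fairness_metrics(y_true=[1, 1, 0, 0, 1, 0, 0, 1, 0, 0],
                           y_pred=[1, 0, 0, 0, 1, 1, 0, 1, 0, 0])

# An algorithm satisfies Northpointe's definition if "ppv" matches across
# groups, and ProPublica's if both error rates match across groups.
print("group A:", group_a)
print("group B:", group_b)
```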

Several other teams of researchers have also carried out their own analyses of Northpointe's model. One team found that the model was no better at predicting reoffending than untrained volunteers recruited from the internet, whilst another proved that Northpointe's and ProPublica's definitions of fairness are mathematically incompatible whenever the two groups reoffend at different underlying rates.
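That incompatibility result (Chouldechova, 2017) rests on a simple identity linking a group's false positive rate, positive predictive value, false negative rate and underlying reoffending rate. A rough numerical illustration, using made-up base rates rather than the real COMPAS figures:

```python
# Why predictive parity and equal error rates can't both hold when base
# rates differ. Chouldechova (2017) derives the identity
#     FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR)
# where p is the group's underlying reoffending rate.
def implied_fpr(p, ppv, fnr):
    """False positive rate forced by a base rate p, a PPV and an FNR."""
    return (p / (1 - p)) * ((1 - ppv) / ppv) * (1 - fnr)

# Impose predictive parity: the same PPV (and FNR) for both groups.
# The base rates below are made up for illustration.
ppv, fnr = 0.61, 0.30
for label, base_rate in [("group with p = 0.51", 0.51),
                         ("group with p = 0.39", 0.39)]:
    print(f"{label}: implied FPR = {implied_fpr(base_rate, ppv, fnr):.3f}")

# Prints roughly 0.466 vs 0.286: given the same PPV, the group that
# reoffends more often necessarily suffers more false positives, which is
# exactly the asymmetry ProPublica observed. No tuning of the model can
# equalise all of these quantities at once.
```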

Northpointe has refused to disclose the details of its algorithms and the calculations it uses to arrive at defendants’ risk scores, meaning that it is impossible to see what is causing the bias that ProPublica picked up on.

Even without being able to analyse the algorithm in detail, there are wider issues that can be explored when trying to understand the bias that COMPAS is demonstrating. For example:

  • the people on whom COMPAS is being used are only defendants who have been arrested by police (which means that criminals who have not been arrested are not accounted for — this is an example of survivorship bias)
  • the people being arrested are more likely to be African American than white because of individual and institutional racism (see ‘13th’, a film on Netflix, as a good starting point to explore this)
  • the questions used to inform the scores produced by the algorithm are problematic (for example, asking an individual if they have been previously arrested doesn’t account for the fact that racism may have been at play in their arrest).

There have been moves by various governments to make the software they produce and use more accountable. For example, Macron announced last year that France will make all algorithms used by its government open to scrutiny, which will force its data scientists to think very carefully about the work they produce and to be open to improving it.

As aspiring data scientists, we should be thinking about how the data we use could be biased and how we can account for and/or avoid bias in the models we construct. This is very difficult to do. As Kate Crawford, a researcher at Microsoft, puts it: "You can tell a human operator to try to take into account the way in which data is itself a representation of human history. How do you train a machine to do that?"
