ProPublica’s Misleading Machine Bias

Llewellyn Jones
Oct 6, 2020

A May 2016 report by ProPublica reported a stark finding: a statistical tool widely used by criminal justice professionals to predict recidivism and determine sentencing guidelines was inherently biased.

The program, Compas, supposedly slanted its predictions to assume African-Americans were more likely to commit another crime than Caucasians. Not just slightly, but substantially so. ProPublica’s analysis accuses the software of labeling black defendants as future criminals at twice the rate of whites, and of labeling whites as low risk more often than black defendants.

But the numbers behind those statistics do not appear to be based on the output of Compas’ model. In the notebook for ProPublica’s calculations (cells 50–55), the percentages are calculated from several values: whether the defendant reoffended, how much time they spent in jail, and whether their score was valid, but not the actual value of their Compas score.

Here is the code used to calculate the false positive rate, which should include Compas’ predictions, but doesn’t:

from truth_tables import PeekyReader, Person, table, is_race, count, vtable, hightable, vhightable
from csv import DictReader

# Read every valid person from the parsed CSV until the reader is exhausted.
people = []
with open("./cox-parsed.csv") as f:
    reader = PeekyReader(DictReader(f))
    try:
        while True:
            p = Person(reader)
            if p.valid:
                people.append(p)
    except StopIteration:
        pass

# Keep people with a valid score who either recidivated within 730 days
# or were followed for more than 730 days.
pop = list(filter(lambda i: ((i.recidivist == True and i.lifetime <= 730) or
                              i.lifetime > 730), list(filter(lambda x: x.score_valid, people))))
# Split that population into recidivists and survivors.
recid = list(filter(lambda i: i.recidivist == True and i.lifetime <= 730, pop))
rset = set(recid)
surv = [i for i in pop if i not in rset]

The output of ProPublica’s analysis code:

Black defendants
              Low   High
Survived      990    805   0.49
Recidivated   532   1369   0.51
Total: 3696.00
False positive rate: 44.85
False negative rate: 27.99
Specificity: 0.55
Sensitivity: 0.72
Prevalence: 0.51
PPV: 0.63
NPV: 0.65
LR+: 1.61
LR-: 0.51

The false positive rate is higher for African Americans, at 44.85%.
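The rates in the printout follow directly from its four cell counts. Here is a minimal sketch of that arithmetic; the variable names are my own and are not taken from ProPublica’s notebook:

```python
# Cell counts copied from the "Black defendants" printout.
surv_low, surv_high = 990, 805       # did not recidivate: scored low / high
recid_low, recid_high = 532, 1369    # recidivated: scored low / high

total = surv_low + surv_high + recid_low + recid_high

# False positive rate: share of non-recidivists who were scored high.
fpr = surv_high / (surv_low + surv_high)
# False negative rate: share of recidivists who were scored low.
fnr = recid_low / (recid_low + recid_high)
# Positive predictive value: share of high scores followed by recidivism.
ppv = recid_high / (surv_high + recid_high)
# Negative predictive value: share of low scores followed by no recidivism.
npv = surv_low / (surv_low + recid_low)

print(f"Total: {total}")
print(f"False positive rate: {fpr * 100:.2f}")
print(f"False negative rate: {fnr * 100:.2f}")
print(f"PPV: {ppv:.2f}")
print(f"NPV: {npv:.2f}")
```

Run as written, this reproduces the total, both error rates, and the PPV and NPV lines of the printout.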

My own analysis of the Compas data shows relatively consistent accuracy for each racial group in the data, between 59% and 67%. The figures for Asians and Native Americans are less reliable, as there are few data points to work with:

Compas Prediction Accuracy
Race    Accuracy
Other   66%
AA      59%
W       63%
H       65%
NA      66%
Asian   75%
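For readers who want to see what a per-group accuracy figure measures, here is a minimal sketch. It uses made-up records and a hypothetical helper, `accuracy_by_group`, not the actual Compas data or my analysis code:

```python
from collections import defaultdict

# Hypothetical records: (group, predicted_high_risk, actually_recidivated).
records = [
    ("A", True, True),
    ("A", True, False),
    ("A", False, False),
    ("A", False, True),
    ("B", True, True),
    ("B", False, False),
    ("B", False, False),
    ("B", True, False),
]

def accuracy_by_group(rows):
    """Share of rows in each group where the prediction matched the outcome."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for group, predicted, actual in rows:
        seen[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / seen[group] for group in seen}

acc = accuracy_by_group(records)
print(acc)
```

With only four records per group, one changed outcome moves a group’s accuracy by 25 points, which is why figures for small groups should be read with caution.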

While there are some inconsistencies, predictions for African-Americans show higher precision and F1 than for any group except Native Americans, whose sample is small.

Compas Prediction Metrics
Race    Precision   Recall   F1
Other   24%         57%      34%
AA      62%         65%      64%
W       40%         64%      49%
H       32%         54%      40%
NA      90%         75%      81%
Asian   55%         71%      62%
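Precision, recall and F1 are all derived from counts of true positives, false positives and false negatives. Here is a minimal sketch with made-up predictions; the function name `precision_recall_f1` and the sample lists are illustrative assumptions, not the Compas data:

```python
def precision_recall_f1(predicted, actual):
    """Compute precision, recall and F1 for paired boolean lists."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # predicted high, did recidivate
    fp = sum(1 for p, a in pairs if p and not a)    # predicted high, did not
    fn = sum(1 for p, a in pairs if not p and a)    # predicted low, did recidivate

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical data: True means "high risk" (predicted) or "recidivated" (actual).
predicted = [True, True, True, False, False, True]
actual = [True, False, True, True, False, True]

precision, recall, f1 = precision_recall_f1(predicted, actual)
print(precision, recall, f1)
```

Precision asks how often a high-risk label was right, while recall asks how many actual recidivists were labeled high risk; F1 is the harmonic mean of the two.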

A common measurement of “fairness” in analytics is predictive parity: whether the share of a group predicted to have an outcome, for a sensitive attribute like race, aligns with the actual share. That is, the percentage of African Americans predicted to recommit crimes should be similar to the percentage of African Americans who actually do.

Even if the model is inaccurate for certain populations, predictive parity shows that it’s not erroneously predicting certain populations to be criminals far beyond what they are in reality.

And the predictive parity for African Americans is close, within 2 percentage points of actual rates, more so than for other groups. That is exactly what Northpointe, the maker of the Compas software, claimed in its defense of the program.

Compas Prediction Parity
Race   Predicted recidivism rate   Actual recidivism rate
AA     49%                         51%
W      25%                         39%
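The comparison in this table comes down to a subtraction per group. Here is a small sketch of that arithmetic using the table’s own percentages; the dictionary layout and the name `gaps` are just one way to write it:

```python
# Percentages copied from the Compas Prediction Parity table.
rates = {
    "AA": {"predicted": 49, "actual": 51},
    "W": {"predicted": 25, "actual": 39},
}

# Gap in percentage points: positive means the model under-predicts recidivism.
gaps = {race: r["actual"] - r["predicted"] for race, r in rates.items()}

for race, gap in gaps.items():
    print(f"{race}: under-predicted by {gap} percentage points")
```

By this measure the gap is 2 points for African Americans and 14 points for Caucasians.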

One point in the ProPublica criticism does appear correct: Caucasians and Hispanics are predicted to recommit crimes at a substantially lower rate than they actually do.

Caucasians are predicted to recommit 25% of the time, but they actually recommit 39% of the time. And the precision of predictions for both groups is substantially lower than for African Americans.

But the higher precision and F1 score in the African American population undercut the idea that the model is unfairly biased against them.