How Recidivism Risk Assessments Impede Justice

These tools illustrate the kinds of insidious algorithmic harms that rarely make headlines

Dasha Pruss
Data & Society: Points
Apr 12, 2023


Image: Gloria Mendoza

Amid the chaos of the early months of the pandemic, criminal courts in Pennsylvania were instructed to begin consulting the Sentence Risk Assessment Instrument at sentencing. The actuarial tool uses demographic and criminal history factors, like age and number of prior convictions, to estimate the risk that an individual will “reoffend and be a threat to society,” that is, be reconvicted within three years of release from prison.
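For readers unfamiliar with how actuarial instruments of this kind work, the sketch below is a purely hypothetical illustration of point-based risk scoring: a handful of static factors are weighted, summed, and bucketed into coarse risk categories. The factors, weights, and cutoffs here are invented for illustration and are not the Commission’s actual formula.

```python
# Hypothetical sketch of point-based actuarial risk scoring.
# Factors, weights, and cutoffs are invented for illustration only;
# this is NOT the Pennsylvania Sentence Risk Assessment Instrument.

def risk_category(age: int, prior_convictions: int) -> str:
    """Sum weighted factors and bucket the total into a coarse risk category."""
    score = 0
    if age < 25:                        # hypothetical weight: youth adds points
        score += 2
    score += min(prior_convictions, 5)  # one point per prior conviction, capped at 5

    if score >= 5:
        return "high"    # in Pennsylvania's design, low- and high-risk defendants
    if score <= 1:       # are flagged for "additional information" (a report)
        return "low"
    return "medium"

print(risk_category(age=22, prior_convictions=4))  # -> "high"
```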

The instrument was developed on the premise that it would help judges identify candidates for alternative sentences, with the ultimate aim of reducing the prison population. But in interviews with criminal court judges and other criminal legal bureaucrats throughout the state, I found that this has not happened. In fact, judges routinely ignored the tool’s recommendations, which they disparaged as “useless,” “worthless,” “boring,” “a waste of time,” “a non-thing,” and simply “not helpful.” Others weren’t even aware that their courtrooms were supposed to be using it.

Recidivism risk assessment instruments are used in high-stakes pretrial, sentencing, or parole decisions in nearly every US state. These algorithmic decision-making systems, which infer a defendant’s recidivism risk based on past rearrest data, are often presented as an “evidence-based” strategy for progressive judicial reform: a way to reduce human bias in sentencing, abolish cash bail, and reduce mass incarceration. Yet there is remarkably little evidence that risk assessment instruments help advance these goals in practice.

There are compelling legal arguments against the use of risk assessment instruments; I have written about the tools’ troubling implications for jurisprudence. On the whole, however, discourse around risk assessment instruments tends to focus on the algorithms’ technical aspects, particularly their ability (or inability) to meet benchmarks of predictive accuracy and algorithmic fairness. A vocal chorus of critics, including the ACLU, journalists, academics, and local justice reform activists, stresses that risk assessment instruments could “perpetuate the racial biases and stigmas inherent in our criminal legal system” because they base predictions on structurally racist data.

To be sure, algorithmic bias is worth taking seriously and is often reason alone to condemn the use of a particular instrument. But a key detail often neglected in discourse about risk assessment instruments is that their recommendations are advisory. Ultimately, it’s not merely the technical details of these instruments that impact the lives of defendants — it’s human decision-making. And human decision-making is malleable.

The significance of human discretion

The few studies of how risk assessment instruments are actually used have shown that judges differ widely in their adherence to recommendations and follow them inconsistently for different types of defendants. Human decision-makers can selectively follow algorithmic recommendations to the detriment of individuals already likely to be targets of discrimination. In Kentucky, for example, a pretrial risk assessment tool — intended as a progressive bail reform measure — increased racial disparities and ultimately did not increase the number of pretrial releases because judges ignored leniency recommendations for Black defendants more often than for similar white defendants. Likewise, judges using a risk assessment instrument in Virginia sentenced Black defendants more harshly than others with the same risk score.

In other contexts, human discretion can also correct for algorithmic bias. In Pennsylvania, a recent study about racial bias in an algorithm that screens for child neglect showed that call screeners minimized the algorithm’s disparity in screen-in rate between Black and white children by “making holistic risk assessments and adjusting for the algorithm’s limitations.” Virginia’s risk assessment instrument would have led to an increase in sentence length for young people had judges adhered to it; however, because judges systematically deviated from recommendations, some of the instrument’s potential harms (and benefits) were minimized.

Of course, another way that human discretion can interact with algorithms is to choose not to interact with them. Sociological work shows that algorithm aversion — the reluctance to follow algorithmic recommendations — can happen in contexts where individuals feel that their agency or power is being threatened by a new technology. This is artfully illustrated by Sarah Brayne in her ethnography of LAPD officers using PredPol and by Angèle Christin in her ethnography of prosecutors and judges using a pretrial risk assessment instrument. Police officers and legal professionals alike reported feeling threatened by how these new technologies could be used to surveil their performance and limit the role of their discretion, resulting in professional resistance to algorithmic systems.

The 15 judges I spoke with, whom I interviewed with input from the Coalition to Abolish Death by Incarceration, ignored the tool’s recommendations for a different set of reasons. The most common was simply that judges found the tool to “not be particularly, um… helpful.” This is due in part to the work of activists, lawyers, and academics who, over years of public testimony hearings, successfully pressured the Pennsylvania Sentencing Commission to remove the most controversial parts of the instrument, including directly showing judges risk scores and detailed recidivism risk distributions. The implemented version of the tool encourages judges to order “additional information,” typically a presentence investigation report, for low- and high-risk defendants, with the presumption that the information contained in these reports will in turn influence a judge’s decision to assign an alternative sentence.

But none of the judges I spoke with expressed interest in changing their report-ordering behavior, and I found that the norms for ordering reports varied widely by county. In many counties, including Philadelphia and Allegheny, the state’s most populous, the reports contain information judges can get simply by talking to the defendant, so judges often lamented that the reports themselves were unhelpful. In other counties, presentence investigation reports already contain an additional, controversial “black-box” risk assessment; judges in those counties explained that they saw no need for another risk assessment instrument. Over half of the judges I spoke with also said they would have preferred to receive more meaningful information at sentencing, such as which interventions have the best outcomes for cases involving drug use.

It was also common for judges to be unfamiliar with what the Sentence Risk Assessment Instrument did or where to find its recommendations. As one judge put it, “I never knew where that information was going to be provided for me. Was it going to come in an email? A news blog? A winter weather alert? I had no idea.” This is in part because the Pennsylvania Sentencing Commission’s information campaign was derailed by the start of the pandemic, but my findings also indicated systemic problems with how information is disseminated to judges. In a particularly revealing moment, one judge told me that they were attending a virtual training session over video call — during our interview.

The majority of judges also shared critics’ concerns about the tool’s potential for racial bias and dehumanization. One judge said they were concerned about “having a formula that takes away my ability to see the humanity of the people in front of me.” Another judge, who identified as Black, was critical of the tool’s discriminatory potential: “Who’s making the determinations? Who’s interpreting the statistics? You can say anything with statistics.” Finally, many judges felt that the tool was worse than the discretion of experienced judges — in one judge’s words, “I was elected to be a judge, not a robot.” All of these concerns, however, varied widely and were typically secondary reasons for not using the tool; even judges who were self-described “cheerleaders” for risk assessment instruments were dismissive of this particular tool.

A “useless” instrument is not the same as a harmless one

Were it to be used, however, the Sentence Risk Assessment Instrument could have harmful downstream consequences. The Commission “expressly disavows the use of the sentence risk assessment instrument to increase punishment.” However, as several judges pointed out, it is possible to infer a defendant’s risk score from the “additional information” designation given to low- and high-risk defendants, and empirical evidence from other states suggests that judges are more likely to use risk information to justify detaining individuals longer than to grant leniency.

Moreover, were judges to follow the tool’s recommendation to order reports for low-risk defendants, who often face minor sentences, the tool could have the unintended effect of detaining these defendants longer pretrial, since ordering a report can take 60 days. As one judge remarked, “I’m not letting them [the defendant] sit 8 more weeks in jail because some computer program said so.” Probation officers, who typically prepare presentence investigation reports, told me they had feared “a flood of cases” they lacked the resources to handle, since they were already overwhelmed with cases; “thankfully that has not happened.”

The Sentence Risk Assessment Instrument consumed considerable time and taxpayer dollars; it was in development for nearly a decade following a 2010 state legislative mandate to adopt a risk assessment tool for sentencing. Despite having no impact, a finding corroborated by the Pennsylvania Sentencing Commission’s own initial data analysis, the final version of the tool satisfies this mandate, meaning that Pennsylvania can claim that some measure has been taken to address the state’s shocking incarceration rates and racial disparities.

Some activists still see the final weakened tool as a win, viewing a harmless risk assessment instrument as better than a harmful one. In my view, however, this case illustrates the kinds of insidious algorithmic harms that rarely make headlines, adding to the growing body of empirical support for the abolition of recidivism risk assessment instruments. In practice, these algorithm-centric reforms have no significant impacts on sentencing, are resource-intensive to develop and implement, and merely pay lip service to the crisis of mass incarceration.

Grassroots organizations have been promoting low-tech policy changes for decades, including abolishing cash bail, releasing elderly populations from prison, and reinvesting money in schools and communities. Unlike risk assessment instruments, such measures actually reduce prison populations and have robust empirical support. By dispelling the AI hype around risk assessment instruments, we can help direct attention and resources toward practical changes that are less costly — and actually work.

Dasha Pruss is a PhD candidate in history and philosophy of science at the University of Pittsburgh. She draws on interdisciplinary research methods to critically interrogate the social implications of algorithmic decision-making systems in the criminal legal system.

This work was made possible with feedback and guidance from the Coalition to Abolish Death by Incarceration and support from the Horowitz Foundation for Social Policy and the University of Pittsburgh Year of Data and Society.
