A Letter to the Members of the Criminal Justice Reform Committee of Conference of the Massachusetts Legislature Regarding the Adoption of Actuarial Risk Assessment Tools in the Criminal Justice System

February 9, 2018

The following letter (signed by Harvard- and MIT-based faculty, staff, and researchers Chelsea Barabas, Christopher Bavitz, Ryan Budish, Karthik Dinakar, Urs Gasser, Kira Hessekiel, Joichi Ito, Mason Kortz, Madars Virza, and Jonathan Zittrain) is directed to the committee of the Massachusetts Legislature working to reconcile House and Senate criminal justice reform bills in the Commonwealth. It follows up on, and refers to, a previous open letter about risk assessment tools, joined last fall by many of the signatories to this letter (among others).

Dear Members of the Criminal Justice Reform Committee of Conference:

We write in connection with the ongoing efforts by the Criminal Justice Reform Committee of Conference to reconcile the Massachusetts House and Senate criminal justice reform bills, which were passed by the two houses of the state legislature late last year. We write specifically with respect to the prospect of the Commonwealth’s moving toward adoption of actuarial risk assessment (“RA”) tools to inform pretrial decisions in the Commonwealth.

The undersigned write in their personal capacities. For purposes of identification, we note that signatories to this letter are Harvard- and MIT-based faculty and researchers whose work touches on issues relating to algorithms. Most are involved in a research initiative underway at the MIT Media Lab and Harvard University’s Berkman Klein Center for Internet & Society that seeks to examine ethics and governance concerns arising from the use of artificial intelligence, algorithms, and machine learning technologies.¹

As you are no doubt aware, the Senate and House bills take different approaches to the issue of adoption of RA tools:

  • Section 58E of Senate Bill 2200² mandates implementation of RA tools in the pretrial stage of criminal proceedings, subject to testing and validation “to identify and eliminate unintended economic, race, gender or other bias,” and to a requirement that aggregate data be made available to the public.
  • Section 80A of House Bill 4043³ calls for formation of a bail commission, which would provide “an evaluation of the potential to use risk assessment factors as part of the pretrial system regarding bail decisions, including the potential to use risk assessment factors to determine when defendants should be released with or without conditions, without bail and when bail should be set.”

In November 2017, several of the undersigned published an open letter addressed to the Massachusetts legislature (the “Open Letter”). The text of the Open Letter is available at http://brk.mn/RAOpenLetter, and a copy is enclosed herewith. That letter was published after passage of the Senate bill but before passage of the House bill.

In short, the Open Letter highlights the complexities associated with development of RA tools; underscores the potential for disparate impact in their use and implementation; raises the need for research study prior to the adoption of RA tools in the Commonwealth (and notes that the option remains open for the Commonwealth to develop its own tools rather than simply procuring existing ones); and offers specific thoughts on both technical and policy measures that might be undertaken to mitigate the risk of adverse consequences arising out of the use of such tools.

We write to reiterate the points made in the Open Letter and to highlight just some of the more recent examples of efforts that raise questions about the efficacy of RA tools and support our view that additional research and study is preferable to mandating use of such tools in the Commonwealth.

By way of example:

Gaps in Intended Use v. Actual Practice

  • In December 2017, Professor Megan Stevenson published a major empirical study of the impacts of pretrial RA implementation, using data from Kentucky.⁴ Kentucky, “often . . . held up as a leader in pretrial practices,”⁵ had used optional pretrial RA since 1976⁶ and made its use mandatory in 2011.⁷ Stevenson analyzed more than one million criminal cases between 2009 and 2016 to determine how this mandate affected pretrial outcomes.⁸ The results suggest that mandatory RA failed to live up to its promises of increased efficiency and fairness: both pretrial rearrests and failures-to-appear increased after implementation.⁹ Furthermore, even the modest improvements in pretrial release rates “eroded over time as judges returned to their previous bail-setting practices.”¹⁰ Finally, judges in rural and non-rural areas adhered to the RA recommendations differentially, exacerbating racial inequalities.¹¹ Thus, even jurisdictions esteemed for their significant experience with RA still have not demonstrated that the technology is now capable of delivering the improvements its champions promise. Kentucky’s experience points to the need to understand more deeply the way that judges’ beliefs, practices, and experiences shape the way RA tools are ultimately integrated into pretrial decision-making practices.

Issues with Accuracy and Bias

  • In January of this year, researchers from Dartmouth published findings that the popular Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) RA tool failed to show meaningful improvements over both human-only predictions and simpler algorithms.¹² Individual decision makers were only slightly less accurate than COMPAS at predicting recidivism.¹³ COMPAS’s comparative advantage almost completely disappeared when evaluated against small groups of humans predicting recidivism by majority rule.¹⁴ The research team also found that COMPAS’s false positive and false negative rates for black defendants were roughly the same as those of humans who did not have access to the tool.¹⁵ This further suggests that complex RA algorithms do not yet offer sizeable improvements over human decision-making. Any implementation of RA should be justified in comparison to its best alternatives, both human and algorithmic.
  • In her new book, Automating Inequality, released just last month, Professor Virginia Eubanks shows that predictions drawing from social services usage data (e.g., county mental health services) result in an overrepresentation of poor subjects because wealthier individuals struggling with the same issues are often able to shield these facts from exposure to algorithmic systems.¹⁶ This raises serious concerns regarding the ability of these tools to overcome the implicit bias of incumbent systems, as many RA proponents have hoped.¹⁷

Disconnect Between Pretrial Risks and Effective Conditions

  • Some of the undersigned argue in a forthcoming article that ethical concerns surrounding the use of RAs relate not simply to bias or accuracy but, rather, to purpose. Pretrial RA is gaining traction nationwide as part of a larger effort to mitigate the harmful effects of cash bail. Yet, Barabas et al. argue that RA is ill-suited to the task of assisting judges in identifying effective conditions for release in order to protect against failure to appear and, in some places, dangerousness.¹⁸ As a result, there is a risk of simply displacing punitive effects of cash bail onto other non-monetary conditions that have no proven track record of lowering pretrial risks.¹⁹
  • Other researchers note that most risk assessments were developed on data sets that predate key risk-mitigating policies.²⁰ As such, they run the risk of nullifying good-faith efforts to lower the risk of individuals during the pretrial stage.²¹ As Lauryn Gouldin argues, the vast majority of pretrial risk assessments available today only provide one aggregate risk score, even though the risks considered at pretrial are quite distinct and call for different types of mitigating conditions.²²

The Need for Transparency and Public Accountability

  • In a recent law review article on recidivism RA tools, Professor Jessica Eaglin argues that tool designers necessarily make a number of significant and normative design choices without adequate transparency or accountability.²³ Design choices such as the training dataset,²⁴ definition of risk categories,²⁵ selection of predictive factors,²⁶ and qualitative risk categorization (e.g., labeling a defendant as “high-risk”)²⁷ will affect RA outcomes. But designers usually make these choices in the absence of adequate legal or political input and accountability.²⁸ The inadequacy of supervision and oversight is particularly troubling, because many of these design choices implicate normative judgments.²⁹ In a democracy, such value judgments are archetypally appropriate for publicly accountable actors, not private vendors.³⁰ Eaglin argues that to ensure that “society, not the tool developers, . . . decide the normative judgments embedded in [RA] tool construction,”³¹ RA tools need to be transparent (in both development and application),³² be accessible to the public for feedback,³³ and produce interpretable results.³⁴

This scholarship represents just a sample of a significant and growing body of work on the use of RA tools in the criminal justice system.

In light of the extraordinarily rapid pace of technical development with respect to the sorts of RA tools under consideration; the relatively nascent state of our understanding of such tools and the consequences of their implementation; the far-ranging impacts these tools can have once implemented; the risk that institutional inertia might make it difficult to move away from them once they are adopted; and the complex and multivariate interplay between the use of RA tools and other aspects of the criminal justice system, we submit that the appropriate approach here is not a mandate in favor of adoption. Rather, we believe that the time is ripe for study, reflection, and development of transparent processes and comprehensive best practices.

For the foregoing reasons, the undersigned advocate strongly in favor of an approach along the lines of that set forth in the House bill: research, evaluation, and establishment of a Commission. We remain open to bringing our own research efforts to bear on these complex problems and stand at the ready to help inform the Committee’s or Legislature’s deliberations on the important issues implicated by use of RA tools in the criminal justice system in the Commonwealth of Massachusetts.

Thank you for your consideration.

Respectfully submitted,³⁵

Chelsea Barabas
Research Scientist,
MIT Media Lab

Christopher Bavitz
WilmerHale Clinical Professor of Law,
Harvard Law School

Ryan Budish
Assistant Research Director, Berkman Klein Center for Internet & Society,
Harvard University

Karthik Dinakar
Research Scientist,
MIT Media Lab

Urs Gasser
Professor of Practice,
Harvard Law School

Kira Hessekiel
Project Coordinator, Berkman Klein Center for Internet & Society,
Harvard University

Joichi Ito
MIT Media Lab

Mason Kortz
Clinical Instructional Fellow, Cyberlaw Clinic
Harvard Law School

Madars Virza
Research Scientist,
MIT Media Lab

Jonathan Zittrain
George Bemis Professor of International Law,
Harvard Law School and Harvard Kennedy School
Professor of Computer Science,
Harvard School of Engineering and Applied Sciences

¹ See AI Ethics and Governance, MIT Media Lab, https://www.media.mit.edu/projects/ai-ethics-and-governance/overview/ (last visited Feb. 2, 2018); Ethics and Governance of Artificial Intelligence, Berkman Klein Ctr. for Internet & Soc’y, https://cyber.harvard.edu/research/ai (last visited Feb. 2, 2018).

² S. 2200, 190th Gen. Court (Mass. 2017), https://malegislature.gov/Bills/190/S2200.pdf (last visited Feb. 9, 2018).

³ H. 4043, 190th Gen. Court (Mass. 2017), https://malegislature.gov/Bills/190/H4043.pdf (last visited Feb. 8, 2018).

⁴ Megan Stevenson, Assessing Risk Assessment in Action (George Mason Law & Econ. Research Paper No. 17-36, 2017), https://ssrn.com/abstract=3016088 (last visited Feb. 9, 2018).

⁵ Id. at 29; see also id. at 4.

⁶ See id. at 30–31.

⁷ See id. at 31. Although the 2011 law required Kentucky judges to consult RA scores when making pretrial release determinations, judges retained full discretion over pretrial release determinations. See id. at 32.

⁸ See id. at 33–34.

⁹ See id. at 5, 44–46.

¹⁰ Id. at 5; see id. at 43–44.

¹¹ See id. at 48–53.

¹² Julia Dressel & Hany Farid, The Accuracy, Fairness, and Limits of Predicting Recidivism, 4 Sci. Advances eaao5580 (2018), available at http://advances.sciencemag.org/content/4/1/eaao5580 (last visited Feb. 9, 2018).

¹³ See id. at 2 (“A one-sided t test reveals that the average of the 20 median participant accuracies of 62.8% . . . is, just barely, lower than the COMPAS accuracy of 65.2% . . . .”).

¹⁴ See id. (“To determine whether there is ‘wisdom in the crowd’ . . . , participant responses were pooled within each subset using a majority rules criterion. This crowd-based approach yields a prediction accuracy of 67.0%. A one-sided t test reveals that COMPAS is not significantly better than the crowd . . . .”).

¹⁵ See id.

¹⁶ See Virginia Eubanks, Automating Inequality 166 (2018).

¹⁷ See id. at 167.

¹⁸ Chelsea Barabas et al., Interventions Over Predictions: Reframing the Ethical Debate for Actuarial Risk Assessment, Proc. Machine Learning Res. (forthcoming Feb. 2018), https://ssrn.com/abstract=3091849 (last visited Feb. 9, 2018).

¹⁹ See id. (manuscript at 7).

²⁰ See John Logan Koepke & David G. Robinson, Zombie Predictions and the Future of Bail Reform (Sep. 29, 2017) (unpublished manuscript), https://ssrn.com/abstract=3041622 (last visited Feb. 9, 2018).

²¹ See id. at 36–54.

²² Lauryn P. Gouldin, Disentangling Flight Risk from Dangerousness, 2016 BYU L. Rev. 837 (2016).

²³ Jessica M. Eaglin, Constructing Recidivism Risk, 67 Emory L.J. 59 (2017), available at http://law.emory.edu/elj/_documents/volumes/67/1/eaglin.pdf (last visited Feb. 9, 2018); see also Rebecca Wexler, Life, Liberty, and Trade Secrets: Intellectual Property in the Criminal Justice System, 70 Stan. L. Rev. (forthcoming 2018), available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2920883 (last visited Feb. 9, 2018) (describing the use of trade secret law to shield criminal justice predictive algorithms from legal scrutiny).

²⁴ See Eaglin, supra note 23, at 72–75.

²⁵ See id. at 75–78; cf. Gouldin, supra note 22, at 867–71 (discussing RA tool categorization of various pretrial risks).

²⁶ See Eaglin, supra note 23, at 78–80.

²⁷ See id. at 85–88.

²⁸ See id. at 64, 73, 78, 88, 105.

²⁹ See id. at 88–100, 105, 108.

³⁰ See Eubanks, supra note 16, at 12 (“[Automated decision-making] reframes shared social decisions about who we are and who we want to be as systems engineering problems.”).

³¹ Eaglin, supra note 23, at 104.

³² See id. at 110–16.

³³ See id. at 116–19.

³⁴ See id. at 119–21.

³⁵ The signatories thank Harvard Law School Cyberlaw Clinic spring 2018 student, Cullen O’Keefe, for their valuable contributions to this letter.