Can We Create Ethical AI For The Criminal Justice System?

May 30, 2024

A few weeks ago, I had the opportunity to facilitate a panel on the ethics of using AI in the criminal justice space with a number of experts from outside the prosecution space. The panel included Jumana Musa from the National Association of Criminal Defense Lawyers, Aish Shukla from the International Refugee Assistance Project, Hiwot Tesfaye from Microsoft’s Office of Responsible AI, and Ashkhen Kazaryan, a senior fellow at Stand Together. The purpose was to give the prosecutors in the audience examples from experts who have grappled with ethical AI issues in other fields and even instituted ethical guardrails around the use of AI. In preparing for the panel and writing this post, I found this blog post and a great YouTube series that offer digestible descriptions of AI topics. Ultimately, I came away from the panel cautiously optimistic about the opportunity that AI offers the criminal justice system, and with a number of practices and resources that the field should adopt in developing and deploying any such AI tool.

With a diverse set of panelists, the intention was to start the discussion with each panelist outlining how they approached the use of AI in their field before turning to discussions about practical safeguards and addressing bias. But, from the first question regarding how each panelist thinks about the rights and responsibilities of AI creators, AI implementers, those affected by the use of AI, and legislatures, the conversation turned to the more fundamental question: “Why adopt these technologies?” Jumana raised this point first, and the other panelists mostly agreed that before trying to set ethical frameworks for AI’s use, there needs to be a discussion about (1) the usefulness of the proposed tools and (2) the opportunity cost of developing and using AI tools.

Addressing the current system

Asking why an AI tool needs to be developed forces society to step back and examine the system itself. In the criminal justice context, stepping back before even developing a tool will allow for closely examining:

Whether the current policies and practices that have produced the need for the tool are desirable and
Whether the data that the tool will use reflects desired outcomes.

Both of these questions point to a fundamental problem with AI tools and any other tool built on machine learning or statistical models: the tools learn from historical practices and so are likely to recreate similar practices moving forward.

For instance, creating a tool that helps to review large amounts of video evidence, such as hours and hours of body cam footage, should first raise questions regarding policing practices that produce large numbers of arrests for low-level crimes such as drug use. Furthermore, given that the criminal justice system disproportionately involves people of color and results in criminal records and jail time, a tool that enables faster review of cases furthers that trend. Rather than moving forward with creating this tool, then, the suggestion is to step back and consider changing the practices that lead to the need for such a tool, and to devote the resources that would have gone into it to other tools that address the underlying problem: the significant number of arrests for low-level crimes that might be better addressed through other means.

From an implementation perspective, this question points to the need for practices in AI tool development to include elements of design thinking that focus on problem identification. While an office and developer may not take a full design or systems thinking approach to tool development, it’s good practice to borrow elements from that school of thought to ensure that there is a clearly defined problem the tool is addressing. To do so, an office that has identified the need and opportunity for an AI tool can:

Pause development to do “empathizing” work to assess the need and its root cause.
Identify that root cause by working back to the source of the need for the tool, and consider whether creating the tool encodes desired prior practices.
Finally, with the root cause identified, discuss whether alternative tools that address that need should be developed instead.

This could come in the form of developing tools that, instead of addressing the proliferation of evidence of arrests, push people away from arrest in the first place, such as law enforcement assisted diversion practices. The panel helpfully provided such practical guidance for most of their ethical suggestions, including when discussing racial bias in AI and ethical frameworks.

Addressing racial bias in AI

The issue that understandably garnered the greatest concern among the panelists and the attendees was how AI tools might manifest bias and whether that bias can be detected. The first point the panelists emphasized was that there is no way to ensure that a tool does not manifest bias. Given this, offices should only work with AI tool developers that have adopted best practices to reduce bias in model development and that engage in rigorous auditing.

Without a clear statistical method to mitigate bias in models, there is significant pre-deployment work to do that includes:

Developing a logic model for how the training data has been produced to try to identify sources of bias in the training data;
Rigorous review of training data to identify correlations between variables of interest to anticipate possible bias; and
Documented use of statistical techniques like fairness classifiers to mitigate bias and demonstrate how the developer conceives of and worries about bias.
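
As a rough illustration of the second step, the sketch below checks how strongly each feature in a training set correlates with a protected attribute, which is one simple way to spot variables that may act as proxies. The records, the feature names (`prior_arrests`, `high_patrol_area`), and the 0.4 cutoff are all hypothetical and chosen only for illustration; a real review would use the office’s own data and a threshold agreed on with the developer.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical training records:
# (protected_group, prior_arrests, high_patrol_area)
records = [
    (1, 3, 1), (1, 2, 1), (1, 4, 1), (1, 1, 0),
    (0, 1, 0), (0, 0, 0), (0, 2, 1), (0, 1, 0),
]

group = [r[0] for r in records]
features = {
    "prior_arrests": [r[1] for r in records],
    "high_patrol_area": [r[2] for r in records],
}

# Arbitrary illustrative cutoff, not a recommended standard.
PROXY_THRESHOLD = 0.4

# Correlation of each feature with the protected attribute.
proxy_report = {
    name: round(pearson(group, values), 2)
    for name, values in features.items()
}

# Features whose correlation is strong enough to warrant a closer look.
flagged = [
    name for name, r in proxy_report.items()
    if abs(r) >= PROXY_THRESHOLD
]

print(proxy_report)
print("Possible proxies:", flagged)
```

A flagged feature is not proof of bias. It is a prompt to go back to the logic model and ask how that variable was produced, for example whether arrest counts reflect where police patrol rather than differences in behavior.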

Following these steps essentially amounts to performing an impact assessment of the tool’s use, and offices could follow guidelines from NIST on AI tool acquisition and use. The proliferation of open source AI models has prompted tech companies to produce free resources to guide developers in bias testing. For offices looking to use technology from developers, the next step would be for developers to document and demonstrate adherence to these practices in such a way that non-technical persons can review the documentation and trust that the tool has been validated to the extent possible.

Before deployment and continuing through the lifecycle of the tool, there should be regular audits, which can come in the form of outcome assessments and “red teaming.” Outcome assessments could take the form of private-public collaborative reviews of AI systems, as contemplated in recently proposed legislation, or of companies embracing other forms of external review and testing. There are also open source packages like Fairlearn (h/t Aish) being developed that are intended to make such assessments easier and more systematic. Red teaming and other forms of adversarial testing are key to revealing not only how an AI model can be turned to producing obviously malicious output, but also how innocuous uses can produce biased results that may have downstream, biased outcomes. AI developers and offices using AI should likely commit to outcome assessments and red teaming as a continual auditing technique to mitigate the risk of biased AI or biased use.
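
To make the idea of an outcome assessment concrete, here is a minimal sketch in plain Python that compares how often a tool flags cases across two groups. The audit log, the group labels, and the choice of metrics are hypothetical. Fairlearn packages comparable metrics (for example, `MetricFrame` and `demographic_parity_difference` in `fairlearn.metrics`), so an office working with a developer would more likely rely on a maintained library than on hand-rolled code like this.

```python
from collections import defaultdict


def selection_rates(decisions, groups):
    """Share of positive decisions (1) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += decision
    return {g: positives[g] / totals[g] for g in totals}


# Hypothetical audit log: 1 = the tool flagged the case, 0 = it did not.
decisions = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

rates = selection_rates(decisions, groups)

# Gap and ratio between the most- and least-flagged groups.
parity_difference = max(rates.values()) - min(rates.values())
parity_ratio = min(rates.values()) / max(rates.values())

print("Selection rate by group:", rates)
print("Demographic parity difference:", round(parity_difference, 2))
print("Demographic parity ratio:", round(parity_ratio, 2))
```

Run on a schedule against the tool’s actual decisions, a check like this gives non-technical reviewers a number to track over time, though a gap in selection rates is a reason to investigate rather than a verdict on its own.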

Aish Shukla also raised the prospect of post-processing correction, which can be a tempting way to correct for bias. Aish described two examples where post-processing has been used:

Apple choosing to auto-correct uses of “F*ck” in texts to “Duck.” While Apple knows what the user is trying to type, the company made a value choice to “correct” the text.
Adjusting credit limit recommendations from automated systems to address gender differences.

Aish cautioned against using this approach, though, because correcting bias upstream, in the data itself and in the training process, can be more comprehensive, while post-processing can rest on a subjective hypothesis and a subjective correction that introduce unintended bias.
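
The sketch below shows what a post-processing correction can look like in the simplest case, and where the subjectivity enters. It takes hypothetical model scores for two groups, applies a single shared cutoff, and then swaps in group-specific cutoffs chosen to hit a target approval rate. The scores, group names, and the 0.6 target are assumptions made up for this example, and this is only one of several ways post-processing can be done.

```python
def approval_rate(scores, threshold):
    """Fraction of scores at or above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def threshold_for_rate(scores, target_rate):
    """Cutoff that approves roughly target_rate of the given scores."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(target_rate * len(ranked)))
    return ranked[k - 1]


# Hypothetical model scores for two groups.
scores = {
    "group_a": [0.82, 0.74, 0.66, 0.58, 0.41],
    "group_b": [0.71, 0.59, 0.52, 0.44, 0.30],
}

# One shared cutoff produces different approval rates by group.
SHARED_THRESHOLD = 0.60
before = {g: approval_rate(s, SHARED_THRESHOLD) for g, s in scores.items()}

# The correction: someone has to pick the target rate. That pick is the
# subjective hypothesis, and nothing in the data says it is the right one.
TARGET_RATE = 0.6
adjusted_thresholds = {
    g: threshold_for_rate(s, TARGET_RATE) for g, s in scores.items()
}
after = {
    g: approval_rate(s, adjusted_thresholds[g]) for g, s in scores.items()
}

print("Before:", before)
print("Adjusted thresholds:", adjusted_thresholds)
print("After:", after)
```

The rates match after the adjustment, but the underlying scores and the data that produced them are unchanged, which is the limitation Aish pointed to.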

Creating ethical frameworks and safeguards

The panelists offered an array of options for offices to consider and take up in setting ethical standards for AI use, some more realistic than others for an office to establish. Even the unrealistic options are worth considering, though, because offices should advocate for monitoring and testing regulation. For instance, while an individual office may not be able to carry out its own adversarial testing, by embracing it as a principle, offices can create momentum for companies to adhere to common standards. Some of the panelists’ suggestions were:

Create a technology ethicist and adopter role for the office. While many government agencies, especially prosecutor offices, have limited budgets (see the ever increasing caseloads of most offices), the proliferation of new technologies warrants creating such a role. The technology ethicist can carry out needs assessments, review proposed tools to address those needs with an eye towards any ethical considerations, and help to push adoption in the office of any new technology.
Adopt office policies regarding the use of AI and machine learning technologies. As pointed out in a previous blog, offices already use many technologies that rely on statistical modeling but are unaware of those models and their performance. Offices should set policies or principles for the adoption of new technology, especially technology using AI, and there are heaps of examples, from large tech companies to the White House to more technical guidance from technology experts.

Two safeguards that were continually raised were that (1) AI should be used to support, not punish, individuals and (2) any use of AI should be transparent and understandable.

Using AI to “support not punish” in the criminal justice context is complicated, as different actors in the system have different roles and different interpretations of whether a tool is being used to support or to punish. At a minimum, though, this principle would require that AI not be used to make any final, consequential decision, where a consequential decision is one affecting an individual’s fundamental rights. Whether AI should be used to inform such decisions is where debate arises.

Transparency and understandability are frequently cited values with regard to AI and machine learning use. For some, this means using technologies where the underlying model itself is understandable to a human and made available to users in a transparent manner. This would rule out the use of black box models, including complex LLMs and models from companies that keep them private. A looser interpretation of the principle is that the use of AI in a final product or decision is disclosed, and that the disclosure includes an explanation of the model and the input or training data.

Conclusion

Moderating a panel of AI experts from diverse fields provided a helpful framing for additional considerations that go beyond simply asking whether an AI model addresses a specific need. Rather, adoption of AI and machine learning tools should force the criminal justice system to step back and examine whether the current system reflects our values and achieves our desired outcomes. In addition, the race to adopt these tools should push agencies to spend time researching responsible AI use, consider office values and objectives, and review currently used technologies for any inadvertent AI use.

*Thank you to Arnold Ventures and Microsoft for supporting the event.

By: Rory Pulvino, Justice Innovation Lab Director of Analytics. Admin for a Prosecutor Analytics discussion group.

For more information about Justice Innovation Lab, visit www.JusticeInnovationLab.org.

Written by Justice Innovation Lab