Artificial Intelligence In Criminal Court Won’t Be Precogs

Justice Innovation Lab
Oct 31, 2023 · 21 min read


Photo by Jonathan Kemper on Unsplash

Artificial intelligence is everywhere — including in our criminal justice system. The general discomfort around the expansion of AI is particularly pronounced when it comes to our courts. The human element has been at the very center of our justice system since its inception. Guilt is determined by a jury of your peers. Punishment is decided by a judge. Whether you are fit for an alternative to incarceration — diversion, treatment, or another program — is also decided by people, usually a combination of judge, prosecutor and defense attorney. The notion of AI replacing any or all of the humanity in those processes is frankly scary. A justice system run by algorithms feels dystopian.

Yet, AI and statistical modeling in general are already present and will likely grow in use in the criminal justice system. For example, machine learning is already used to predict where crime may occur and by whom. Models used for pre-trial detention and predictive policing have garnered significant controversy regarding possible bias and pushed the field to be more aware of how algorithms reflect the underlying data they are trained on and the possible risks from malicious data manipulation. Despite the risks, the criminal justice system’s experience with algorithms, along with the stress on the system from large caseloads, has primed a rush to use recent advances in AI in a myriad of new ways. This includes the growth of seemingly innocuous generative AI to draft legal documents or provide summaries of long and numerous documents. AI is likely to save staff significant time in reviewing evidence and drafting documents, while decision-assisting AI will be used to prioritize and categorize cases.¹ However, without proper oversight, even the innocuous systems may perpetuate or worsen biases in the criminal justice system.

Based on my experience bringing data-informed decision-making into prosecutor offices, I’m writing this article to inform the general public where statistical modeling, machine learning, and AI are already present in the criminal justice system, where I expect them to grow, and finally to offer some thoughts on practical guardrails. Throughout the article I discuss various types of statistical modeling — from the more restrictive regression-based models through the more flexible AI such as ChatGPT. For clarity, I refer to the more restrictive modeling as algorithms and the more expressive models as AI. I am using these terms in part because current legislative efforts appear to distinguish statistical models in this way. I discuss both in this article because uses of both are on the rise in the criminal justice space and because proposed federal legislation likely only covers the algorithms, despite the fact that both are changing the practice of criminal law.

Where Algorithms And Artificial Intelligence Are Currently Used

Algorithms and statistical modeling have been a part of the criminal justice system for over a decade. The use of AI in criminal justice, like in many fields, is a relatively new phenomenon. In evaluating where AI is currently used and will likely be used, I think about it from the perspective of the outputs of specific decision points within the system, from policing to court to appeals. Some applications of algorithms and AI at different decision points use the same or similar technology, but I think it is useful to explicitly outline these examples for non-technical readers to demonstrate as many applications of the technology as possible.

Policing

  • Hot spot and predictive policing — Companies like PredPol and Palantir have been developing predictive crime models for years. Hot spot predictive policing models are geography-based and attempt to predict where crime is likely to occur. Other predictive policing models are person-based and provide police with lists of people presumably likely to be involved in crime in the future.² Both model types are cautionary tales in using algorithms and AI in the criminal justice system, and much has been written about issues with these tools.
  • Pattern identification — Algorithms and AI can identify common patterns across crimes without needing an officer or investigator to sift through volumes of incident reports, and can do so more quickly and comprehensively than a human. More and more police software vendors are incorporating such tools into their platforms as the technology is well-established. One concern is that if officers begin using generative AI to write police reports, pattern recognition software may start picking up false patterns that result from many reports being produced by the same source: the software would be detecting the pattern of AI-generated reports rather than a true signal of similar crimes identified through common elements in police reports.
  • Facial recognition — A similarly well-covered topic, facial recognition technology is growing in use but comes with many issues. The serious accuracy issues with these models point to the need for accuracy standards and an understanding of optimization and error rates among users. Furthermore, these models also raise serious questions over biometric data privacy.

Pretrial and Case Review

  • Pretrial detention — Algorithms are already involved in pretrial detention decisions through assessing an individual’s risk for various outcomes — committing a crime upon release, failing to appear at a future hearing, etc. This article does a good job summarizing the use of these algorithms and associated risks. Up to this point, these algorithms have generally followed a pattern of developing an assessment using advanced statistical modeling and then reducing the algorithm to something that a human can implement or easily review.
    For example, many assessments are reduced to checklists whereby arrestees are assigned points for various factors, e.g., their number of prior offenses, convictions for violent crimes, time since last arrest, etc. Based on this point system, a pretrial detention recommendation is provided (a simplified sketch of such a point system appears after this list). In some instances the inputs for the checklist are digital and provided automatically such that a human only sees the final recommendation. In other cases a human is provided the inputs to evaluate the checklist themselves and reach the final recommendation. These human-in-the-loop processes are transparent to human actors and are more readily checked for accuracy, including for issues such as mistaken identities.
    Such systems also generally have a release valve, allowing judges to override the “decision” of the algorithm. As society becomes more comfortable with AI recommendation systems, one can imagine that the second part of this process — whereby algorithms are reduced such that humans can easily review the inputs — might slowly disappear and be replaced by output scores that humans simply rubberstamp.
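To make the checklist mechanics concrete, here is a minimal sketch of a point-based pretrial assessment. The factors, point values, and release threshold are illustrative assumptions, not any jurisdiction’s actual instrument, and the design follows the principle discussed later of only recommending in an arrestee’s favor.

```python
# A simplified, hypothetical point-based pretrial checklist.
# Factors, weights, and the threshold are illustrative assumptions,
# not any jurisdiction's actual instrument.

POINTS = {
    "violent_conviction": 3,       # any prior conviction for a violent crime
    "recent_arrest": 2,            # arrested within the past year
    "prior_failure_to_appear": 2,  # any prior failure to appear
}
PRIOR_OFFENSE_POINT = 1            # one point per prior offense, capped at 3
RELEASE_THRESHOLD = 4              # at or below this score, recommend release


def score_arrestee(record: dict) -> int:
    """Sum checklist points from a simple arrestee record."""
    score = min(record.get("prior_offenses", 0), 3) * PRIOR_OFFENSE_POINT
    for factor, points in POINTS.items():
        if record.get(factor, False):
            score += points
    return score


def recommendation(record: dict) -> str:
    """Only recommend in the arrestee's favor: a low score yields a release
    recommendation; a high score yields no recommendation, not a detention
    recommendation."""
    if score_arrestee(record) <= RELEASE_THRESHOLD:
        return "recommend release"
    return "no recommendation"


if __name__ == "__main__":
    arrestee = {"prior_offenses": 1, "recent_arrest": True}
    print(score_arrestee(arrestee), recommendation(arrestee))  # 3 recommend release
```

Because every input and weight is visible, a clerk, attorney, or judge can recheck the arithmetic by hand, which is the transparency property that the checklist approach preserves and that opaque scoring systems lose.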

Appeals

  • Reviewing convictions — Generative AI is already being used for one of the most labor-intensive activities in the criminal justice system — document review for incarcerated individuals³ seeking appeals. The California Innocence Project has begun using a ChatGPT-based program to review applications for representation for specific legal issues that would make a case a good candidate for appeal and representation. Given the rise in conviction integrity units within prosecutor offices, these groups could use AI to greatly speed up the initial work of reviewing hundreds to thousands of pages of documents when trying to identify cases for review.

Where Algorithms And Artificial Intelligence Are Headed In The Criminal Justice System

Some of the scenarios described below may already be in use, as agencies don’t always tout their adoption of algorithms and AI in this space. All of these use cases are based on my personal experience working in the criminal justice system with technology and build off trends I’ve seen and my understanding of algorithms and AI technologies. Including a technology in this list is not an endorsement of its use. For most of these technologies there are also data privacy concerns and other issues that I have not addressed here. There are other applications of algorithms and AI that I have not included because I think their adoption is unlikely or too far afield from what I am focused on for this article. Finally, there are likely innumerable applications of algorithms and AI that I have not thought of, and I welcome comments as to where I’ve missed the obvious.

Policing

  • Untangling complex financial crimes — Complex financial crimes, such as the use of shell companies to hide fraud or money laundering for larger criminal enterprises, are some of the most difficult to investigate and prosecute. Smaller, local financial crimes such as wage theft are also difficult for local police departments to investigate and address. The investigations rely upon the analysis of a large number of diverse data sources — state databases of company ownership, tax documents, financial records, etc. Generally, only the federal government has the necessary resources to investigate such crimes: law enforcement agencies with enough time to write warrants and track down data, and forensic accountants capable of untangling complex financial arrangements.
    In addition to the challenge of simply determining whether there is a crime, there is the challenge of getting the data into a state where it can be analyzed. AI systems capable of even just automatically reformatting data, and especially those capable of reviewing documents to identify links between businesses and consistency across financial reporting, will significantly lighten the load of analysts and should allow for more investigations. These tools remove significant barriers for local police departments to investigate these crimes, though most departments would likely still need to reallocate officer priorities.
    Likely, the true barrier to using AI for this purpose will be rules that prevent law enforcement from combining data sources or that block regulations requiring the standardization of data. Similar efforts prevent law enforcement from fully utilizing records of gun sales to investigate gun crimes.
  • Automating paperwork and notifications — Unsurprisingly, helping officers complete the most tedious parts of their job, such as writing up incident reports or sending out notifications to various parties, will be a likely use of AI. To illustrate, I once spoke with a number of officers who joked about a more junior officer who arrested a known “fence” because of the volume of paperwork the officer then needed to do — namely categorizing all the stolen goods and attempting to find the owners. AI will be able to take in that information, make possible links between recovered goods and stolen-goods reports, and possibly send automated notifications to victims (a rough sketch of this kind of matching appears after this list). Such automation sounds great and may better align police to working on harder, more strategic cases by lessening the burden after the arrest. There are obvious risks to this use as well. Automating report writing may allow biased officers to slip under the radar because their reports no longer reflect their bias; instead, the report is written through the veneer of standardized language that obscures intent. Automated notifications require humans to take care in who is assigned to receive information, and such notifications have a tendency to go to the wrong people by accident. These problems exist in the current system, but leaving them to AI may exacerbate or mask them.
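As a rough illustration of the matching step described above, here is a minimal sketch that links recovered items to stolen-property reports by fuzzy text similarity. The field names and similarity threshold are assumptions for illustration; a real system would need far more careful identity resolution, and a match should be treated as a candidate for human confirmation, not a trigger for an automatic notification.

```python
from difflib import SequenceMatcher

# Illustrative data structures; field names are assumptions for this sketch.
recovered_items = [
    {"id": "R1", "description": "silver trek mountain bike, 21 speed"},
    {"id": "R2", "description": "black dewalt cordless drill"},
]

theft_reports = [
    {"report_id": "T-1041", "item": "Trek silver mountain bike (21-speed)"},
    {"report_id": "T-1077", "item": "Sony television, 42 inch"},
]

MATCH_THRESHOLD = 0.6  # illustrative cutoff, not a validated value


def similarity(a: str, b: str) -> float:
    """Crude text similarity between two item descriptions."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_matches(recovered, reports, threshold=MATCH_THRESHOLD):
    """Yield (recovered id, report id, score) pairs above the threshold.
    These are candidates for human review, not automatic notifications."""
    for item in recovered:
        for report in reports:
            score = similarity(item["description"], report["item"])
            if score >= threshold:
                yield item["id"], report["report_id"], round(score, 2)


if __name__ == "__main__":
    for match in candidate_matches(recovered_items, theft_reports):
        print(match)
```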

Pretrial and case review

  • Initial reviews of evidence — Generally, the police collect evidence of a criminal incident and present that evidence to the prosecutor to decide whether to indict/charge a person with a crime. For most cases, this evidence will consist of a police report, possibly some victim and witness statements, and perhaps body camera or other video footage. While offices have various levels of review before indictment, AI, already being used for some document review, could review this evidence for discrete, specific issues — mistaken identity, inconsistencies in statements, etc. Based on this review, prosecutors could more quickly assess the strengths of a case and whether a quick dismissal is warranted.
    Key to this will be providing a system like ChatGPT with sufficient context and reducing the prompt such that the system has limited moral reasoning to do. This should limit the impact on substantive rights, e.g., not asking whether to file a case based on the submitted evidence. Such questions are essentially asking AI to determine guilt, or at least a percent likelihood that the person is guilty, and are better left to experienced attorneys (and humans). Furthermore, limiting the options and implications of the output makes it easier to review decisions for accuracy and removes some issues with unforeseen bias affecting the decision.
  • Targeting review of evidence — Cases with video evidence or a large number of documents can take a long time to review. AI can assist this review by guiding individuals to the documents or parts of a video most relevant to a case. For instance, given a multi-hour video of a garage break-in, AI can limit the video to just the time periods where there is a human in the frame (see the sketch after this list).
  • Pretrial diversion — These programs are intended to take people out of the system at as early a stage as possible — often before being indicted. Program eligibility requirements vary, but an AI system with sufficient information could likely assess eligibility in most cases.
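To illustrate the video triage idea, here is a minimal sketch that samples a long video at intervals and keeps only the time ranges where a person is detected. The `person_detected` function is a placeholder standing in for whatever detection model an agency’s vendor actually supplies; the sampling interval and padding are illustrative assumptions.

```python
# Minimal sketch of trimming a long video down to segments with people in
# frame. person_detected() is a placeholder for a real detection model;
# the sampling interval, padding, and merging logic are illustrative.

from typing import Callable, List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def person_segments(
    video_length_s: float,
    person_detected: Callable[[float], bool],
    sample_every_s: float = 2.0,
    pad_s: float = 5.0,
) -> List[Segment]:
    """Sample the video every few seconds and merge consecutive hits
    (plus a little padding) into reviewable segments."""
    segments: List[Segment] = []
    t = 0.0
    while t < video_length_s:
        if person_detected(t):
            start = max(0.0, t - pad_s)
            end = min(video_length_s, t + sample_every_s + pad_s)
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # merge overlapping hits
            else:
                segments.append((start, end))
        t += sample_every_s
    return segments


if __name__ == "__main__":
    def fake_detector(t: float) -> bool:
        # Pretend someone appears between minutes 40 and 42 of the video.
        return 2400 <= t <= 2520

    print(person_segments(video_length_s=4 * 3600, person_detected=fake_detector))
```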

Court filings

  • Sanctioned attorneys aside, more and more lawyers will begin using ChatGPT and generative AI to conduct legal research and generate at least drafts of court filings. All lawyer offices, including prosecutors and defense counsel, already share template filings and research memos such that some amount of many court filings is copy-paste from elsewhere. The clear risk with using AI for drafting is exactly what happened in the sanctioned lawyer case — the AI ‘hallucinated’ (made up) cases as legal precedent. This is less of a risk in the criminal context because the universe of cited cases is much smaller and well-known within a jurisdiction, so attorneys should be able to easily review citations, or the model can be limited to a small bank of precedent. Furthermore, platforms like CaseText are aware of the issue and presumably are working to ensure the models are trained on, and can be limited to, applicable, accurate jurisdiction law.
    While legal tech companies might be able to eliminate hallucinations, there still might be issues with generative AI and legal reasoning or racial bias. In the case of legal reasoning, given that there are already models passing law school exams and the bar, it seems likely that lawyers will use them to draft legal reasoning sections. Eliminating racial bias is a different matter, especially as it may arise in less obvious places. For instance, provided with near-identical evidence but with arrestees of different races, will generative AI draft a similar statement of facts, or will racial bias lead to pernicious differences in language or emphasis? Unfortunately, such differences in how AI drafts these sections may not be noticeable to attorneys using it on a case-by-case basis. Rather, identifying such an issue would require systematic statistical review of these systems.
    Some may view this “upstream” bias as negated and murky because of the wealth of additional influences between something like a brief and a judge’s final decision. Within the legal process there will be additional evidence presented to the judge and live hearings that affect decisions and cloud the influence of systematically biased filings. Despite this, it should be of concern because many believe this upstream bias already affects how we view the world. For instance, mass media narratives linking race and crime create associations for many between race and crime. Those associations very likely affect how people then interpret new events. If this is true, then judges exposed to systematically biased filings are being told repeatedly that one group of defendants is somehow different from the other group, shaping their view of both groups over time. This isn’t a new problem; it’s already happening. But using a tool like generative AI for writing filings risks encoding the issue into the criminal justice system.
    One possible solution would be to remove indicators of race from the evidence used by the generative AI or from the source data used to train the AI system. By eliminating race from either the source data or the content the generative AI draws on, the system might not build associations between race and verbiage or sentiment. Similarly, if race were removed from the evidence supplied, the model could not draw upon that information even if bias were already baked into the model. This may not be foolproof, as associations within the model may stretch beyond the explicit use of racial terms.
    A less extreme and safer use of tools like ChatGPT is as a document formatting and citation machine. The legal world spends many hours formatting documents and cite checking — i.e., ensuring that citations to other cases, the law, or other documents properly represent the original. Much of this work could be handed to these tools, eliminating silly frustrations like getting margins lined up correctly or ensuring that a quotation is correct.

Plea negotiations

  • Across and within jurisdictions plea negotiation processes can vary greatly. For some, negotiations happen via email or contact outside the courtroom between prosecutors and defense. In other instances, all negotiation is done in the courtroom. In addition, some offices have standardized plea offers that are generally a starting point for what prosecutors offer. Like reviews of evidence discussed above, AI systems might use similar evidence to construct initial plea offers and even communicate them to defense, shortening the negotiation process.
    Similar to the potential issues in drafting the statement of facts, there are concerns that such a system, without proper oversight, might promote bias in plea offers. For instance, it might offer less favorable terms to people of color. This concern might be mitigated if the AI is simply choosing the most appropriate initial offer from a preset list of standard offers. Furthermore, such a system could significantly benefit the defense and the criminal justice system as a whole by making plea offers more transparent and by speeding up the process such that cases are resolved more quickly, rather than drawing out cases such that defendants are required to make continual, often pointless, court appearances.
  • Alternative / Specialty Courts — Drug, mental health, and veterans courts are all specialty courts that defendants must be referred to and must accept that referral. In most jurisdictions there are challenges with eligibility criteria for the courts, and defendants may refuse offers to these courts for various reasons — the most frequent I’ve heard are requirements to plead guilty and the onerous conditions of programs relative to the punishment in traditional court. Regardless, AI should be able to better assess eligibility and extend an offer given clear criteria. For instance, AI can more quickly look through a defendant’s criminal history and determine if there are disqualifying incidents. Offices might also be tempted to ask AI to assess whether a defendant is likely to accept an offer or succeed within a program. Such questions are clearly more subjective and are at risk of being affected by bias, just as they are with current human decision makers. Within specialty courts, AI might even be used to tailor programming to a defendant — though this seems unlikely given the strictness of current program requirements.

Hearings

  • During a hearing, judges make many legal determinations regarding the admission of evidence and other legal rules. These rules are well-documented and sometimes challenging to apply. One possible application of AI would be to provide guidance to judges, especially those not well versed in courtroom rules, on applying the rules in a specific context. Again, given that AI systems are already passing the bar, it’s possible that, provided with sufficient context — transcripts of what was said and the evidence on hand — such systems might be able to make decisions like whether or not to exclude evidence as hearsay.

Case resolution and sentencing

  • Bench trials — Most cases where the defendant is found guilty are resolved through plea deals. Outside of pleas, about 7% or fewer of criminal cases go to trial, where defendants are guaranteed a jury trial. But in some circumstances there may be a bench trial, where there is no jury. In such circumstances someone may propose using AI to assess the guilt of individuals, perhaps even as a check against a judge’s decision. This seems unlikely given that use of AI in this way would undermine judges’ decision making and affect defendants’ substantive rights.
  • Sentencing — Similar to pretrial detention decisions, algorithms and AI can provide a recommended sentence for a defendant. Algorithms for crafting recommended sentences already exist, but I am not aware of any jurisdiction where they are used by judges in sentencing. Were they used, recommendations could be constrained by sentencing parameters or sentencing grids. If generative AI is used, then the system might also generate its “reasoning” for its recommendation — something not currently required even of judges. There is a history of removing judicial discretion in sentencing, such that using AI to provide recommendations to judges, or to notify judges of how similar defendants were sentenced, seems plausible.

Appeals

  • Filings — AI will likely be able to do basic legal research and writing and thus could help defendants file appeals and other legal documents. This is especially helpful since filing an appeal often consists of navigating cumbersome processes and forms that AI will likely be able to help with.

Algorithms and AI will likely continue to be used to assess defendant risk (rather than culpability); to help lawyers and others more easily format and write legal documents; and to review and summarize voluminous legal records. There are other possible uses with more direct and immediate consequences, such as assessing eligibility for certain programs, that should be cautiously implemented. Perhaps the adoption of AI into any of these decision points will also force more review of the people and the legal and policing systems that put them in those positions to start with.

Guidelines For Algorithms and Artificial Intelligence In The Criminal Justice System

Regulating algorithms and AI is already a serious topic and luckily places like the National Institute of Standards and Technology (NIST) provide recommended standards for use of AI. Furthermore, local governments are starting to roll out their own guidelines and policies for algorithms and AI use and members of Congress are working on proposed legislation. Beyond following such standards, criminal justice agencies should carefully consider how algorithm and AI adoption in any use case affects actor decision making and resources and roll out use in a testable framework. For instance, where algorithm and AI technologies save significant time for attorneys and staff, where should those actors shift their attention to improve public safety and reduce harm? Below are some practical, guiding principles criminal justice agencies should consider when adopting algorithm and AI technologies:

  1. Bar algorithms and AI systems from providing decisions or making recommendations adverse to an arrestee or a defendant. Given that we know models are subject to racial and other biases, limiting their application such that they cannot inadvertently harm individuals would be a best practice. This would mean that for decisions such as pre-trial detention, a model would only provide a recommendation for release; if the model does not recommend release, that should not be interpreted as a recommendation for detention. Furthermore, models should not be designed to make recommendations for detention, because such a recommendation is adverse to an arrestee.
    With regard to uses of generative AI for things such as legal writing that might exhibit racial bias, different rules will be required, as these “upstream” processes are more likely to have substantive human input.
  2. Require analysis of gaps in the data that might impact algorithm and AI model outputs. This analysis will help to assess the relative strengths of models and guide review of their impact. For instance, where AI might be used by conviction integrity units to identify cases that should be reviewed, such prioritization modeling, though not adverse to any one defendant, might still be biased in benefiting one type of defendant over others. This differential benefit might happen because of gaps in the data for particular types of defendants, amounting to omitted variable bias. For example, there is less evidence generated and needed to convict someone of a drug offense than of a murder, and drug charges are less likely to be challenged for wrongful conviction. As such, there are systematic differences in the data based on the nature of the charges, which can lead to differences in the probability of a model picking out cases for review based on the underlying charge.
    While agencies are unlikely to be able to conduct this analysis themselves, there are many university and nonprofit researchers that are interested in this work and would be willing to partner with agencies for this purpose. Furthermore, companies providing the technology can and should provide reporting features that would allow agencies to quickly assess things like rates of “success” in flagging cases for review by the charges at issue or the defendants’ race. Such reports, while not necessarily definitive evidence of gaps in the data used to train the models, would be good early indicators for users of potential issues.
  3. Require regular outcome review reports that draw upon publicly available data for standardized testing. Similar to the gap analysis suggested above, and because regular review of underlying algorithms and training data is likely unrealistic for government, agencies or the federal government should create standardized datasets to test bias in tools. For instance, an agency might create a sample dataset of 100 defendants with similar characteristics and pass that dataset through an algorithmic or AI tool to determine if there is problematic bias (a minimal sketch of such a test appears after this list). Tool providers can also create simple features that would allow for this testing, e.g., systematic uploading and collection of results to facilitate the review.
    By using a publicly available dataset, agencies could compare performance across tools, be more transparent for public review, and easily adapt or add to the dataset. Use of a public dataset also permits third-party researchers to review and critique the dataset for any issues and to create public, open source reviews of common models using a common dataset. Regular review is preferable to relying upon creating a right to legal action, as using the courts to review outcomes is very costly: those affected would need to be aware of how they were affected, request the data used by the tool, and review the output from the tool version they were subjected to. Furthermore, judges are not necessarily well-suited to understanding and reviewing algorithm and AI models and outcomes, especially when there may not be an established dataset for competing experts to test.
  4. Ensure that where algorithms or AI are provided data — police incident reports, video, photos, witness statements, etc. — to make a prediction, all of that information is retained for human review. Given that the more complex algorithms and AI models may not be human reviewable, we should retain the data that was provided for later human review. If we think of algorithms and AI as mimicking human decision making, then just as we retain evidence to review judicial decision making, we should set up systems of review for algorithm and AI decision making. While evidence in criminal cases is generally retained, since this evidence will be in digital formats, agencies using AI should adopt digital evidence storage systems. Furthermore, agencies should have clear standards for logging what evidence was provided to an algorithm or AI tool.
  5. Only work with tool providers willing to provide a sample of their training data in the format used to train the model. A source of frustration for anyone reviewing others’ work is when data is provided in a format that forces the reviewer to spend substantial time standardizing the data to then assess a model. Not only is this process costly, it often involves making subjective choices that affect the final dataset used for analysis. Given this, model developers should be asked to provide the data in the format as used in the modeling process and possibly also required to share the code or at least the substantive decisions made in processing the data. By only working with third parties willing to commit to this, agencies can ensure that substantive review of models by others is possible.
  6. Regulators and legislatures should create and promote standards of fairness in algorithmic outputs, including for differences between races and genders. Without this guidance, agencies using algorithm or AI tools will struggle to evaluate whether a tool is biased or not. Fairness and racial bias can be measured in various ways such that a model might be considered “fair” or “unbiased” according to some measures, but not others. Furthermore, constraining algorithms for fairness can have unintended consequences. Given these challenges, without guidance as to what constitutes fair or unbiased, judges and juries will likely be left to decide these standards. This may be desirable if different measures of bias are made into cognizable legal claims that prevent courts from shutting out claims based on aggregate, statistical evidence of bias.
  7. Require that tool providers reveal what the algorithm is optimized for along with a description or assessment of the different types of errors and the rate of those errors in the algorithm’s predictions. Algorithms are designed to answer a specific question and are given parameters of what constitutes success. This information should be made available to users and regulators with documentation of how the model creator arrived at that definition of success. Furthermore, algorithms, like humans, get things wrong and can get things wrong in different ways, e.g., an algorithm might predict a mole is cancerous when it is not, or that a mole is not cancerous when it is. The rates of the various errors the model creator considered and the real-world implications of those errors should be clearly stated by the algorithm creator (the audit sketch after this list illustrates reporting such error rates by group).
  8. Be cognizant and concerned about the use of AI tools upstream of any final decision making. As discussed above, generative AI tools are going to be used more and more in the legal context, yet the bias from these tools may be ignored because it happens before any final decision is made and it is unclear how that bias even affects the ultimate decision maker. These tools can still be tested as described above but may require additional testing tools. For instance, where ChatGPT is used to produce legal briefs, an agency can use a standardized dataset to generate a set of test briefs that can then be analyzed using sentiment analysis (a sketch of this approach also follows the list). Of course, the sentiment analysis tool requires its own standards of fairness and bias.
  9. Ensure that the algorithm or AI tool provider properly handles any data. Criminal justice agencies are subject to CJIS standards for data privacy as well as other local standards. CJIS standards may not apply in many circumstances given that an agency’s data is its own to work with and few states closely monitor criminal justice agency data usage. As such, agencies will need to work with algorithm or AI tool providers to ensure that they meet any legal standards and should go beyond this to ensure that their data is not used by the algorithm or AI tool provider beyond the agency’s intended purpose.
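As a minimal sketch of the standardized testing described in items 2, 3, and 7, the code below passes a small synthetic dataset through a stand-in scoring function and reports, by group, how often cases are flagged along with false positive and false negative rates. The dataset, the `tool_flags_case` function, and the group labels are all illustrative assumptions; a real audit would use an agreed public dataset and the vendor’s actual tool.

```python
from collections import defaultdict

# Illustrative synthetic records: each has a group label, a ground-truth
# "should_flag" value for the test set, and the fields a real tool would
# consume. All values here are made up for the sketch.
test_cases = [
    {"id": 1, "group": "A", "should_flag": True,  "prior_offenses": 0, "charge": "drug"},
    {"id": 2, "group": "A", "should_flag": False, "prior_offenses": 2, "charge": "theft"},
    {"id": 3, "group": "B", "should_flag": True,  "prior_offenses": 0, "charge": "drug"},
    {"id": 4, "group": "B", "should_flag": False, "prior_offenses": 3, "charge": "theft"},
]


def tool_flags_case(case: dict) -> bool:
    """Placeholder for the vendor's tool; here, a trivial stand-in rule."""
    return case["prior_offenses"] == 0


def audit_by_group(cases, flag_fn):
    """Compute flag rate, false positive rate, and false negative rate per group."""
    stats = defaultdict(lambda: {"n": 0, "flagged": 0, "fp": 0, "fn": 0, "pos": 0, "neg": 0})
    for case in cases:
        g = stats[case["group"]]
        predicted = flag_fn(case)
        actual = case["should_flag"]
        g["n"] += 1
        g["flagged"] += predicted
        g["pos"] += actual
        g["neg"] += not actual
        g["fp"] += predicted and not actual
        g["fn"] += (not predicted) and actual
    report = {}
    for group, g in stats.items():
        report[group] = {
            "flag_rate": g["flagged"] / g["n"],
            "false_positive_rate": g["fp"] / g["neg"] if g["neg"] else None,
            "false_negative_rate": g["fn"] / g["pos"] if g["pos"] else None,
        }
    return report


if __name__ == "__main__":
    print(audit_by_group(test_cases, tool_flags_case))
```

Large gaps between groups in any of these rates would not prove bias on their own, but they are exactly the kind of early indicator that should trigger the deeper review described above.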
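And as a sketch of the “upstream” testing in item 8, the snippet below compares generated drafts for paired cases that differ only in the defendant’s race and flags large gaps in a sentiment score. The `generate_brief` and `sentiment_score` functions are placeholders for whatever drafting tool and sentiment model an agency actually uses; the threshold is an illustrative assumption.

```python
from statistics import mean

# Placeholders: swap in the agency's actual drafting tool and a real
# sentiment model. Both functions below are assumptions for this sketch.

def generate_brief(case_facts: dict) -> str:
    """Stand-in for a generative drafting tool."""
    return f"Statement of facts for the defendant in case {case_facts['case_id']}."


def sentiment_score(text: str) -> float:
    """Stand-in for a sentiment model returning a score in [-1, 1]."""
    return 0.0


GAP_THRESHOLD = 0.2  # illustrative cutoff for a concerning disparity


def paired_race_test(base_cases, races=("race_1", "race_2")):
    """For each base case, generate one brief per race label (holding all
    other facts constant) and report the average sentiment gap."""
    gaps = []
    for base in base_cases:
        scores = []
        for race in races:
            facts = {**base, "defendant_race": race}
            scores.append(sentiment_score(generate_brief(facts)))
        gaps.append(abs(scores[0] - scores[1]))
    avg_gap = mean(gaps) if gaps else 0.0
    return {"average_gap": avg_gap, "flagged": avg_gap > GAP_THRESHOLD}


if __name__ == "__main__":
    print(paired_race_test([{"case_id": "demo-1", "charge": "burglary"}]))
```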

Criminal justice agencies need to begin considering guidelines like these and adopting practices immediately, as prosecutors, defense attorneys, and others are likely using these tools already. This is especially true as these agencies are often independent of other local government agencies and thus are responsible for their own standards and practices. While most agencies are aware of issues with algorithm and AI tools that provide final outcome recommendations, few are being forward thinking regarding the surge in generative AI use. Furthermore, despite being aware of issues with AI, few criminal justice agencies have adopted practices to mitigate concerns of bias. In order to avoid systematizing bias veiled behind AI, agencies should start to act now.

[1] See the July 21, 2023 episode of Marketplace Tech for a discussion of AI in the legal realm.

[2] The Strategic Subject List (SSL) in Chicago was probably the most famous of these models and has been ended. Historical data on the program is actually available here.

[3] See the June 23, 2023 episode of Marketplace starting at the 22 minute mark for a discussion of the use of AI and chatbots to review legal documents for California’s innocence project.

By: Rory Pulvino, Justice Innovation Lab Director of Analytics. Admin for a Prosecutor Analytics discussion group.

For more information about Justice Innovation Lab, visit www.JusticeInnovationLab.org.


Justice Innovation Lab

Justice Innovation Lab builds data-informed, community-rooted solutions for a more equitable, effective, and fair justice system.