How to Create an Equitable and Unbiased AI Algorithm for Criminal Justice
Last week, we talked about the problematic nature of the United States court system. With a system that promotes mass incarceration, imposes Draconian drug laws, and hyper-criminalizes the average person, most people agree that our court system is dysfunctional. We then discussed how lawmakers have turned to artificial intelligence to fix this dysfunction, and how these artificial intelligence tools could instead be exacerbating racial bias.
Studies found that an algorithm used by Kentucky to predict the ‘risk level’ of defendants was heavily skewed towards one group of people. Before the algorithm was instated, the proportions of white and black defendants released were roughly the same, at about 25%. However, once the algorithm was mandated across Kentucky’s courts, lawmakers found a 10% increase in white defendants released, while the rate of black defendants freed stagnated at 25%, the same rate as before the algorithm was implemented.
We also talked about racial discrepancies found in a court algorithm used in Broward County, FL. A study by ProPublica found that the program used in Broward County was twice as likely to label a black defendant as a future criminal as it was a white defendant. Not only that, but white defendants were also labeled “low risk” more often than black defendants: risk scores for white defendants were heavily skewed towards the lower levels, while scores for black defendants were spread evenly across the scale.
If you haven’t read that article, I highly recommend you do as it will provide a greater context and background to the issues and concepts that will be discussed today.
Here’s a link to the article.
I think most people will agree that artificial intelligence will play a massive role in the future of humanity. With many industries and jobs already becoming infused with artificial intelligence, AI will drastically change almost every industry and create great economic value. However, one area where we should hesitate and think carefully about how to implement AI correctly is the criminal justice system here in the United States.
As we discussed in the previous article, for better or for worse, artificial intelligence already exerts a rather large influence over the court system. AI algorithms, also known as “risk-based assessments”, are already being implemented in states such as Kentucky and Florida. However, despite all of the optimism about AI integration into the court system, AI can still introduce problems such as racial bias. In fact, heavy racial discrepancies were found in the algorithms’ decision-making in both Kentucky and Florida, which is extremely startling. With the rapid normalization and integration of artificial intelligence in the court system, how do we guarantee that no bias affects the AI’s decision-making in a way that favors or disfavors one group over another?
The Solution(s) and Important Factor(s)
If we want to create an AI algorithm that can effectively predict the likelihood of a defendant recidivating in the future without any bias or preference affecting the results, we have to ensure that all components and datasets of the algorithm are fair and unbiased.
In this subsection, we will go over the factors and components of creating an AI assessment tool: the COMPAS dataset, the ML (machine learning) steps, and the training data.
The COMPAS dataset
The Correctional Offender Management Profiling for Alternative Sanctions dataset, also known simply as the COMPAS dataset, powers a risk-assessment tool whose algorithm takes in a plethora of data to interpret. The COMPAS tool attempts to predict the likelihood of a defendant recidivating, or committing another crime, in the future.
The COMPAS dataset is by far one of the most popular and widely used risk assessment datasets in the United States. This dataset is the backbone of nearly all of the risk assessment tools used in the courts of the United States.
Some of the ‘input’ data that the COMPAS dataset takes into account is the charge description of the defendant, the felony count of the defendant, and whether the defendant has recidivated in the last two years.
The data that is ‘dropped’ and not accounted for in the algorithm’s decision-making includes the defendant’s name, case number, and date of birth, along with other identifiers such as the defendant’s ID.
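As a minimal sketch of this preprocessing step, the code below separates the ‘input’ columns from the identifying columns that get dropped. The column names and example rows here are hypothetical, loosely modeled on the fields described above, not the actual COMPAS schema:

```python
import pandas as pd

# Hypothetical records with both identifying and predictive columns.
rows = [
    {"name": "A", "case_number": 101, "dob": "1990-01-01", "defendant_id": 1,
     "charge_desc": "Battery", "felony_count": 2, "two_year_recid": 1},
    {"name": "B", "case_number": 102, "dob": "1985-06-12", "defendant_id": 2,
     "charge_desc": "Petit Theft", "felony_count": 0, "two_year_recid": 0},
]
df = pd.DataFrame(rows)

# Identifiers carry no predictive signal, so they are dropped before training.
drop_cols = ["name", "case_number", "dob", "defendant_id"]
features = df.drop(columns=drop_cols)

print(list(features.columns))
```

Only the charge description, felony count, and two-year recidivism columns survive the drop, mirroring the input/dropped split described above.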
Most algorithms are developed to utilize a logistic regression model trained on the COMPAS dataset. Studies found that these algorithms, without any modifications, were only slightly more accurate than a coin flip at predicting whether a defendant would actually recidivate.
ML Factors and Steps
In this subsection, we will discuss common yet important ML (machine learning) steps taken to ensure a more equitable and fair AI model. Keep in mind that you do not need to understand every variable; this is just a simplified DIY version to experiment with.
1) Splitting the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
2) Creating a model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression() # You can change this to your preference :)
3) Training your model
model.fit(X_train, y_train)
4) Predicting on a test set
y_preds = model.predict(X_test)
5) Evaluating your model
from sklearn import metrics
accuracy = metrics.accuracy_score(y_test, y_preds)
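The five steps above can be combined into one runnable sketch. Since we cannot ship real defendant records here, synthetic classification data stands in for the COMPAS inputs:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

# Synthetic stand-in for real defendant records.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# 1) Splitting the data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2) Creating a model
model = LogisticRegression()

# 3) Training the model
model.fit(X_train, y_train)

# 4) Predicting on the test set
y_preds = model.predict(X_test)

# 5) Evaluating the model
accuracy = metrics.accuracy_score(y_test, y_preds)
print(f"Accuracy: {accuracy:.2f}")
```

On real court data, this is the point where you would look beyond overall accuracy and check how the scores break down across racial groups.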
Bad Training Data
After thoroughly analyzing the COMPAS dataset and the ML factors, only one true culprit remains for the evident racial biases in these algorithms: bad training data. Bad training data is data that isn’t well suited for training an algorithm; it can be vague, nondescriptive, misleading, or outright incorrect.
While bad training data can technically train an artificial intelligence program to perform its function, bad training data can’t guarantee that the program will do a good job, which is what happened with these algorithms used in these courts.
These risk assessment tools are powered by algorithms trained on historical crime data. The algorithms use statistical methods to determine connections and discover patterns, so when they are trained on historical crime data, they will look for correlations between crime and other attributes.
For example, if an algorithm discovered that higher income was correlated with lower recidivism, it would naturally give defendants from high-income backgrounds a lower recidivism score. This is quite problematic: these risk assessment tools are taking in bad training data and producing biased results.
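A toy experiment makes the income example concrete. Below, the training labels are deliberately generated so that higher (hypothetical) income correlates with lower recorded recidivism; a logistic regression trained on that data dutifully reproduces the correlation, scoring a poorer defendant as higher risk. All numbers here are fabricated for illustration, not real crime statistics:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Hypothetical incomes (in $1000s) and labels where higher income
# was recorded alongside lower recidivism.
income = rng.normal(50, 15, n)
p_recid = 1 / (1 + np.exp((income - 50) / 10))
recid = (rng.random(n) < p_recid).astype(int)

# The model has only income to learn from, so it learns the correlation.
model = LogisticRegression().fit(income.reshape(-1, 1), recid)

low_income_risk = model.predict_proba([[20.0]])[0, 1]
high_income_risk = model.predict_proba([[90.0]])[0, 1]
print(low_income_risk > high_income_risk)
```

The model assigns the low-income defendant the higher risk score, even though income says nothing about what this individual will actually do, which is exactly the failure mode described above.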
How do we fix this?
Well, the answer is not easy. Above, we analyzed the COMPAS dataset, the ML steps used to create an algorithm, and the problematic nature of bad training data. When we analyzed the COMPAS dataset, we found that no bias stems from the dataset itself, as the COMPAS set is just a collection of input data taken into account; it does not generate risk levels on its own.
The ML steps used to create an algorithm are also not biased or unfair. They are just simplified steps used to create a model and test it.
This leaves us with training data. As shown above, the COMPAS dataset and the ML steps carry no racial bias of their own, which means they cannot be responsible for the racial disparities found in these algorithms. So, the only logical answer is bad training data. As explained above, the training data used in these algorithms isn’t well suited to them.
These algorithms are usually supplied with historical crime data, which poses problems because it is not always unbiased or fair. As in the income example above, any correlation baked into the historical record gets reproduced in the algorithm’s scores.
Bad training data also doesn’t have to be historical crime data; it could be data surveyed and collected in the wrong place. For example, if training data is gathered in Europe and then supplied to algorithms used in America, that in itself poses a massive problem. While approximately 13% (43 million) of the United States population is composed of black people, only 2.9% of Europe’s population (22 million) is. If training data is collected in Europe, with most of the participants being white, you could expect a fair and equitable algorithm for white people. But if you try to apply that training data in a multicultural and multiracial country such as the United States, it becomes evident that these racial disparities do stem from bad training data.
One way we could fix this evident problem is simply by making better training data. Companies that wish to make fair and equitable algorithmic tools should not rely solely on historical crime data but should strive to generate their own data, and when they do, they should recruit participants from a multitude of racial backgrounds. Data created this way would help ensure that the algorithms trained on it are fair and equitable, and would help create a more just and honest justice system.
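One simple technique that supports this goal is stratified splitting: when dividing collected data into training and test sets, stratifying by group guarantees that every group keeps its share in the training data instead of being underrepresented by chance. The sketch below uses made-up group labels to show the mechanic:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical survey pool where one group is a small minority.
groups = rng.choice(["group_a", "group_b", "group_c"], size=1000, p=[0.7, 0.2, 0.1])
X = rng.normal(size=(1000, 3))

# stratify=groups preserves each group's proportion in both splits.
X_train, X_test, g_train, g_test = train_test_split(
    X, groups, test_size=0.25, stratify=groups, random_state=0
)

for g in ["group_a", "group_b", "group_c"]:
    print(g, round((g_train == g).mean(), 2))
```

Stratification alone does not make biased labels fair, but it does stop a minority group from vanishing out of the training data, which is one of the failure modes the Europe-versus-America example illustrates.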
This article is part two of a three-part series that focuses on the implementation of artificial intelligence in the United States justice system. Next week we’ll talk about how artificial intelligence could be used to help lawyers, and maybe even substitute for them! As always, thanks for reading, and be on the lookout for a new article next week!