Data for Criminal Justice — The Good, the Bad and the Ugly.

Akshaya Parthasarathy
9 min read · Apr 19, 2022

Introduction

As of 2017, the United States (U.S.) criminal justice system holds nearly 2.3 million people in state prisons, federal prisons, juvenile facilities, and local and county jails (Wagner and Rabuy, 2017). This alarming number of incarcerations is only the tip of the iceberg of the American criminal justice system. Big data and predictive algorithms have the power to revolutionise this institution: increasing efficiency and offering a better way to determine recidivism while keeping imprisonment rates low. A popular and early example of a data-based policing model is “CompStat” (Walsh, 2001), developed by the New York City Police Department to identify “hot spots” where crime is most likely to occur. Risk assessment tools such as “COMPAS” (Dieterich et al., 2016) are also being developed to analytically compute the recidivism risk of defendants in order to determine prison time.

However, the challenges lie in the “datafication” (Mejias and Couldry, 2019) of an inherently biased system. There are significant obstacles in dealing with the ramifications of handing decision-making to an AI while also ensuring that the AI itself is unbiased. Data protection plays a huge role in ensuring the safety of defendants. Predictive algorithms designed to take individualised suspicion into account must meet a quantified legal standard (Simmons, 2016) or be combined with the judgement of police or judges. The system must also uphold the Fourth Amendment, which “protects against unreasonable search and seizure by governmental authorities”, preventing unwarranted or inappropriate searches. The road to overcoming these challenges may seem never-ending, but it is not an impossible feat. If overcome, it holds the promise of a healthy and transparent judicial system, free of prejudice.

Digital Data Overload

Digital evidence is “information and data of value to an investigation that is stored on, received, or transmitted by an electronic device” (National Institute of Justice [NIJ], 2008). Over the past few years, the ease and availability of technological devices have created a surplus of digital evidence. Within a matter of hours, it is possible to obtain a large amount of sensitive information or collect social media data that can strongly influence a court proceeding. The trial of Pedro Bravo for the disappearance of his friend Christian Aguilar (Burch, 2014) and the murder of Philip Welsh (Morse, 2014) are only a few examples of the influence digital evidence can have. The prevailing challenge is to deal with the diverse and voluminous digital data that originates from the networks of both suspects and victims. In the absence of a uniform process, extraction and assessment alone can double the workload. This evidence is not restricted to a single device but can be found on services typically more than “one hop” away, which can take a long time to vet. In higher-profile cases, such evidence can exist across borders on servers and cloud services. Moreover, the increase in the number and diversity of cybercrimes has made it nearly impossible for current systems to track their growth unless they are brought to attention, which in some cases is already too late. The added obstacle is to ensure that all data obtained this way is free of legal issues or implications. Law enforcement must collaborate constantly with other partners, such as courts and prosecutors, to maintain the chain of custody and determine what data is admissible. Nothing is more frustrating than having evidence dismissed as unusable because it was obtained improperly.

Automating some of these processes through data science can help alleviate the load. Social network analysis combined with forensic tools can pave the way to extracting case-related artefacts faster (Cusack and Son, 2012). AI tools that monitor social media for relevant case-related evidence and flag certain keywords through text mining can reduce manual labour. Such methods would be especially beneficial for cases with longer trial periods or controversial outbursts on social media. These tools can then be extended to provide early detection of cybercrimes such as human trafficking (Latonero, 2011) and child abuse on previously flagged websites.
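As a minimal sketch of the keyword-flagging idea described above: a watchlist of case-related terms is matched against a stream of posts, and any post containing a watched term is surfaced for human review. The keyword list and `flag_posts` helper are hypothetical, invented for illustration; a real pipeline would use investigator-curated terms and far more sophisticated text mining.

```python
import re

# Hypothetical watchlist of case-related keywords; in practice these would
# come from investigators and be refined as the case develops.
KEYWORDS = {"meetup", "transfer", "package"}

def flag_posts(posts, keywords=KEYWORDS):
    """Return (post, matched_terms) pairs for posts containing any watched keyword."""
    flagged = []
    for post in posts:
        # Lowercase and tokenise into words, then intersect with the watchlist.
        tokens = set(re.findall(r"[a-z']+", post.lower()))
        hits = tokens & keywords
        if hits:
            flagged.append((post, sorted(hits)))
    return flagged

posts = [
    "Meetup at the usual spot tonight",
    "Great weather today",
    "The package arrives Friday",
]
for post, hits in flag_posts(posts):
    print(hits, "->", post)
```

Even a crude filter like this narrows thousands of posts down to a handful worth a human analyst's time; the manual labour saved grows with the volume of data.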

No Room in Overcrowded Prisons

Prisons in the U.S. hold more people than they were designed to handle. If it were a numbers game, the U.S. would be leading every nation in the world, ahead of China and Russia (ACLU, 2015). Currently, “one in 99 adults are living behind bars in the U.S., and one in 31 adults are under some form of correctional control, such as prison, jail, parole, or probation” (ACLU, 2015). These overcrowded prisons result from a mixture of factors, but chiefly the mass incarceration of people of colour and “one-size-fits-all” lengthy prison sentencing. The explosive growth observed through the 1990s in particular is the result of the Crime Bill, introduced to handle violent crimes, which later encouraged the “War on Drugs” (Greene, 2002). Tough prison sentences that ignore factors such as involvement, seriousness, and parole behaviour have left people locked up behind bars for decades. The issue, then, is the lack of maintained data on prisoner behaviour, which could otherwise influence their sentences. Amid a global pandemic, massive incarceration rates have turned prisons into infection hotspots, compromising the health and safety of the incarcerated. The “three-strikes-and-you’re-out” laws (Vitiello, 1996) implemented across the country determine prison sentences based on the number rather than the nature of offences. Nonviolent crimes attract extreme sentences without parole for habitual offenders. Without consistent data to rely on, many offenders receive dangerously incorrect sentences that keep them locked up longer than necessary, ultimately adding to the existing overcrowding problem.

Addressing systemic communication and organisational issues should be the primary scope for data science. With frequent, up-to-date information drawn from databases across the country, decision-makers would be better equipped to decide on early release where possible. Recidiviz (Kwon et al., 2021) is one such platform that encourages data-driven decisions about recidivism and incarceration. Recidiviz went a step further by building a model that forecast the pandemic’s potential impact and helped reduce case numbers by shortening prison terms. Early intervention, by predicting and analysing likely crime areas, can help prevent crime before arrests become necessary. Smarter patrolling during periods when crime is forecast to occur, such as major sporting events and campaign rallies, allows quicker response times and sufficient manpower.

A number of defendants released on probation have been restricted to home confinement, which in today’s digital world seems like an outdated solution. Monitoring programs that assess an individual’s behaviour and periodically report real-time data help keep them out of prison and keep the people around them safe. Assisting defendants beyond their prison time, by automatically linking their data to potential employers, can ensure a smoother reintegration into society.

Implementation Issues: The Devil is In The Data

One spring afternoon in 2014, 18-year-old Brisha Borden’s petty theft left her classified as a high-risk criminal by COMPAS, risk assessment software used in a Florida prison. COMPAS estimates a defendant’s inclination to re-offend based on responses to 137 survey questions. The previous summer, 41-year-old Vernon Prater, a seasoned criminal who had served five years in prison, was classified as low-risk by the same tool. Two years later, Borden had not been charged with any new crimes, yet Prater was sentenced to an eight-year prison term for breaking and entering as well as theft.

In 2016, ProPublica, a nonprofit news organisation, did an in-depth analysis of COMPAS, comparing the risk assessments of over 7,000 people arrested in a Florida county. It became evident that Borden and other Black defendants were subject to a racial bias within the software (Angwin et al., 2016; Garber, 2016). COMPAS flagged Black defendants who did not go on to re-offend as high risk at almost twice the rate of white defendants. ProPublica’s findings opened the public’s eyes to significant biases in algorithms and in the people who build them (Figure 1). Despite the blatant discovery, the founders and developers disputed the findings while protecting the proprietary rights to their algorithm. COMPAS, PredPol, and OASys (Heaven, 2020) are a few examples of predictive tools that fail to disclose any details of their algorithms, leaving it uncertain where the biases originate.

Figure 1: ProPublica’s findings about COMPAS.
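The disparity ProPublica measured is, at its core, a difference in false positive rates: the share of people who did *not* re-offend but were nonetheless flagged high risk, computed per group. A minimal sketch of that calculation, on invented toy records (not ProPublica's actual data), looks like this:

```python
from collections import defaultdict

def fpr_by_group(records):
    """False positive rate per group: P(flagged high risk | did not re-offend)."""
    groups = defaultdict(lambda: {"fp": 0, "neg": 0})
    for r in records:
        if not r["reoffended"]:  # only non-reoffenders enter the FPR denominator
            g = groups[r["group"]]
            g["neg"] += 1
            if r["high_risk"]:
                g["fp"] += 1
    return {g: v["fp"] / v["neg"] for g, v in groups.items() if v["neg"]}

# Toy data: 4 of 8 Black non-reoffenders flagged high risk,
# versus 2 of 8 white non-reoffenders.
records = (
    [{"group": "black", "high_risk": True,  "reoffended": False}] * 4
    + [{"group": "black", "high_risk": False, "reoffended": False}] * 4
    + [{"group": "white", "high_risk": True,  "reoffended": False}] * 2
    + [{"group": "white", "high_risk": False, "reoffended": False}] * 6
)
print(fpr_by_group(records))  # {'black': 0.5, 'white': 0.25}: twice the false-alarm rate
```

The same comparison run on real outcomes data is what exposed COMPAS: equal overall accuracy can coexist with very unequal error rates across groups.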

Another obvious implementation concern is data privacy. Predictive policing tools often use a variety of factors to determine high-risk locations where a crime is most likely to occur. By law, race cannot be a factor, so the inputs range from zip code, events, and historical crime rates to socio-economic and educational background. A stunning amount of sensitive information is held alongside inherently skewed arrest rates; figures from the US Department of Justice indicate that a Black person is more than twice as likely to be arrested as a white person (Heaven, 2020). Predictive algorithms offer no promise of overcoming these statistics. In fact, they could create a precarious situation if the data is compromised, as one could be arrested for a crime they are merely likely to commit, simply because the algorithm said so.
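Excluding race as an input does not remove it from the model, because features like zip code are correlated with both race and historically skewed arrest data. A toy sketch of this proxy leakage, with all numbers and names invented for illustration:

```python
# Hypothetical per-zip-code statistics. Arrest rates reflect past policing
# intensity, not underlying offence rates, so they carry the historical skew.
historical_arrest_rate = {"11111": 0.30, "22222": 0.12}
majority_black_share   = {"11111": 0.80, "22222": 0.15}  # demographic makeup

def risk_score(person):
    """A 'race-blind' score: race is never an input, only zip code."""
    return historical_arrest_rate[person["zip"]]

a = {"zip": "11111"}  # mostly-Black neighbourhood in this toy data
b = {"zip": "22222"}
print(risk_score(a), risk_score(b))  # 0.3 0.12: a 2.5x gap, with race never used
```

Because the score simply echoes arrest rates keyed by location, the racial disparity in the historical data passes straight through, which is exactly why removing race "by law" is not sufficient on its own.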

An important thing to remember is that these revelations must not be met with outright rejection of data science as an aid to the legal system. To understand the fundamental trade-off between bias reduction and predictive accuracy, both the application of data science and the transparency of the algorithms must be increased.

Conclusion

The repercussions of a prison sentence extend well beyond the four walls of the facility. A sentence affects an individual’s chance of being reintegrated into their community and restricts their opportunities for employment or higher education. By taking away the chance of redemption, the system essentially prevents defendants from becoming assets to society and humankind.

Reflecting on the current challenges and implementation issues, the need of the hour is smarter, impartial, and transparent data-driven systems. To tackle the problem at the grassroots level, there is a pressing need for racial and gender diversity among the developers of these algorithms (Noble, 2013). Regular collaboration among lawmakers, technologists, and domain experts in public policy, historical inequalities, and cultural and social concerns is a must for creating efficient and unbiased models.

Images Used:

Figure 1 — Angwin, J., Larson, J., Mattu, S. & Kirchner, L. 2016 Machine Bias. Available at https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing [Accessed: 17th August 2017]

References:

Angwin, J., Larson, J., Mattu, S. & Kirchner, L. 2016 Machine Bias. Available at https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing [Accessed: 17th August 2017]

Burch, A. D. S. 2014, September 12. Pedro Bravo found guilty of first-degree murder of Christian Aguilar. Miami Herald [online]. Available at: http://www.miamiherald.com/news/local/community/miami-dade/article1980000.html

Cusack, B. and Son, J., 2012. Evidence examination tools for social networks.

Dieterich, W., Mendoza, C. and Brennan, T., 2016. COMPAS risk scales: Demonstrating accuracy equity and predictive parity. Northpointe Inc, 7(4).

Garber, M. 2016. Is Criminality Predictable? Should It Be?. [online] The Atlantic. Available at: https://www.theatlantic.com/technology/archive/2016/06/when-algorithms-take-the-stand/489566/ [Accessed 7 Feb. 2018].

Greene, J., 2002. Getting tough on crime: The history and political context of sentencing reform developments leading to the passage of the 1994 Crime Act. Sentencing and society: International perspectives, pp.43–64.

Kwon, J.A., Bretaña, N.A., Grant, L., Galouzis, J., Hoey, W., Blogg, J., Lloyd, A.R. and Gray, R.T., 2021. The COVID-19 Incarceration Model: a tool for corrections staff to analyze outbreaks of COVID-19. medRxiv.

Latonero, M., 2011. Human trafficking online: The role of social networking sites and online classifieds. Available at SSRN 2045851.

Mejias, U.A. and Couldry, N., 2019. Datafication. Internet Policy Review, 8(4).

Morse, D. 2014, May 6. Philip Welsh’s simple life hampers search for his killer. Washington Post [online]. Available at: http://www.washingtonpost.com/local/crime/philip-welshs-simple-life-hampers-search-for-his-killer/2014/05/05/1fd20a52-cff7-11e3-a6b1-45c4dffb85a6_story.html

Noble, S.U., 2013. Google search: Hyper-visibility as a means of rendering black women and girls invisible. InVisible Culture, (19).

Simmons, R., 2016. Quantifying criminal procedure: how to unlock the potential of big data in our criminal justice system. Mich. St. L. Rev., p.947.

Vitiello, M., 1996. Three strikes: Can we return to rationality. J. Crim. L. & Criminology, 87, p.395.

Wagner, P. and Rabuy, B., 2017. Mass incarceration: The whole pie 2017. Prison policy initiative, 119, pp.1–23.

Walsh, W.F., 2001. Compstat: An analysis of an emerging police managerial paradigm. Policing: an international journal of police strategies & management.

Heaven, W.D., 2020. Predictive policing algorithms are racist. They need to be dismantled. MIT Technology Review. Available at: https://www.technologyreview.com/2020/07/17/1005396/predictive-policing-algorithms-racist-dismantled-machine-learning-bias-criminal-justice/
