Big Data & Policing

Ashli Dougherty
Jul 24, 2022


How data, machine learning, and AI are transforming the future of law enforcement and solving cold cases from the past.

Introduction:

I have a very long history of being fascinated by forensic science. Before TV dramas like Criminal Minds, CSI, and Bones, I was watching countless episodes of true crime series like Forensic Files, The New Detectives, and The FBI Files. In another timeline, I never changed my major, and I am out there working with law enforcement agencies to help solve crimes as a forensic anthropologist. But in this timeline, I am a data science student investigating the overlap between my new career field and a long-held interest.

Predictive Policing

Photo credit: Leila Meliani

Companies like PredPol claim that “crime isn’t as random as you may think. It follows a pattern.” They claim that by using an algorithm to analyze a jurisdiction’s historical records (including crime type, location, date, and time), they can predict where future crimes are likely to occur and help departments decide where to allocate their officers’ time and resources.
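
The post does not describe PredPol’s model, and the sketch below is not it. As a rough illustration of the general idea of forecasting from crime type, location, and time, it uses a simpler stand-in technique: past incidents are binned into grid cells and weighted by how recently they happened, so cells with many recent incidents rank highest. The cell size, half-life, and sample records are hypothetical.

```python
import math
from collections import defaultdict
from datetime import date

def cell_of(x, y, cell_size=150.0):
    """Map a coordinate (in meters) to a square grid cell; cell_size is an assumption."""
    return (int(x // cell_size), int(y // cell_size))

def rank_hotspots(records, today, half_life_days=30.0, top_k=3, crime_types=None):
    """Rank grid cells by exponentially time-decayed counts of past incidents.

    records: iterable of (crime_type, x, y, date) tuples.
    crime_types: optional set of crime types to keep; None keeps all.
    """
    decay = math.log(2) / half_life_days  # an incident's weight halves every half_life_days
    scores = defaultdict(float)
    for crime_type, x, y, day in records:
        if crime_types is not None and crime_type not in crime_types:
            continue
        age = (today - day).days
        if age < 0:
            continue  # skip records dated after the forecast date
        scores[cell_of(x, y)] += math.exp(-decay * age)
    # Highest-scoring cells are where this toy model would suggest patrols.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Hypothetical historical records: (crime type, x, y, date).
records = [
    ("burglary", 120, 80, date(2022, 7, 20)),
    ("burglary", 130, 90, date(2022, 7, 18)),
    ("auto theft", 400, 410, date(2022, 5, 1)),
    ("burglary", 410, 420, date(2022, 7, 21)),
]
print(rank_hotspots(records, today=date(2022, 7, 24), top_k=2))
```

The exponential decay encodes the assumption that recent incidents are more predictive than old ones; changing `half_life_days` trades responsiveness against stability.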

One obvious ethical concern with models like PredPol is racial discrimination. Even though the algorithm does not use racial data, people worry that communities consisting predominantly of black, brown, and impoverished residents will be overrepresented as places that police should patrol. Because the model relies on geographic location, and given our nation’s history of redlining and “broken windows” policing, this is a valid concern.
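
One mechanism behind this concern is a feedback loop. If patrols go where crime was recorded, and crime is mostly recorded where patrols are, then a small imbalance in the historical data can grow on its own. The toy simulation below is a hypothetical illustration, not a model of PredPol or of any real department: the crime rates and detection probabilities are invented. Two areas have identical true crime rates, but the area with slightly more historical records receives all patrols each round.

```python
def simulate_feedback(true_rate, initial_records, rounds=5,
                      detect_with_patrol=0.8, detect_without=0.2):
    """Toy model of a patrol/recording feedback loop (all parameters hypothetical).

    Each round, every patrol goes to the area with the most recorded crime.
    Recorded crime in an area = true crime x detection probability, and the
    detection probability is higher where the patrols are.
    Returns a list with the recorded totals after each round.
    """
    records = list(initial_records)
    history = []
    for _ in range(rounds):
        # max() returns the first index on ties, so ties favor the lower index.
        patrolled = max(range(len(records)), key=lambda i: records[i])
        for i, rate in enumerate(true_rate):
            p = detect_with_patrol if i == patrolled else detect_without
            records[i] += rate * p
        history.append(tuple(records))
    return history

# Two areas with the same true crime rate; area 0 starts with slightly more records.
history = simulate_feedback(true_rate=[10, 10], initial_records=[12, 10])
for step, recorded in enumerate(history, start=1):
    print(step, recorded)
```

Even though both areas have the same true crime rate, the recorded gap in this toy model grows from 2 to 32 over five rounds, because the data mostly measures where police looked.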

Conversely, other entities, such as George Mason University’s Center for Evidence-Based Crime Policy, are aggregating data-based studies to determine which policing strategies are most effective at preventing crime. The center developed an interactive matrix that details the effectiveness of each type of police intervention and is updated yearly.

Example of the matrix. Photo credit: GMU matrix website
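
The post does not describe how the matrix is built. As a hypothetical sketch of what “aggregating studies” can mean in its simplest form, the code below tallies reported outcomes by intervention type. The intervention names, outcome labels, and records are invented. A real review would also weigh each study’s design and rigor.

```python
from collections import Counter, defaultdict

# Hypothetical study summaries: (intervention, reported outcome). The outcome
# labels are illustrative and may not match the matrix's actual categories.
studies = [
    ("intervention A", "significant"),
    ("intervention A", "significant"),
    ("intervention A", "non-significant"),
    ("intervention B", "non-significant"),
    ("intervention B", "backfire"),
]

def summarize(studies):
    """Count studies per intervention and the share reporting a significant effect."""
    by_type = defaultdict(Counter)
    for intervention, outcome in studies:
        by_type[intervention][outcome] += 1
    summary = {}
    for intervention, counts in by_type.items():
        total = sum(counts.values())
        summary[intervention] = {
            "studies": total,
            "share_significant": counts["significant"] / total,
            "backfired": counts["backfire"],  # Counter returns 0 for missing keys
        }
    return summary

for name, row in summarize(studies).items():
    print(name, row)
```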

DNA Databases

For the last decade, millions of people have voluntarily submitted their DNA to companies like Ancestry and 23andMe to learn more about their family and medical history. GEDmatch was developed so that people could upload their raw DNA results and compare them across different testing companies. GEDmatch users can opt in to make their DNA profiles available to law enforcement agencies searching for DNA matches for suspects and unidentified victims.
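
Comparing kits from different companies only works after both files are aligned to the same SNP positions. Assuming that alignment has already been done for a single chromosome, the sketch below shows one simple way to look for shared DNA. It scans the SNPs in order and records runs where the two people never have opposite homozygous genotypes (such as AA versus GG). This is the idea behind “half-identical” segments. It is not GEDmatch’s implementation. The threshold here counts SNPs, whereas real tools measure segments in centiMorgans and tolerate genotyping errors. The sample kits are invented.

```python
def opposite_homozygous(g1, g2):
    """True when each person is homozygous for a different allele (e.g. 'AA' vs 'GG')."""
    return g1[0] == g1[1] and g2[0] == g2[1] and g1[0] != g2[0]

def half_identical_segments(kit1, kit2, min_snps=5):
    """Return inclusive (start, end) index ranges where the two kits never conflict.

    kit1, kit2: lists of two-letter genotype strings at the same SNP positions.
    min_snps: minimum run length to report (an assumed threshold).
    """
    segments, start = [], 0
    for i, (g1, g2) in enumerate(zip(kit1, kit2)):
        if opposite_homozygous(g1, g2):
            # A conflict ends the current run; keep it if it is long enough.
            if i - start >= min_snps:
                segments.append((start, i - 1))
            start = i + 1
    n = min(len(kit1), len(kit2))
    if n - start >= min_snps:
        segments.append((start, n - 1))
    return segments

kit1 = ["AG", "AA", "CC", "GT", "AA", "CT", "GG", "AC", "TT", "AG", "CC", "AA"]
kit2 = ["AA", "AG", "CT", "GG", "AG", "CC", "GG", "CC", "AA", "AG", "CT", "AG"]
print(half_identical_segments(kit1, kit2))
```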

Recently, law enforcement agencies have begun using public DNA databases for an investigative technique known as genetic genealogy. The technique works indirectly. Investigators find distant relatives of an unknown suspect or victim in the database, then build out a family tree until it points to a likely identity. Probably the best-known case involving genetic genealogy led to the arrest and conviction of the Golden State Killer, Joseph DeAngelo. This one arrest closed 53 cold cases, and genetic genealogy (thanks to big DNA data) has the potential to close many, many more.
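
Genetic genealogy leans on the fact that the amount of DNA two relatives share shrinks, roughly by half, with each degree of relationship distance. The sketch below turns that rule of thumb into a rough relationship estimate from a shared-centiMorgan (cM) total. The 6,800 cM scale and the labels are simplifying assumptions. Real shared amounts vary widely, and relationships of the same degree can share noticeably different amounts, so genealogists work with ranges and with several matches rather than a single number.

```python
import math

# Rough assumption: a parent and child share about 3,400 cM, so the "full"
# scale used here is 6,800 cM. Real sharing varies widely around these averages.
FULL_SCALE_CM = 6800.0

LABELS = {
    1: "parent/child",
    2: "grandparent, half sibling, aunt/uncle",
    3: "first cousin",
    4: "first cousin once removed",
    5: "second cousin",
    6: "second cousin once removed",
    7: "third cousin",
}

def expected_shared_cm(degree):
    """Expected shared cM if sharing halves with each degree of relationship."""
    return FULL_SCALE_CM * 0.5 ** degree

def estimate_degree(shared_cm):
    """Invert the halving rule and round to the nearest whole degree."""
    if shared_cm <= 0:
        raise ValueError("shared_cm must be positive")
    return max(1, round(math.log2(FULL_SCALE_CM / shared_cm)))

for cm in (3400, 850, 212.5, 60):
    d = estimate_degree(cm)
    print(cm, "cM -> degree", d, LABELS.get(d, "more distant relative"))
```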

Parabon-NanoLabs is another company that uses GEDmatch’s DNA database. The company offers genetic genealogy services, but after GEDmatch changed its terms and conditions so that users had to opt in to the open databases, the company lost access to a significant amount of its data. Parabon also uses DNA to construct digital composite sketches. According to its website, it uses “deep data mining and advanced machine learning algorithms in a specialized bioinformatics pipeline” to produce these snapshots, with a degree of confidence attached to each predicted trait (since environment can also affect physical appearance).
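
Parabon’s pipeline is proprietary, and the post only quotes its description. To illustrate what a per-trait “degree of confidence” can look like, the sketch below uses multinomial logistic regression (a softmax over linear scores), which is a common way to predict a categorical trait from genotypes. Everything in it is hypothetical: a single invented marker (the count of one allele, 0 to 2) and invented coefficients. Real models use many SNPs and coefficients fitted to data.

```python
import math

# Invented (intercept, weight) pairs per eye-color category, for illustration only.
COEFFICIENTS = {
    "blue": (-1.0, 1.5),
    "intermediate": (-0.5, 0.5),
    "brown": (1.0, -1.5),
}

def predict_eye_color(allele_count):
    """Return a probability per category from a softmax over linear scores.

    allele_count: copies (0, 1, or 2) of a hypothetical marker allele.
    """
    if allele_count not in (0, 1, 2):
        raise ValueError("allele_count must be 0, 1, or 2")
    scores = {c: b0 + b1 * allele_count for c, (b0, b1) in COEFFICIENTS.items()}
    top = max(scores.values())
    # Subtracting the max score before exponentiating avoids overflow.
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

for count in (0, 1, 2):
    probs = predict_eye_color(count)
    best = max(probs, key=probs.get)
    print(count, best, round(probs[best], 2))
```

At an allele count of 1, the top category gets only about half the probability. This is the kind of uncertainty a confidence value is meant to communicate.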

Departments using Parabon’s Snapshot Prediction Results are seeing movement in cases, both current and long cold, and there are several examples of open and solved investigations that used this technology. The lab can even age-progress the sketches for cold cases to give the person a better chance of being recognized. The examples below come from volunteers and are published on Parabon’s website. They compare the composites the reports generated with the actual people.

Photo credit: Parabon-NanoLabs

There are several concerns about invasion of privacy. I may choose to upload my DNA profile to GEDmatch and opt in to making it available to law enforcement, but my extended family did not. This may seem like a small breach of privacy, but according to a Science article it is “estimated that the troves could identify 60% of North Americans of European descent, even if they had never themselves taken one of these tests.” Add to this the fact that law enforcement can already collect DNA from trash and other discarded objects, and the concept of DNA privacy may already be gone.
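
The reason a relative-matching database can reach people who never tested comes down to simple probability. Everyone has many distant relatives, and only one of them needs to be in the database. The sketch below computes the chance of at least one match as 1 - (1 - coverage)^relatives. The coverage values and the figure of 200 relatives are hypothetical, and treating relatives as independent is a simplification. This is not the method behind the Science estimate.

```python
def prob_at_least_one_match(coverage, relatives):
    """P(at least one of `relatives` is in a database covering `coverage` of the population).

    Assumes each relative is in the database independently with probability
    `coverage`, a simplification (families often test together, for example).
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    return 1.0 - (1.0 - coverage) ** relatives

for coverage in (0.005, 0.01, 0.02):
    print(coverage, round(prob_at_least_one_match(coverage, relatives=200), 3))
```

With these made-up inputs, the probability rises from about 63% at 0.5% coverage to about 87% at 1% and about 98% at 2%. This shows why even a small database can cover a large share of a population.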

Tracking of Unsolved Murders

The Murder Accountability Project (MAP) was founded in 2015 with the mission of drawing attention to the hundreds of thousands of homicides that have gone unsolved since 1980. Despite the technological advances discussed earlier, homicide clearance rates have actually declined over the years. The website gives the public and police access to two FBI datasets: the “Uniform Crime Report” (with data going back to 1965) and the “Supplementary Homicide Report” (with data going back to 1976).

Photo credit: The Murder Accountability Project. The graph above shows Arizona’s “murder curve,” the gap between total and cleared homicides.
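
A clearance rate is the share of homicides in a year that were solved, and the “murder curve” is the gap between total and cleared cases. The sketch below computes both from per-case records and fits an ordinary least-squares slope to check whether the rate is trending down. The (year, solved) record format and the sample data are hypothetical. MAP’s downloadable files use their own column names.

```python
from collections import Counter

def clearance_by_year(records):
    """records: iterable of (year, solved) pairs, one per homicide.

    Returns {year: (total, cleared, uncleared, clearance_rate)}.
    """
    totals, cleared = Counter(), Counter()
    for year, solved in records:
        totals[year] += 1
        if solved:
            cleared[year] += 1
    return {
        y: (totals[y], cleared[y], totals[y] - cleared[y], cleared[y] / totals[y])
        for y in sorted(totals)
    }

def trend_slope(by_year):
    """Least-squares slope of clearance rate per year (negative means declining)."""
    xs = list(by_year)
    ys = [by_year[x][3] for x in xs]
    if len(xs) < 2:
        raise ValueError("need at least two years to estimate a trend")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical records: four homicides per year, with fewer solved over time.
records = (
    [(1980, True)] * 3 + [(1980, False)]
    + [(1990, True)] * 2 + [(1990, False)] * 2
    + [(2000, True)] + [(2000, False)] * 3
)
by_year = clearance_by_year(records)
print(by_year)
print("slope per year:", trend_slope(by_year))
```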

While a national crime database already exists (the FBI also runs the Violent Criminal Apprehension Program, or ViCAP), local departments may not be required to upload their records to it. In Texas, for example, it was not until 2019 that the state legislature passed “Molly Jane’s Law,” which made it mandatory for agencies to upload information on sexual assault cases. However, “agencies can enter other offenses that qualify for ViCAP but there is no requirement under Molly Jane’s Law to do so.” This lack of centralized data makes it difficult for jurisdictions to work together to link and solve homicides.
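
The cost of voluntary reporting can be shown with a small hypothetical example. The sketch below treats a shared database as the set of uploaded cases and looks for pairs from different agencies that match on a few descriptors. The agencies, case IDs, and descriptors are invented, and exact matching on three fields is a deliberately crude stand-in for real case-linkage analysis.

```python
from itertools import combinations

# Hypothetical cases: agency, case id, whether the agency uploaded the case to a
# shared system, and a few coarse descriptors used for matching.
cases = [
    {"agency": "A", "id": "A-1", "uploaded": True,
     "method": "strangulation", "victim_sex": "F", "scene": "vacant lot"},
    {"agency": "B", "id": "B-7", "uploaded": True,
     "method": "strangulation", "victim_sex": "F", "scene": "vacant lot"},
    {"agency": "C", "id": "C-3", "uploaded": False,
     "method": "strangulation", "victim_sex": "F", "scene": "vacant lot"},
]

def candidate_links(cases, keys=("method", "victim_sex", "scene")):
    """Pairs of cases from different agencies that match on every key.

    Only uploaded cases are visible, mimicking a voluntary central database.
    """
    visible = [c for c in cases if c["uploaded"]]
    return [
        (a["id"], b["id"])
        for a, b in combinations(visible, 2)
        if a["agency"] != b["agency"] and all(a[k] == b[k] for k in keys)
    ]

print(candidate_links(cases))
```

Case C-3 matches both of the others, but because its agency never uploaded it, no link to it can be found.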

The people behind MAP claim that the data can be used to explore clusters and types of killings in order to “consider” whether a serial killer is active in a given area. In 2017, retired homicide detective Eric Weitzig did just this for the city of Cleveland, Ohio. Using MAP data and an algorithm, he concluded that 20 of 60 unsolved murders appeared to be connected, and when these cases were mapped geographically, they clustered in two main “corridors”.

Source: Cleveland Police Dept. Photo credit: The Plain Dealer. The image shows the locations of the unsolved murders of women in Cleveland, Ohio. Weitzig noted that two clusters appeared on the map.
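
One simple way to search for such clusters is to group killings by area, victim sex, and method, then flag large groups where unusually few cases were solved. The sketch below does that. The area names, thresholds, and records are hypothetical, and it is not MAP’s published algorithm or Weitzig’s analysis.

```python
from collections import defaultdict

def flag_clusters(cases, min_cases=5, max_clearance=0.33):
    """Group cases by (area, victim sex, method) and flag large groups with few solved.

    min_cases and max_clearance are illustrative thresholds, not calibrated values.
    Returns (group key, case count, rounded clearance rate), lowest rate first.
    """
    groups = defaultdict(list)
    for c in cases:
        groups[(c["area"], c["victim_sex"], c["method"])].append(c["solved"])
    flagged = []
    for key, solved in groups.items():
        rate = sum(solved) / len(solved)  # True counts as 1, False as 0
        if len(solved) >= min_cases and rate <= max_clearance:
            flagged.append((key, len(solved), round(rate, 2)))
    return sorted(flagged, key=lambda item: item[2])

# Hypothetical cases: one group with 1 of 6 solved, another with 4 of 6 solved.
cases = (
    [{"area": "Area 1", "victim_sex": "F", "method": "strangulation", "solved": i == 0}
     for i in range(6)]
    + [{"area": "Area 2", "victim_sex": "M", "method": "firearm", "solved": i < 4}
       for i in range(6)]
)
print(flag_clusters(cases))
```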

Weitzig noticed that the twelve cases in the corridors shared several characteristics: nine victims were known sex workers, nine killings occurred in Cleveland (rather than East Cleveland), seven victims were discovered in vacant lots or homes, and nine were killed manually (by strangulation, blunt-force trauma, or stabbing). Detectives hope that if DNA solves one of these cases, the shared patterns can help them close others as well. Those hopes may be fading, however, as the concept of the “one and done” killer gains ground. And because homicides are described using only a limited set of attributes, MAP data may suggest connections between murders where none exist.
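
The risk of false connections can be made concrete with a little arithmetic. If cases are described by only a few coarse attributes, the probability that two unrelated cases match on all of them is the product, across attributes, of the sum of squared category shares. Meanwhile, the number of case pairs grows quickly with the number of cases. The sketch below assumes independent attributes and invented category shares. It uses 60 cases to echo the Cleveland example.

```python
from math import comb

def chance_pair_matches(frequencies):
    """Probability that two random cases match on every attribute.

    frequencies: one dict per attribute, mapping category -> share of cases.
    Assumes attributes are independent, which real cases need not be.
    """
    p = 1.0
    for freq in frequencies:
        p *= sum(share ** 2 for share in freq.values())
    return p

def expected_coincidental_pairs(n_cases, frequencies):
    """Expected number of fully matching pairs among n_cases by chance alone."""
    return comb(n_cases, 2) * chance_pair_matches(frequencies)

# Hypothetical category shares for three coarse descriptors.
attributes = [
    {"female": 0.5, "male": 0.5},
    {"firearm": 0.5, "manual": 0.3, "other": 0.2},
    {"residence": 0.6, "outdoors": 0.4},
]
print(chance_pair_matches(attributes))
print(expected_coincidental_pairs(60, attributes))
```

With these made-up shares, two random cases match on all three attributes about 10% of the time, so among 60 cases roughly 175 matching pairs would be expected by chance alone.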

_________________________________________________________________

Future plans:

While the work done at PredPol and Parabon-NanoLabs is currently paving the way for the future of law enforcement (and for ideas around constitutional rights), I think that for now I will stay closer to home. I am interested in beginning some exploratory data analysis of the most recent MAP data to see what trends, if any, emerge for Texas and specifically Dallas County.
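
A possible starting point for that analysis is sketched below. It reads a homicide CSV and counts total and solved cases per year for one county. The column names (`CNTYFIPS`, `Year`, `Solved`), the “Dallas, TX” county label, and the Yes/No coding are assumptions about the file layout. They should be checked against the actual MAP download before use.

```python
import csv
import io
from collections import Counter

def county_summary(csv_file, county="Dallas, TX"):
    """Count homicides and solved cases per year for one county.

    Column names and the 'Yes'/'No' coding of Solved are assumptions about the
    file layout, not confirmed details of MAP's data.
    """
    totals, solved = Counter(), Counter()
    for row in csv.DictReader(csv_file):
        if row["CNTYFIPS"] != county:
            continue
        year = int(row["Year"])
        totals[year] += 1
        if row["Solved"].strip().lower() == "yes":
            solved[year] += 1
    return {y: (totals[y], solved[y]) for y in sorted(totals)}

# A tiny in-memory stand-in for the real file, with hypothetical rows.
sample = io.StringIO(
    "CNTYFIPS,Year,Solved\n"
    "\"Dallas, TX\",2019,Yes\n"
    "\"Dallas, TX\",2019,No\n"
    "\"Harris, TX\",2019,Yes\n"
    "\"Dallas, TX\",2020,No\n"
)
result = county_summary(sample)
print(result)
```

For the real file, passing `open(path, newline="")` instead of the in-memory sample lets the same function run unchanged.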
