What is the Cost of Being Wrong?

ODSC - Open Data Science
3 min read · Nov 13, 2020

As the organization’s name suggests, the Human Rights Data Analysis Group (HRDAG) uses data science to answer questions about human rights on a large scale, from determining the chain of command and accountability in international cases of genocide to evaluating whether artificial intelligence-based tools used by the US criminal justice system are fair.

Because we’re data scientists, we want to do data analysis that’s elegant and brings us closer to understanding the truth about a situation, and because we’re compassionate humans, we want that analysis to be useful for effecting positive change. Data scientists want data, and this is, without doubt, the era of Big Data — we have access now to vastly bigger datasets than ever before. As a result of this perceived abundance, we’re seeing something similar to the Gold Rush of the 1840s: experts and industry leaders are rushing to devise ways to use the available data. While some of that work is strictly business — making difficult jobs easier, increasing revenue, and so on — some people are trying to use data to help us answer hard questions. At HRDAG, we’re very careful, and critical, about those questions. We ask ourselves constantly, “Can these data actually answer the question at hand?”

A dataset may be big, but that doesn’t mean it’s “good,” and by “good,” we mean “useful” or “appropriate.” The size of a dataset may have no bearing on whether the data are incomplete or imperfect. With nearly 30 years of documenting and analyzing human rights violations under our belts, we are deeply informed about how unobserved events can change the conclusions drawn from existing datasets.

We’ve been involved for many years in statistical analysis that informs our thinking about the US criminal justice system. We wrote an article about homicides committed by police, estimating that one-third of all Americans killed by strangers are killed by police. We’ve evaluated predictive policing tools that rely on artificial intelligence, and studied the pre-trial risk assessment tools that use existing data. Consistently, we have found that instead of cleansing the justice system of human biases, these tools perpetuate and exacerbate unfairness that’s been baked into the system — and its datasets — by decades of unjust policing practices. The datasets are only as “good” as the people and systems generating the data. Suppose, for example, we are trying to answer the question, “Where are the majority of drug crimes committed, and by whom?” If police officers routinely focus arrests on poor and minority neighborhoods while ignoring the same potential arrests in more affluent neighborhoods, then the data generated by those arrest records, no matter how “big,” will be biased.
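To make that concrete, here is a minimal toy simulation (hypothetical numbers, not HRDAG data or methods). It assumes two neighborhoods with identical underlying offense rates but very different levels of police attention, and shows how the resulting arrest records describe where police look rather than where offenses happen.

```python
import random

random.seed(42)

# Hypothetical setup: two neighborhoods with the SAME underlying offense rate,
# but very different probabilities that any given offense is observed by police.
population = 10_000                 # people per neighborhood
offense_rate = 0.05                 # true share of people committing an offense (identical)
patrol_intensity = {
    "neighborhood_A": 0.60,         # heavily policed
    "neighborhood_B": 0.05,         # lightly policed
}

arrest_counts = {}
for hood, p_observed in patrol_intensity.items():
    true_offenses = sum(random.random() < offense_rate for _ in range(population))
    # An offense only enters the "arrest dataset" if police happen to observe it.
    arrest_counts[hood] = sum(random.random() < p_observed for _ in range(true_offenses))

print(arrest_counts)
# Typical output: neighborhood_A ends up with many times the arrests of
# neighborhood_B (roughly 300 vs. 25 in expectation), even though the underlying
# offense rates were identical by construction. A model trained on these records
# "learns" the patrol pattern, not the pattern of offending.
```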

This is where we find it critical to ask, “Who will bear the cost of incorrect modeling results?” As our director of research has said, “Machine learning is pretty good at finding elements out of a huge pool of non-elements… But we’ll get a lot of false positives along the way.” Ethically, we must ask ourselves, who might those false positives indict or affect?
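The arithmetic behind that warning is worth spelling out. The sketch below uses made-up numbers, not figures from any real system: even a classifier with seemingly strong sensitivity and specificity, applied to a large pool in which the target is rare, flags mostly people who are not targets at all.

```python
# Hypothetical numbers illustrating false positives at a low base rate.
pool_size = 1_000_000    # people screened
prevalence = 0.001       # 1 in 1,000 is actually the "element" being searched for
sensitivity = 0.95       # true positive rate of the model
specificity = 0.98       # true negative rate of the model

actual_positives = pool_size * prevalence
actual_negatives = pool_size - actual_positives

true_positives = sensitivity * actual_positives
false_positives = (1 - specificity) * actual_negatives

precision = true_positives / (true_positives + false_positives)
print(f"people flagged:       {true_positives + false_positives:,.0f}")
print(f"share truly positive: {precision:.1%}")
# Of the ~20,930 people flagged, only about 4.5% are true positives;
# nearly 20,000 people are wrongly flagged, and they bear the cost of being wrong.
```

The point is not these particular numbers but the structure: when the pool of non-elements is huge, even small error rates translate into large numbers of people wrongly flagged.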

When thinking about potential harm — and how to avoid it — we evaluate data quality, try to determine what data are missing, or unobserved, and ask ourselves if we have what we need to identify situations where analytical tools can do good. Ultimately, our goal is to supply the evidence for evidence-based policies that have the power to make the world fairer and support accountability and justice for all.

Learn more about HRDAG at hrdag.org.

Original post here.

Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday.
