More Data, More Problems

A review of Cathy O’Neil’s “Weapons of Math Destruction”

“Big Data processes codify the past. They do not invent the future … Only when we have an ecosystem with positive feedback loops can we expect to improve teaching using data. Until then it’s just punitive…” — Pg. 209–14

In a world increasingly reliant on Big Data, Cathy O’Neil’s Weapons of Math Destruction is a refreshing and much-needed take on the risks of relying on data-driven models for decision-making. Cathy outlines her own journey in data science during the financial meltdown of 2008 and shares anecdotes from several areas where algorithmic decision-making has had unintended consequences. These anecdotes alone make Weapons of Math Destruction a worthwhile read for anyone, whether they’re interested in data science or not, since these models have become omnipresent in our lives. From credit scores to performance evaluations, society increasingly depends on quantitative metrics to define success and how to achieve it. Cathy’s response isn’t to stop using these models altogether, but rather to make them transparent so that we understand what variables drive them. Algorithmic decision-making is here to stay, but our romanticization of it should end.

In an ideal world, an algorithm might make decisions that are free of human prejudice. In the real world, humans create these algorithms and, in some cases, pass their prejudices along to them. The models used by law enforcement are a great example of this. Cathy cites the shortcomings of a crime prediction model that Chicago PD used in 2009, which attempted to model the spread of crime like an epidemic. Its dataset targeted geographical hotspots of crime and assessed the social networks of the people living in those areas. The theory was that the likelihood of someone committing a crime correlated with how much crime was committed by the people they associated with. To prevent crime before it even occurred, Chicago PD went as far as creating a list of people it thought would commit a crime. This led to Chicago PD surveilling and intimidating innocent people such as Robert McDaniel, who was told by a police officer that the department had its eyes on him. Is that really fair, though? Should you be put under extra scrutiny for the actions of those around you? Unfortunately, tactics like these are prevalent in many places other than Chicago. And it’s not just because of our switch to data-driven predictive models. Law enforcement in America has a history of targeting civilians based on external characteristics rather than actual criminal offenses. A prime example is the NYPD’s stop-and-frisk policy. Creating predictive algorithms before critically examining the flaws in our current law enforcement policies will just codify those flaws into the algorithms.

So what can we do?

Cathy doesn’t suggest that reexamining our data-driven predictive and decision-making models will restore fairness to the world, but she does argue that it can improve the efficacy of those models. Her primary suggestion throughout the book is to reduce the opacity that surrounds these algorithms. The problem is that their mystique makes it difficult for people to thoroughly test and criticize them. Cathy argues that many of these models lack accountability even though they have the power to affect many lives. Unfortunately, the proprietary nature of these algorithms gives their creators a disincentive to make them more transparent. It’s ironic that the market pressures that led to the proliferation of data-driven models meant to optimize life are among the same conditions that prevent these models from being optimized themselves.

Cleverly named, Weapons of Math Destruction exposes some of the poorly executed attempts to introduce algorithmic decision-making into various societal processes. Cathy’s own experience during the economic meltdown of 2008 gives the book a unique perspective: that of a data scientist who was once complicit in the very practices she now criticizes.

The truth is that data science is sexy right now. Every company, organization, and institution wants to be optimized and efficient. And the mystique of proprietary algorithms is causing people to turn a blind eye to how these algorithms affect the people who are just a part of the dataset for these models.

Ultimately, it is up to the companies creating these models to become accountable for them. And it is up to consumers to crowd out the models that are less transparent and pose greater risks to society. Cathy does a great job of articulating the need for greater oversight of the mathematical models that are becoming more and more prevalent in our lives. By no means should you take this book to be a criticism of data science as a whole. Instead, allow it to shed light on the ethical grey areas that exist in the world of Big Data and on the urgent need to understand them better.

This book is recommended for anyone interested in the impact of data science on society. And I mean all of society. Cathy provides anecdotes and critical examinations of the algorithms that drive finance, education, criminal justice, politics, marketing, employment, insurance, and credit. This book is a valuable tool for understanding how decisions are calculated in multiple facets of our lives, and how flawed those calculations might be.

A few ongoing projects aim to create a framework of accountability and ethics for Big Data. One that Cathy mentions is Princeton University’s Web Transparency and Accountability Project. Similar projects include The Information Accountability Foundation and the Council for Big Data, Ethics, and Society.