But The Data Says So | Why Data Can’t Exist in a Social Vacuum

Modified Image by KamiPhuc

Richard Herrnstein and Charles Murray’s 1994 book The Bell Curve came under fire (and rightly so) for a number of reasons, the foremost being its attempt to link race and intelligence (represented by IQ). In fact, claims about race and intelligence have been made for centuries, and yet they always feature two critical flaws:

  1. They are almost always defended with the argument of ‘but the data says so’, and thus hide behind a veil of scientific objectivity.
  2. Even if the data is accurate (a large assumption), the claims serve no social function except to justify racism and bigotry.

In John Horgan’s article tackling this issue, he quotes Noam Chomsky, and I’ll do the same as it brings home that second point:

“Surely people differ in their biologically determined qualities. The world would be too horrible to contemplate if they did not. But discovery of a correlation between some of these qualities is of no scientific interest and of no social significance, except to racists, sexists and the like. Those who argue that there is a correlation between race and IQ and those who deny this claim are contributing to racism and other disorders, because what they are saying is based on the assumption that the answer to the question makes a difference; it does not, except to racists, sexists and the like.”

This quote alone should end the debate around race and intelligence; however, scientific racism is still occasionally vomited out of the mouth of the ‘alt-right’ (white supremacist) movement.

White supremacists continue to make the same argument, and it remains an incredibly explicit and obvious instance of scientific racism. While listening to an episode of the podcast Note to Self, however, I realized that scientific racism has a younger, less hateful, but still morally blind brother.

The podcast interviewed Antonio García Martínez, a former executive at Facebook who basically invented Facebook’s ad tracking system. When asked about ethnic affinity targeting (targeting advertisements based on an individual’s ethnic background), Martínez defended the practice with what was essentially the ‘but the data says so’ defense:

“So what, people who like Obama like Jay-Z, what’s wrong with Jay-Z? The data shows that in fact it is highly correlated to like Obama and Jay-Z. Targeting that way actually works. What if actually I can tease out if someone is African American or Hispanic? And what if you actually target that with an ad and they actually react or engage with it more than they would otherwise? In some sense, why are you philosophically against an ad working if the data shows that it does work?”

While this perspective certainly isn’t anywhere near as bad as the race and intelligence argument, it remains troubling. In response to Martínez, the host, Manoush Zomorodi, instantly cited the fact that advertisers might not show an ad for housing or a job offer to an individual because of their race, a problem that Facebook has since attempted to address.

Zomorodi’s argument boils down to the idea that data doesn’t exist in a vacuum; while it can be, and often is, used for good, it can also be used for nefarious purposes, including those that are overtly or covertly racist. Indeed, the invocation of the ‘but the data says so’ defense signals that we haven’t considered what social function our data may have outside of the vacuum we’ve placed it in.

Many argue that our historical moment should be marked as the Age of Big Data. My concern, however, arises when we start seeking out, recording, and analyzing data without any thought given to its potential social function. Sure, data about an individual’s race might increase advertising revenue, but is that a trade-off we want to make if it can be used to perpetuate and deepen racial divides?

Of course, Facebook isn’t the only company grappling with the social implications of data. Machine learning algorithms are constantly learning our racist, sexist, and otherwise bigoted ways. TechCrunch notes that a Google machine learning algorithm that returns what it considers closely related words produces plainly sexist results: it links ‘father’ to ‘doctor’ and ‘mother’ to ‘nurse’, or ‘man’ to ‘computer programmer’ and ‘woman’ to ‘homemaker’. The reason is that machine learning algorithms learn from the data they are fed. Yet, despite the fact that these associations are literally what the data tells us, they are clearly socially unacceptable, and Google has recognized this.
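To make the "learns from the data it is fed" point concrete, here is a minimal sketch in Python. It is not Google's algorithm: it only counts which occupation words share a sentence with a given word, and the corpus, the `OCCUPATIONS` set, and both function names are invented for illustration. The corpus is deliberately skewed, which is the whole point.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-written corpus with a deliberate skew; not real data.
CORPUS = [
    "the man is a programmer",
    "the man works as a programmer",
    "the man is a doctor",
    "the woman is a homemaker",
    "the woman works as a homemaker",
    "the woman is a nurse",
]

# Assumed list of words we treat as occupations for this toy example.
OCCUPATIONS = {"programmer", "doctor", "homemaker", "nurse"}


def cooccurrence_counts(sentences):
    """Count how often each word shares a sentence with each occupation."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = set(sentence.split())
        for word in words:
            for other in words & OCCUPATIONS:
                if other != word:
                    counts[word][other] += 1
    return counts


def top_association(counts, word):
    """Return the occupation most often seen alongside `word`."""
    return counts[word].most_common(1)[0][0]


counts = cooccurrence_counts(CORPUS)
print(top_association(counts, "man"))    # programmer
print(top_association(counts, "woman"))  # homemaker
```

Nothing in the code mentions gender roles. The association comes entirely from the sentences it was given, so a different corpus would produce a different answer. The output is "what the data says", and it is still something a person chose to collect, count, and act on.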

This isn’t a case against data or science. Rather, it is a reminder that we need to recognize the social impact seemingly ‘objective’ or ‘neutral’ data may have. After all, data is collected by people, is interpreted by people, is often about people, and can be reinterpreted by other people. It cannot exist in a social vacuum.

This article was originally posted on my own site, TheTinHat.com.
