Surveys, vigilance and the myth of neutral data

In mid-December I started a series of explorations on Big Data. I hoped to touch on often overlooked issues surrounding Big Data, privacy and the protections afforded by the State, as well as the political implications of Big Data for marginalized groups. I started the series with a simplified political history of Big Data. In the second part, I explored how public discussions (and policies, to a degree) around privacy have been hijacked by a specific libertarian rhetoric. This rhetoric blurs the distinction between the public and private spheres by constructing the internet as a “private” domain, opposed to the streets, which are conceived as purely “public” and, as such, an area where individuals should have no expectations of privacy.

In a similar blurring of what constitutes private and public spheres, individuals’ private data is regularly gathered in marginalized communities and ethnic or religious groups. These data gathering operations, typically carried out by the State in the form of a mostly anonymized census or aggregate indexes, are usually described as mechanisms to evaluate or create policies and are normalized as tools of the policy making process. Healthcare data, education and income levels, gender disparities, and religious and/or political affiliations are recorded and used as justifications for measures that can further stigmatize or marginalize these groups.

Personal data is rightfully afforded legal protections against leaks because a breach of privacy can have devastating consequences for an individual. However, communal data is treated as public domain information even when it can be used with equally devastating consequences for a community. Again, as I pointed out in the first post of this series, the history of Big Data is the history of racial hierarchies and the upholding of white supremacist power structures through the use of methodically collected surveys, community indexes and data points. Yet none of these uses has been discontinued. They have just been slightly re-oriented to reflect our more contemporary “sensibilities”. Real estate brokers still evaluate the racial demographics of neighborhoods to determine the value of property. Non-white and/or poor neighborhoods can see their property values plummet if they fall within the “unacceptable” percentile of certain measurements. These real estate valuations, based on data collected by the State, can even have an effect on intergenerational wealth and affect families for decades. Healthcare providers can determine the cost of coverage for certain demographics based on data such as eating habits, ethnic predisposition for certain diseases and eventual health predictions. Data from censuses and surveys is used to allocate funds for Government programs. City Councils can regulate educational resource investments in students based on parents’ incomes and predictive models of performance. Funds can be allocated based on expectations from historical data sets.

In March of 2014, four months before Anthony Cumia’s racist and misogynist tirade, Dutch right-wing politician Geert Wilders, at a gathering on the night of the City Council elections across The Netherlands, incited his supporters to answer the question of whether they wanted “more or less Moroccans” in the country. The crowd enthusiastically engaged in a chant of “fewer, fewer”. In the weeks following the public outrage, his supporters used public statistics pertaining to people of Moroccan heritage to illustrate why this group was “undesirable”. They cited poverty levels, figures about integration into the job market, school performance indicators, crime rates in neighborhoods with a higher number of people of Moroccan heritage, percentages of religious affiliation and presumed fundamentalism. There was, at the time, a general consensus that this was an acceptable use of public data. The figures, it was said, “didn’t lie”.

This collection of communal data is generally understood as a core component of the social contract. The census and national indexes have been normalized and are usually regarded as “neutral” in terms of ideological or political orientation. The State collects data for “the greater good”; in turn, this data becomes part of the public domain, available to be analyzed and scrutinized by anyone, including by companies that can use this data for profit. Conversely, and to further complicate this distinction between public and private, social media can now be used for population-level research, as recently seen in a study that involved collecting public domain Twitter data for mental health monitoring. The distinction between what belongs to the individual and what is viewed as collective becomes fuzzier and more cluttered.

Collected communal data is public in every way. It is public in the sense that anyone can access it but also in the sense that it is free to use. While private businesses might use this data as a tool to generate or improve their profits, the communities where the data was gathered do not get to share a portion of the financial benefits. These groups can be studied, regulated and even marginalized; however, the aggregated data pertaining to their private lives does not accrue any direct financial benefits for them. When corporations such as Google or Facebook collect similarly anonymized data, it is understood by the users of these services or applications that they are doing so with the express purpose of using it for profit (for example, through sales of advertisements). In exchange, the user supposedly benefits by making use of the platform for free. However, when these corporations collect any data (even if anonymized), there seems to be a general consensus that this is part of a surveillance mechanism that can be used against individuals. As a result of these concerns, the user needs to consent to the data collection. By contrast, there is no consent involved in the gathering and further use of communal data. Since it is gathered by the State through government offices such as the tax office, school system, census, hospitals, ministries, etc., participation of individuals is mandatory. Unlike with corporate data gathering, there is no “opting out” from the State, even if 500+ years of history of Big Data can attest to the many ways in which it can be used to further marginalize vulnerable groups.

Interested in where I am going next? In the next part of the series, we will dive into all the feelings with a review of emotional Big Data and how white fears have become valid data points.

Read part three of this series: When white fears become Big Data: racist emotions and the populists who love them.

I am an independent writer with no affiliations. If you find value in the type of work I do, please consider making a donation. Any funds, no matter how small, will allow me to continue this ongoing research and analysis. Follow me on Twitter for daily updates.