Boyd and Crawford’s “Critical Questions for Big Data”: a summary

Chloe Johnson
Nov 16, 2017

Key challenging concepts/words in the core readings: Objectivity, Apophenia, Digital Divide, Analytics, Social ecology.

Definition of the term “Apophenia”: The tendency to perceive meaningful connections between unrelated things; in particular, finding patterns where none exist, simply because enormous quantities of data can offer connections between unrelated phenomena.
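To make the idea concrete, here is a minimal sketch of my own (it is not taken from the article). It generates series of pure random noise and counts how many pairs look strongly correlated anyway. The function names, the number of series, the series length and the 0.5 threshold are all illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_pairs(n_series=40, length=20, threshold=0.5, seed=0):
    """Count pairs of unrelated noise series whose |correlation| exceeds threshold.

    All parameters are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    # Every series is independent random noise, so no pair is truly related.
    series = [[rng.gauss(0, 1) for _ in range(length)] for _ in range(n_series)]
    total_pairs = 0
    strong_pairs = 0
    for i in range(n_series):
        for j in range(i + 1, n_series):
            total_pairs += 1
            if abs(pearson(series[i], series[j])) > threshold:
                strong_pairs += 1
    return strong_pairs, total_pairs


if __name__ == "__main__":
    strong, total = count_spurious_pairs()
    print(f"{strong} of {total} unrelated pairs look strongly correlated")
```

Because the number of pairs grows much faster than the number of series, adding more data creates more chances for an apparent pattern to turn up by accident, which is the trap the term describes.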

Boyd and Crawford’s article examines many of the implications that come from the study of Big Data. It suggests that the defining feature of Big Data is the capacity for researchers to search, aggregate and cross-reference large data sets. While the authors acknowledge that this wealth of information lets researchers analyse new patterns and offer new insights, they draw attention to some of the issues and challenges the phenomenon raises, articulated in six important questions.

The first question concerns how Big Data has changed the definition of knowledge for researchers. Boyd and Crawford compare the emergence of Big Data to Ford’s introduction of mass production, arguing that it “reframes key questions about the constitution of knowledge, the processes of research, how we should engage with information, and the nature and the categorization of reality.” (page 665). The result has been a side-lining of other sources of knowledge that lack the scale of Big Data. This is a concern because Big Data collection has many flaws of its own. For instance, social media sites such as Facebook and Twitter have relatively poor systems for archiving, so Big Data analysis necessarily focuses on the present, which is far easier to access than past records.

Their next point critiques the idea that Big Data is objective, arguing that this impression can be very misleading. A high volume of raw information may not itself be false, but as soon as a researcher seeks to understand what it means, the work becomes a process of interpretation, which may distort the information. Researchers using Big Data cannot be entirely objective, as they must interpret the data to draw conclusions from it. Additionally, large data sets drawn from the internet are often unreliable. The size of a data set is irrelevant to its accuracy if it is drawn from a non-representative population, which leads to biases that researchers need to take into account in their studies.
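The point about size and representativeness can be shown with a small simulation. This is a sketch of my own under invented assumptions, not an example from the article: a hypothetical population in which 20% of people are “online”, and the online group differs systematically from everyone else on the trait being measured. All names and numbers are made up for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate(seed=0, population_size=100_000, random_sample_size=500):
    """Compare a large biased sample with a small random one.

    Returns (true_mean, biased_mean, random_mean, biased_sample_size).
    The population and its parameters are hypothetical.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(population_size):
        online = rng.random() < 0.2  # assumed: 20% of people are online
        # Assumed: the trait is centred on 60 for online people, 40 for others.
        value = rng.gauss(60 if online else 40, 10)
        population.append((online, value))

    true_mean = mean([value for _, value in population])

    # Big but biased: every online person, and nobody else.
    biased_sample = [value for online, value in population if online]

    # Small but representative: a simple random sample of the whole population.
    random_sample = [value for _, value in rng.sample(population, random_sample_size)]

    return true_mean, mean(biased_sample), mean(random_sample), len(biased_sample)


if __name__ == "__main__":
    true_mean, biased_mean, random_mean, n_biased = simulate()
    print(f"True population mean:            {true_mean:.1f}")
    print(f"Biased sample (n={n_biased}): {biased_mean:.1f}")
    print(f"Random sample (n=500):           {random_mean:.1f}")
```

Under these assumptions the online-only sample is many times larger than the random one, yet its estimate sits much further from the true figure, because no amount of extra data corrects for who was left out.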

The third and fourth points expand on this idea, suggesting that bigger data are not always better and that sheer size does not make the data more valid. Boyd and Crawford refer specifically to the biases involved in analysing Twitter. “Regardless of the number of tweets, it is not a representative sample as the data is skewed from the beginning.” (page 669). As such, there are limits to using Twitter as a data set, despite the sheer quantity of data. Additionally, data loses its meaning once it is taken out of context. This is especially true for Big Data, where context is hard to interpret at scale because generalisations must be made and data must be reduced in order to fit into models.

The fifth point tackles a different problem with Big Data: its ethics. Individuals may share certain pieces of information online, but that is not the same as giving researchers permission to use that data, and the subjects in question may often be unaware that data are being collected. The opposing argument is that the mere fact that the data are accessible should count as consent, but this may still be problematic from an ethical point of view where data collection is intrusive.

Finally, there is limited access to Big Data. Although Big Data is perceived to offer easy access to massive amounts of data, that access is uneven. The article draws particular attention to large social media companies and their ability to control access to Big Data; some restrict access to their data, while others sell it. As such, it can be impossible for some people to analyse Big Data or to assess the conclusions drawn from it. This creates a division which then undermines the effectiveness and neutrality of the research community.

Question for the seminar: Can we trust Big Data more than small data?

Chloe Johnson

Third Year Economics, Politics and IR student at Royal Holloway.