Data Is Not Agnostic

An interview with Judith Simon by Sara Kjellberg, Lund University, Sweden.

--

Data by Janet McKnight licensed under CC BY 2.0

- Hi, I can’t hear you. Can you hear me?

After some fiddling with technology, Judith Simon and I can begin our conversation about Big data over Skype. Judith Simon is a philosopher, and for some time now she has concentrated on understanding the relationship between information, communication and computation technologies on the one hand and knowledge practices on the other. This intersection has also been the starting point for her perspective as a researcher on Big data. ‘The link between philosophy of technology and epistemology is worth exploring further,’ says Judith. At the same time, problems can arise when using Big data as a concept, since the term was developed in the marketing sector.

- Big data is a loaded term, driven by marketing and industry; the moment you use it, you reinforce the narrative you actually want to critique. There is always the risk that you buy into the discourse you are trying to dissect. On the other hand, you need a label so that people know what you are focusing on.

Big data is now moving into policy documents, often in combination with open data and the push to make research data open. Data is frequently discussed as something that drives the economy and innovation. In the rhetoric of the European Commission, it is sometimes called “the new oil” and discussed in terms of how it can be harvested. However, Judith wants to highlight how labour-intensive it is to handle data.

- Open data is an interesting topic. On the one hand, it creates new opportunities. On the other hand, who pays for making data open? Somebody has to format the data to make it open. It doesn’t come out of the blue; this has to be done carefully.

Continuing this line of thought, she says that data curation must be included in business models and in discussions of how the open data rhetoric is to be financed. You have to establish what is doable, who pays and who benefits.

- We know in the context of Big data that re-identification of individuals is possible; the same issues and problems of Big data are part of open data.

These things have to be taken care of when managing data. This applies both to data protection and privacy and to the epistemological quality of the data. In other words, data needs careful curation, and that is a heavy job. When it comes to handling and preparing data in research contexts, Judith adds that we also need to think about credit and reputation in research. Today, data cleaning and handling are not normally things that earn you a reputation in science. As an example, she mentions a European Union project she was involved in some years ago, called Liquid Publication, which studied how scholarly knowledge is created, evaluated and maintained in the wake of the social web. Already in that project, discussions centred on authorship of the different parts of a research project and whether it might be possible to allocate credit for different scientific tasks. Simon suggests that we may need to reconsider data management and preparation as part of the merit system, so that researchers have an incentive to put the necessary time and effort into this work.

There are many critical ethical issues around Big data. Simon says that many of them have already been outlined to some degree, for example in the White House report, which addresses privacy, freedom of speech, the freedom to be unobserved and discrimination, among other things. Now the time has come to talk about how we can approach these issues.

- We have to think about what to do about it. Do you want to intervene at the level of data gathering? Do you want to intervene at the level of data usage? Where should regulation take place? It’s really tricky, especially once you recognize the underlying economic models of the current market, where data is used as a currency. There are legal, economic and technical solutions to some of the issues.

I ask what we, as humanities researchers, should do about this. What she is trying to do is to raise concern about, and awareness of, the problematic issues.

- To a certain degree it’s being a watchdog.

Simon continues that she engages in four different areas: education, media, politics and technology design. First, she says, it is important to teach critical Big data management and to embed critical reflection on data practices in the technical fields. Second, to speak publicly through the media, on radio and in newspapers, about different aspects of Big data. The third angle is to engage directly with policy makers, especially by giving advice, for example on how data practices, ethical issues and epistemic issues can be assessed in research. Her last point is that researchers should engage in technology design and consider what we can do to design things differently. ‘But you always have to work together,’ she sums up. ‘The multidisciplinary approach is important, and you have to enable people to talk with each other from different perspectives.’

In addition, we are in a situation where algorithmic knowledge production is advancing. These large-scale calculations tend to black-box how knowledge is produced. ‘This is not entirely new,’ Simon says, and reaches for a book on the shelf behind her. She shows me Mechanizing Proof by Donald MacKenzie.

- This book is from 2001. It offers a historical analysis of the difference between proof as conducted by human mathematicians and proof as conducted by computers. And it asks: What constitutes proof? How can we tell whether something is proven if we don’t understand the proof?

This is true of both the human mathematician and the computer; somebody else has to understand the reasoning behind a proof in order to accept it. And we are still dealing with the same questions. The question is whether we can understand a proof once something has been proven. If people don’t understand it, to what extent can we call it knowledge?

We have to ask ourselves who knows, and how we can say anything about it. As calculations grow more complex, who can understand them? On the other hand, data analysis can be used as a heuristic, as it was already before Big data. In that way, it can help us come up with new ideas and questions, and Simon again stresses the role played by a multidisciplinary perspective and by situating the data.

- The data is not agnostic. You have to know where the data comes from, what its limitations are and how you can apply it. You have to know something about the field to apply it. It is all about meaning and context.

Lastly, Simon comments on the next steps for a humanities perspective on Big data. We have to strike a balance between being critical and being engaged. ‘There are lessons to be learned about blind trust in numbers and the politics of numbers,’ she says. On the one hand, numbers are very useful, and evidence very often comes in the form of numbers. ‘In principle, trust in numbers can often be justified,’ she continues. ‘But the problem now is the trustworthiness of the data practices. We cannot always assess them, since they are black-boxed and difficult to understand.’

- We need people to make systems and practices trustworthy, and for that they need different competences, but we also need people to assess whether something is trustworthy. For that we need some degree of transparency as well as the competences to assess trustworthiness. This is a crucial issue.

We need people to deal with data, because data does not come out of nowhere.

About the author:
Sara Kjellberg (PhD) is a researcher in information studies at the Division of Archival Studies, Information Studies and Museology at the Department of Arts and Cultural Sciences at Lund University. She belongs to the research group Information practices: Communication, Culture and Society. She also works with research support services at Malmö University Library. Her research interests include information practices, scholarly communication, digital communication and the development of research libraries in relation to scholarly practices. She is especially interested in scholars’ uptake of new digital communication genres or formats, such as blogs and other networked digital tools, as well as in how research data management is incorporated into scholarly practices today.

--