Research data is a product of (its) time

Jutta Haider, Lund University, Sweden.

Research data are complicated things. The question of whether they are things at all is of course contentious. Yet, even if we leave this issue aside for now, there are continuous debates - some would say struggles - over who owns the data, how to handle and describe it, how to store and share more of it and why; in short, over who has which interest in it. It is also unclear when it is appropriate to speak of research data in the first place, as opposed to, say, a publication, document, working material, metadata, catalogue entry and so on.

Kartenlocher licensed under CC0 Public Domain

For instance, what is a publication for one researcher could be data for another, and what is considered metadata in a research project might be data in the library collection. This is particularly relevant for research libraries and similar institutions that are increasingly involved in providing data services to researchers, often in response to funder demands and policy frameworks, but also in order to meet researchers’ requirements. These institutions are called upon to secure long-term preservation of and access to research data for future uses whose forms are unknown, and to enable future research questions about problems that have not yet been formulated.

To do this, their operations are projected onto an idealised, uni-linear research process that is imagined to have a beginning and - increasingly - no end, where data is created, stored, used and deposited in order to be used, re-created, stored and deposited again and again, and so forth. Data is imagined to give birth to more data, which will solve problems and enable us to ask questions we are not even aware we could have. It is an imaginary that is intimately connected to the passage of time.

Data multiple

“Rarely can a magic moment be established when things become data”, writes Christine Borgman (2015) in her book on research data. This is a highly interesting and relevant statement, which helps us to focus our attention on the temporal character not only of research data, but also of myriad other forms of data and the various materials and practices that are captured in the notion of data. Data is used to visualise the passage of time (Johansson, 2012). However, it is also a product of its time, and it is that in at least three different ways.

Firstly, talk of data and particularly big data, but also open and even research data, in many ways epitomises today’s pull towards quantification and the near constant measuring and auditing that characterises contemporary society to such a degree that it has become customary to talk of an audit culture (Shore & Wright, 2015). Secondly, data - in much of contemporary discourse - are imagined to drive progress, to enable knowing, to speak for themselves and thus to shape decisions without the frictions ascribed to other forms of knowing. They are projected into a future that they are going to shape. And this future is, by virtue of the work of data, almost always imagined as a better place and a better time, and in this way it is mixed up with society’s master narrative of the necessity of economic growth. Of course, this future never comes. As such, today’s imaginaries of data and their intended purposes are shaped by and deeply implicated in the relentless acceleration of society that, as Hartmut Rosa develops in much detail, characterises modernity (Rosa, 2015). Thirdly, and with this underlying current of acceleration in mind, data can fruitfully be imagined as emerging along temporal axes. In this sense, data is made into different entities as time passes or as the passing of time is projected to happen.

This goes beyond different meanings that are assigned to the same entity. Clearly, different meanings are ascribed to what we see to be the “same” data object, and this has far-reaching implications. Yet as data never is in and of itself, but only emerges when it is, for instance, planned for, accessed, stored, described, moved, related, or even deleted, the objects or entities that are referred to when data is named are ontologically different. In this sense, we can speak of the data multiple, to draw on Annemarie Mol’s (2002) work. However, there exists a relationship between these data multiples and that which the data is evidence for or that which data is meant to achieve. This relationship needs to be stable, at least temporarily, in order for us to be able to engage with and also to oppose certain uses of data.

Data never just is

Data never just is of and by itself. Data has to be captured, translated, reduced, surrounded by metadata, linked to other data, encapsulated in standards, visualised, mediated through digital tools and software and so forth (Haider & Kjellberg, 2016; Leonelli, 2015). Yet data never is of and by itself also in a different sense. It arises and is created in certain situations; that is, data results from and is evidence for things being done with things, including with other data. In that sense, data is situated, relational, and emergent (see also Leonelli, 2015). It is entangled across the very entities and practices it purports to describe, predict, shape and also create. Christine Borgman’s (2015) remark quoted above calls attention to the momentariness and fleetingness of data. This offers a possibility, I think, to frame more concisely in which ways data are situated and emergent. To conceive of the data multiple along temporal axes might help to achieve such a situated stability.

By böhringer friedrich — Own work, CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=2395465

Data moments

Extending and breaking up the often idealised research process into moments of various practices can help to complicate the idealised and linear or circular image of a research process that starts and finishes with the researchers doing their work. This approach also allows us to account for the various practices, infrastructures and policies that have to be in place before as well as after the actual research is concluded, and to treat them as a part of this process. This can be dealt with in different ways. However, in order to be able to make sense of data and, importantly, to enable a responsible engagement with data - big or small - being situationally rigorous is important.

Breaking up can also make visible relationships that exist between and across various moments outside the linear trajectory of progress in one research process, which otherwise orders them and always points in one direction only. What is needed is an understanding of how various situated requirements produce data in a way that does not permanently fix it and thus make it immovable and impenetrable, but in a way that takes seriously the situational character also of a multifarious past, a messy present and a future where new magic moments might happen in very different ways, or where data will just become dusty and vanish.

What this gives way to, I suggest, is a more thorough way to see research data as deeply situated and emergent from and through practices. These practices are shaped by disciplinary conditions, but they also emerge from policy frameworks and from the constraints and affordances of infrastructures as they are enabled through, for instance, administrative units, libraries and archives. Importantly, what we need to focus on is not how data can or should be defined, but rather how data are done in practice over time. That is, research data is not something that simply is as such and which can be moved entirely unscathed across contexts and tools.

Rather, research data plays out in and through practices, and these practices are entangled across - amongst other things - temporal axes. In this sense, data moments freeze different instants of the research process, whereas this process extends beyond what is conventionally called the research process. In order to achieve temporary technical and discursive stability, certain relations are entered into, between, for instance, software tools and users, policy documents and university administrations, standards developers and librarians, instruments and researchers, and so on, and in this way data is done. Some of these relations make it possible for data to document, to provide evidence and to enter the archive. Others make it possible to plan and build infrastructures; yet others facilitate political visions and policy making.

Messing up stability

It is easy to agree that today’s dominant data discourse calls upon simplistic and even dangerous visions, in which the possibilities of correlation are seen as heralding the end of theory and in which data is imagined, like other technical solutions before it, to expedite a constantly delayed better future, an idea that further contributes to society’s permanent acceleration.

Yet the question we really need to ask is this: how can we move beyond pointing out that this is problematic to opening up ways to responsibly engage researchers, policy makers and the various private enterprises and public sector actors in a conversation that actually leads to implementing alternative ways to engage with data, including research data?

Calling attention to the various practices, institutional demands, tools and vested interests that mess up the imagined ontological stability of (research) data, and that show the various ways in which data are products of their times, might be a way to start.

About the author:
Jutta Haider is associate professor of information studies at Lund University. Her work concerns the conditions for knowledge and information in contemporary digital culture.

References:
Borgman, C. L. (2015). Big data, little data, no data: Scholarship in the digital age. Cambridge, MA: MIT Press.

Haider, J., & Kjellberg, S. (2016). Data in the making: Temporal aspects in the construction of research data. In J. V. Rekers & K. Sandell (Eds.), New Big Science in focus: Perspectives on ESS and MAX IV (pp. 143–163). Lund: Lund Studies in Arts and Cultural Sciences.

Leonelli, S. (2015). What counts as scientific data? A relational framework. Philosophy of Science, 82(5), 810–821.

Mol, A. (2002). The body multiple: Ontology in medical practice. Durham, NC: Duke University Press.

Rosa, H. (2015). Social acceleration: A new theory of modernity. New York: Columbia University Press.

Shore, C., & Wright, S. (2015). Governing by numbers: Audit culture, rankings and the new world order. Social Anthropology, 23(1), 22–28.

Johansson, V. (2012). A time and a place for everything. Borås: Valfrid.