On replication in HCI

At CHI 2013 I attended a two-day workshop on ‘replication in HCI’ (also known as ‘RepliCHI’, a programmatic concern reflected in various workshops and panel discussions at prior CHI conferences). To summarise the purpose of the workshop I will turn to the workshop proposal abstract, which does a fine job:

The replication of, or perhaps the replicability of, research is often considered to be a cornerstone of scientific progress. Yet unlike many other disciplines, like medicine, physics, or mathematics, we have almost no drive and barely any reason to consider replicating the work of other HCI researchers. Our community is driven to publish novel results in novel spaces using novel designs, and to keep up with evolving technology. The aim of this workshop is to trial a new venue that embodies the plans made in previous SIGs and panels, such that we can begin to give people an outlet to publish experiences of attempting to replicate HCI research, and challenge or confirm its findings.

Broadly speaking, the position I presented (elaborated in the paper Christian Greiffenhagen and I submitted) was intended to problematise the assumptions built into this notion of ‘replication’ in HCI. The key problem we saw was not that there is anything wrong in asking HCI to tolerate or involve more replication. Rather, it was that framing this in terms of making HCI more ‘scientific’ is possibly based on a mythical view of ‘good science’ in which “replicability of published research findings […] is often considered a cornerstone of progress” (see above). Our general position was that the sociology of science warns us against holding a mythological view of replication in the natural sciences and then applying this to HCI. Replication of results in, say, physics serves a highly motivated and particular purpose in working through contested parts of the discipline, rather than being a practice engaged in as a matter of course in order to be seen ‘doing normal science’. Further, the very idea of a ‘decisive replication’ is problematic, in that there is not necessarily any agreed standard for what counts as a valid replication before the replication is attempted.

The workshop’s submissions discussed a range of instances that attendees considered examples of ‘replication’. But a number of nagging issues continually arose (note that those documented below are not necessarily things I voiced during the workshop).

1. Classification

There were considerable problems arising from the idea of classifying different ‘types’ of replication (the original workshop proposal identified four: “direct replication”, “replicate and extend”, “conceptual replication” and “applied case studies”). The workshop organisers themselves admitted that even arriving at these four had involved a significant amount of argument.

This typology became more confusing when considering whether the intent of the replication was to replicate the original research findings, the original research method, or both. Thus it turned out that the intent of the replicating authors was important for articulating what any given replication is ‘doing’, i.e., whether it is testing and validating ‘the instrument’ used to generate particular findings (e.g., a methodical procedure) or testing the findings themselves.

Another matter of classification was ‘self-replication’. A few presenters described what were essentially replications of their own prior work, typically with some extension. I found these quite problematic: surely for replication to have any meaning, it must be understood as a fundamentally social, negotiated process? Self-replicating an experiment or trial does not really get to the heart of what ‘attempting to replicate findings’ does for negotiating contested results between members of a research community.

Thus, the important underlying issue for any classification schema is that it was often unclear whether a particular set of HCI researcher activities constituted, could be classed as, or would be recognised by others as, a replication at all.

2. Fidelity

In one view, it seemed that the more details unearthed by a replicating author the better: increased fidelity of the replication could be achieved by re-creating the environment in which the original study was conducted. Involving the original authors seemed key, as did access to their data.

Yet, in another view, it was not clear ‘how much’ of the original circumstances was necessary to be ‘really doing a replication’. In the end a ‘perfect’ replication is obviously intractable, and as such the adequacy of the material actions leading to something being seen as ‘doing a replication’ is, aside from the practical circumstances of its production, determined through social agreement and disagreement amongst a community of researchers.

3. Incremental versus novel

A common complaint was that paper reviewers frequently shoot down papers perceived to be replicating prior work with statements such as “it’s just a replication”, “too incremental” or “not novel enough”. However, while attendees at the workshop did report such experiences, there were also a number of successfully accepted papers (some of which were appearing in the main conference track).

There are two things to say about this. Firstly, I would argue that the sorts of criticisms described are not just used against the more experimentally-oriented HCI work, but are a pervasive phenomenon across (CHI) submission types. So any cultural change to move away from the difficulties reported by authors attempting to publish what are seen as ‘replications’ of prior studies must also acknowledge that this is part of a wider problem. Secondly, and perhaps more cruelly, given that some such papers are indeed being published, it is possible that the authors struggling to turn papers exhibiting some measure of replication into publications are simply writing papers that are not very good. However, rather than the content not being good enough, I suspect that most of the time it is the articulation of the contribution that causes the problems for acceptance.

Leaving this second point alone for the moment, the first underlines the importance of good reviewing, but also of good articulation of the contribution by authors. By ‘good reviewing’ in this context I mean reviewing that seriously considers the contribution of the work, irrespective of whether it is seen as ‘just a replication’. Considering the contribution means appreciating that there may be value in ‘going over old ground’. On balance, however, I felt that while reviewing culture clearly needed to change, authors also need to do more to articulate the contribution of their work, instead of hiding behind a picture of ‘normal science’ in which ‘doing replication’ is seen as an adequate contribution because ‘that’s what science practice is’.

4. Normal HCI practice

The RepliCHI programme has the potential to work against its own purpose. In offering a dedicated venue for work considered to be ‘replication’, it may well silo such work and thereby de-normalise it as part of HCI research practice. Ironic indeed.

5. Ambitions

A final note, which I have to attribute to a conversation with Bob Anderson around this matter of replication. The sorts of ambitions the RepliCHI programme has (which include political ambitions) can also have epistemological consequences. For instance, a political manoeuvre by psychology (say) to adopt the methods and standards of the natural sciences as a ‘package’ (i.e., to ‘look like physics’), in order to establish itself politically as a science, has had epistemological consequences in terms of debates about replication, the kinds of experimentation considered valid, how results are reported, and so on.

In HCI we should be aware of these consequences, and recognise that our political ambitions as a field can be changed rather than simply being something we are subject to.


Originally published at notesonresearch.tumblr.com.