Software engineering research, you are failing me

I have been ranting for some time now about how scientific knowledge is actually produced in SE. To be completely honest, my rants are due to a bad, unexpected rejection.

Yes, I am looking at you, peer review, the brittle basis we researchers trust to carry out our endeavours. I guess my issues with peer review reached an all-time high after stumbling upon two articles within a few days.

The first is an experiment run within the reviewing committee of NIPS (the most important machine learning conference). The organisers (artificially) created two independent committees and tasked them with the same acceptance rate. It turns out that the treatment committee rejected 57% of the papers accepted by the control committee. In other words, the review process is close to random. Now, what worries me is that NIPS's review process is double-blind (something that, AFAIK, we can only dream about in SE research), which should remove a good amount of bias.
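To see why 57% reads as "close to random", here is a minimal back-of-the-envelope sketch in Python. The acceptance rate it assumes (roughly 25%) is my own ballpark figure, not a number taken from the articles, so treat the exact output as illustrative.

```python
# Back-of-the-envelope check of the "close to random" claim (my own sketch,
# not part of the original experiment).

acceptance_rate = 0.25                    # ASSUMED overall acceptance rate
observed_rejection_of_accepted = 0.57     # figure quoted above

# A second committee accepting papers independently at random (with the same
# acceptance rate) would reject (1 - acceptance_rate) of the papers the first
# committee accepted; a perfectly consistent committee would reject none.
random_baseline = 1 - acceptance_rate     # 0.75
consistent_baseline = 0.0

# How far along the way from "perfectly consistent" to "purely random" does
# the observed disagreement sit?
position = (observed_rejection_of_accepted - consistent_baseline) / (
    random_baseline - consistent_baseline
)

print(f"random baseline: {random_baseline:.0%}")          # 75%
print(f"observed:        {observed_rejection_of_accepted:.0%}")  # 57%
print(f"distance towards pure randomness: {position:.0%}")       # ~76%
```

Under these assumptions, a perfectly consistent second committee would reject 0% of the accepted papers and a coin-flipping one about 75%, so the observed 57% lands much closer to the coin flip than to agreement.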

Honestly, I would like to see this kind of candid approach in SE conferences/journals as well. Along with the acceptance rate, such a metric (i.e., what is the chance that my paper would be rejected by a different committee within the same outlet?) would be of tremendous value, both for researchers and, most importantly, for the conference/journal's RC. It would be a way of saying something like: "Hey look, we acknowledge that peer review is not perfect; it is what it is, and in our case it has this margin of error." (Yes, I know, this would require an overhead in reviewing papers, finding reviewers, etc. etc.)

Which brings me to the second part of the rant, this time addressed specifically to the SE research community. I guess it is fair to assume that behind so much disagreement in peer review (which, again, I assume to be about as good/bad as for NIPS) lies the lack of standardisation of the process (or, alternatively, the reviewers' lack of knowledge of such standards where they do exist).

Well, it turns out that this is the case for a pivotal aspect of empirical investigations: the threats to validity a study might suffer from.

Some wonderful¹ people at the University of Passau first scavenged the papers published in major software engineering outlets and found that 91% of the published papers contain an empirical study (go ESE!), but out of these only 54% contain a discussion of the threats to validity, and out of the latter only 23% differentiate between the types of threats (boo ESE!). Now, these results are stunning already. I mean, how could an empirical study be published in a respected outlet without a discussion of the threats to its validity in the first place? I will go out on a limb here and say that maybe the reviewers do not really know, or acknowledge, the importance of discussing threats in empirical disciplines.

The same researchers then asked 807 researchers, who have served on the RCs of such outlets, for their opinion about the tradeoff between external and internal validity, and got 94 answers². From their sample, they found that

reviewers are not aware of the tradeoff between internal and external validity

ICYMI: there is always a tradeoff; it is just a matter of acknowledging which side one decides to lean towards. For example, a tightly controlled experiment with students maximises internal validity but says little about industrial practice, while a field study with practitioners does the opposite. In particular, this result is stunningly painful (intellectually speaking). If you have ever designed and conducted an experiment, you should have noticed the tradeoff straight away. Seriously.

Interestingly, a considerable number of participants stated that only externally valid studies have value and that each study is supposed to have immediate practical impact.

Which confirms the previous point.

But most interestingly, the results show that reviewers (a.k.a. the community) are really polarised towards one or the other kind of validity. This, in turn, confirms that there are no accepted guidelines about which one should be preferred³, which in turn means that your paper is at the mercy of the reviewers' preferred kind of validity.

The paper also reports the reviewers' opinions about replication, but don't even get me started on that…


Originally published at dfucci.co on February 5, 2015.