Classical tests: Biased against the null?

A common argument, one I have cited before and one that has been used in papers I am an author on, holds that significance tests are biased against the null. I’ve never really liked this argument; I tend to think that concepts of evidence should be evaluated on their own terms. Evaluating classical tests by Bayesian standards makes sense among Bayesians, but the argument is used more broadly, as an argument for Bayesian statistics. In this blog post, I discuss why I think it isn’t a good argument.


Let’s first outline the two major forms of the argument. The first is the Berger and Sellke argument that a p value around 0.05 can, to a Bayesian with certain priors, correspond to a high posterior probability for the null. The second is that significance tests never allow you to “accept” the null hypothesis.

The first form of the argument is easy for a non-Bayesian to dismiss, and uncontroversial to a Bayesian; hence, it does not serve as an argument for Bayesianism. A non-Bayesian, or even a Bayesian, can simply say, “Yes, they’re different ideas.” This is precisely what a Bayesian would say if someone came to them with a significant result whose Bayes factor was equivocal or favored the null.
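To make the first form of the argument concrete, here is a minimal sketch of the kind of calculation behind it: a single standard normal observation, a point null, and a normal prior on the alternative. The prior scale (tau = 1) is my own illustrative choice, not a value from any particular paper.

```python
import math

def normal_pdf(x, sd=1.0):
    """Density of a mean-zero normal with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Observed z statistic just significant at the two-sided 5% level
z = 1.96

# Marginal likelihood under H0: mu = 0 exactly (a point null)
m0 = normal_pdf(z, sd=1.0)

# Marginal likelihood under H1: mu ~ Normal(0, tau^2),
# which makes z marginally Normal(0, 1 + tau^2)
tau = 1.0  # illustrative prior scale (an assumption, not from the post)
m1 = normal_pdf(z, sd=math.sqrt(1.0 + tau ** 2))

bf01 = m0 / m1                 # Bayes factor in favor of H0
post_h0 = bf01 / (1.0 + bf01)  # posterior P(H0) with 1:1 prior odds

print(round(bf01, 3), round(post_h0, 3))  # 0.541 0.351
```

Even though p = 0.05 looks like decent evidence against the null, this Bayesian analysis assigns the null a posterior probability of about 0.35, which is the flavor of discrepancy Berger and Sellke pointed out.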

The second form of the argument rests on a particular viewpoint, and not a universally held one, about how p values are to be used. In the unfortunately common but simplistic practice of solely computing a p value to reject a null hypothesis, one might argue that the special role accorded to the null represents a form of bias against the null. But this is not the only way to use p values, and it is not the way advocated by major proponents of classical tests (or their inversions, confidence intervals). It is hard to see how computing curves of one-sided p values in order to bound a parameter estimate, or using a confidence interval, is biased against the null: all parameter values are treated equally. One might object that people use the confidence interval as a significance test, by simply checking whether the null hypothesis is inside the interval; but that bias resides in the user of the confidence interval, not in the procedure itself.
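To illustrate how inverting one-sided p values treats every parameter value alike, here is a minimal sketch for a normal mean with known standard error. The data summary (xbar = 1.3, se = 0.5) is made up for illustration; the same one-sided p value curve is computed at every candidate value, with no special role for any null.

```python
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

xbar, se = 1.3, 0.5  # illustrative estimate and standard error (assumed)

def p_curve(theta):
    """One-sided p value: P(estimate >= xbar | true mean = theta)."""
    return phi((xbar - theta) / se)

# Invert the curve: keep every theta whose one-sided p value lies in
# [0.025, 0.975]. No parameter value is singled out as "the null".
grid = [t / 1000.0 for t in range(-2000, 5001)]
ci = [t for t in grid if 0.025 <= p_curve(t) <= 0.975]

print(round(min(ci), 2), round(max(ci), 2))  # 0.32 2.28
```

The resulting interval matches the familiar xbar ± 1.96 × se, but nothing in the construction privileges zero or any other point; any anti-null bias would have to come from how a user interrogates the interval afterward.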

This is not to say that a bias for the null cannot be a good thing. There are both frequentist and Bayesian methods that build in a bias for the null: for instance, the LASSO and stepwise procedures on the frequentist side, or Bayes factors with point null hypotheses on the Bayesian side. We have argued that a pro-null bias is helpful in assessing regularities. There is nothing necessarily Bayesian about shrinkage toward the null, though. Many Bayesians use priors that do not shrink at all (so-called “non-informative” priors), or priors that shrink toward a non-null value when the null is not believable.

I think it best to say that p values are not inherently biased against the null, though they can be used in procedures that are (e.g. typical NHST). One can also say that some other procedures, both Bayesian and frequentist, are biased for the null, and that this can be a good thing. But the argument that p values are biased against the null is unconvincing: only a Bayesian already accustomed to bias in the other direction would accept it, and hence it does not stand very well as an argument for Bayesianism itself.