The good, the bad, and the appropriately under-powered

Many quantitative studies are good: they use appropriate methodology, state properly specified, empirically valid hypotheses and register them before data collection, and then collect sufficient data transparently and appropriately. Others fail to clear one or more of these hurdles. But a third category also exists: the appropriately under-powered. Despite doing everything else right, many properly posed questions cannot be answered with the data that could realistically be collected.

Two examples will illustrate this point. It is difficult to establish the safety and efficacy of treatments for sufficiently rare diseases in the usual way, because the total number of patients may be too small for a properly powered clinical trial. Similarly, it is difficult to answer a variety of well-posed empirical questions in political science, because the number of countries available to sample is small and fixed.
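To make "under-powered" concrete, here is a minimal power calculation. It is a sketch under stated assumptions: a two-arm comparison of means, a standardized effect size d (difference in means divided by the common standard deviation), a two-sided test at α = 0.05, and the normal approximation rather than an exact t-test. The function names and the numbers in the example (d = 0.5, 20 patients per arm) are hypothetical, chosen only for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test (hypothetical helper).

    d is the standardized effect size. Uses the normal approximation, so it
    slightly overstates the power of a t-test when samples are small.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected z-statistic under the alternative: d / sqrt(2 / n) = d * sqrt(n / 2)
    shift = d * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def n_per_group_for_power(d: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Smallest per-group sample size reaching the target power (normal approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


# Hypothetical scenario: a moderate effect (d = 0.5), but only 20 eligible
# patients per arm because the disease is rare.
print(f"power with 20 per arm: {power_two_sample(0.5, 20):.2f}")
print(f"needed per arm for 80% power: {n_per_group_for_power(0.5)}")
```

Under these assumptions the trial has only about a 35% chance of detecting a real, moderate effect, while roughly 63 patients per arm would be needed for the conventional 80% power (an exact t-test calculation asks for slightly more). If those patients do not exist, no amount of methodological care closes the gap.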

What are the options for dealing with this phenomenon? (Excepting the old, unacceptable standby of p-hacking, running multiple comparisons, and so on, then hoping the journal publishes the study anyway.) I think there are three main ones, none of which are particularly satisfactory:

  1. Don’t try to answer these questions empirically; use other approaches.
    If the data cannot resolve the question to the customary “standard” of p<0.05, use qualitative approaches or theory-driven methods instead.
  2. Estimate the effect and show that it is not statistically significant.
    Readers will presumably interpret this as evidence that the effect is small or practically unimportant, even though that isn’t how p-values work: a non-significant result from an under-powered study is compatible with large effects as well as small ones.
  3. Do a Bayesian analysis that compares several prior beliefs, to show how the posterior changes under each.
    This does not change the fact that there is too little data to give a convincing answer, and it is difficult to explain. With properly uncertain priors, the answer remains uncertain after accounting for the new data; at most, the data will shift the posterior slightly (to the right, if the estimated effect is positive) and narrow the distribution somewhat.
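Options 2 and 3 can be illustrated together with a small sketch. Assume, hypothetically, that a study reports an effect estimate of 0.3 with standard error 0.25, and that the estimate is approximately normally distributed. The code computes the conventional p-value and 95% confidence interval, then applies a conjugate normal-normal Bayesian update under three illustrative priors centered on zero. The numbers, the prior widths, and the helper `normal_posterior` are assumptions made for this example, not taken from any real study; a real analysis would usually fit a full model rather than use this shortcut.

```python
from math import sqrt
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def normal_posterior(prior_mean, prior_sd, estimate, se):
    """Hypothetical helper: conjugate normal-normal update.

    Combines a normal prior on the effect with a normally distributed
    estimate that has standard error `se`. Returns (posterior mean, posterior sd).
    """
    prior_prec = 1 / prior_sd**2
    data_prec = 1 / se**2
    post_prec = prior_prec + data_prec  # precisions add
    # Posterior mean is the precision-weighted average of prior mean and estimate
    post_mean = (prior_prec * prior_mean + data_prec * estimate) / post_prec
    return post_mean, sqrt(1 / post_prec)


# Hypothetical under-powered result: estimate 0.3, standard error 0.25.
estimate, se = 0.3, 0.25
z = estimate / se
p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
ci = (estimate - Z_95 * se, estimate + Z_95 * se)
print(f"z = {z:.2f}, p = {p:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")

# Prior sensitivity: the same data under three hypothetical priors centered at 0.
priors = {"skeptical": 0.1, "weakly informative": 0.5, "vague": 10.0}
for name, prior_sd in priors.items():
    mean, sd = normal_posterior(0.0, prior_sd, estimate, se)
    p_positive = 1 - NormalDist(mean, sd).cdf(0.0)
    print(f"{name:>18}: posterior mean {mean:.2f}, sd {sd:.2f}, "
          f"P(effect > 0) = {p_positive:.2f}")
```

With these inputs the result is non-significant (p ≈ 0.23), yet the 95% interval runs from about -0.19 to 0.79, so it cannot rule out a large effect. The posterior depends heavily on the prior: the skeptical prior barely moves away from zero, while the vague prior essentially reproduces the estimate. In every case the posterior is narrower than the prior and shifted toward the estimate, but substantial uncertainty remains.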

At the end of the day, we are left with the unsatisfying conclusion that some questions are not well suited to this approach, and that, to be honest, we should not claim the scientific or empirical evidence ought to shift people’s opinions much. That’s OK.

Unless, perhaps, someone out there has clearer answers for me?