Everything You Know is Wrong

Earl Radina
Published in Human Systems Data
Mar 14, 2017

Sensationalist headlines aside, this week’s readings were concerned with the flaws in the current state of science, with a particular emphasis on p-values and other measures of significance. “Significance” here means the standard by which we decide whether a particular data set supports a hypothesis because the data genuinely fit the proposed theory, or merely by chance.

In fact, “significance” is what we will be focusing on in this post. Greenland et al. (2016) touched on the topic throughout their paper, but finished the discussion strong with the idea that making significance dichotomous is both wrong and dangerous to science. This is partially due to how we, as scientists, get published, as well as how we measure significance in general.

The phrase “publish or perish” was coined as early as 1932 by H. J. Coolidge (Rawat & Meena, 2014). It describes the pressure academics are under to either publish work or risk being fired. Beyond the stress of losing your job, there is an added caveat: studies that find positive, significant results are far more likely to be published than those that don’t (Matosin et al., 2014). Because of this, researchers have had to find ways to quickly and easily demonstrate “significance”. Enter the p-value.

The p-value is, in essence, a probability. It is the probability of observing results at least as extreme as the ones the researcher found, assuming that the null hypothesis (a counter hypothesis which usually states that a treatment has NO effect) and every other assumption of the statistical model are true. However, over-reliance on this statistical technique has caused a bit of a stir in the scientific community. Greenland et al. (2016) noted that the p-value rests on a mountain of assumptions, and this poses a huge issue: if any one of those assumptions isn’t accounted for in the research design, the whole system can collapse like a tower of blocks.
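
To make that concrete, here is a minimal sketch (my own illustration, not from the readings) of how a p-value is typically produced. The group sizes, effect size, and choice of a two-sample t-test are all hypothetical assumptions for the example.

```python
# Minimal sketch: what a p-value measures. It is the probability of seeing
# data at least as extreme as ours, ASSUMING the null hypothesis and every
# other model assumption hold.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical example: a "treatment" group and a "control" group.
control = rng.normal(loc=0.0, scale=1.0, size=30)
treatment = rng.normal(loc=0.5, scale=1.0, size=30)  # true effect of 0.5

# Two-sample t-test: the p-value is computed under the null hypothesis
# that both groups share the same mean (i.e., the treatment has NO effect).
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Note: the p-value is a probability between 0 and 1, not a measure of how
# large or important the effect is, and it is only meaningful if the test's
# assumptions (independence, variance, sampling model, etc.) actually hold.
```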

So what are the proposed solutions? Greenland et al. (2016) first propose confidence intervals, and then proceed to point out flaws in those as well. However, they do make an interesting point: while one confidence interval on its own is of limited use, several compiled from multiple replication studies are much more likely to show the validity and size of an effect (see the sketch below). Gelman (2016) also mentions the growing concern over replicating results.
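
As a rough illustration of that idea (my own sketch, not a procedure prescribed by Greenland et al.), here is one common way to pool estimates from several replications: an inverse-variance-weighted, fixed-effect average, which yields a tighter combined confidence interval than any single study. The estimates and standard errors below are made up for the example.

```python
# Minimal sketch: pooling effect estimates from several replications into
# one combined confidence interval (fixed-effect, inverse-variance weighting).
import numpy as np

# Hypothetical effect estimates and standard errors from three replications.
estimates = np.array([0.42, 0.55, 0.38])
std_errors = np.array([0.20, 0.25, 0.18])

weights = 1.0 / std_errors**2                      # inverse-variance weights
pooled_est = np.sum(weights * estimates) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

# 95% confidence interval for the pooled effect (normal approximation).
ci_low = pooled_est - 1.96 * pooled_se
ci_high = pooled_est + 1.96 * pooled_se
print(f"pooled effect = {pooled_est:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```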

Does anyone else have any opinions or solutions? Should the public just get more educated on what p-values ACTUALLY mean? Or should we abolish them altogether in favor of an aggregate confidence interval method? Should we not even trust a study unless it’s been replicated multiple times?

Works Cited

Gelman, A. (2016, September 21). What has happened down here is the winds have changed. Statistical Modeling, Causal Inference, and Social Science. Retrieved from http://andrewgelman.com/2016/09/21/what-has-happened-down-here-is-the-winds-have-changed/

Greenland, S., Senn, S., Rothman, K., Carlin, J., Poole, C., Goodman, S., & Altman, D. (2016). Statistical tests, p-values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31, 337–350. doi:10.1007/s10654-016-0149-3

Matosin, N., Frank, E., Engel, M., Lum, J. S., & Newell, K. A. (2014). Negativity towards negative results: a discussion of the disconnect between scientific worth and scientific culture. Disease Models & Mechanisms, 7(2), 171–173. http://doi.org/10.1242/dmm.015123

Rawat, S., & Meena, S. (2014). Publish or perish: Where are we heading? Journal of Research in Medical Sciences: The Official Journal of Isfahan University of Medical Sciences, 19(2), 87–89.
