“P” — The Slayer of Hypothesis

Maya Toteva
Published in Human Systems Data
Mar 15, 2017

(and academic careers)

Statistical terminology is not something we use frequently in our daily conversations; at least, most of us don’t. So, when reading this week’s articles, I had to refresh my understanding of the p-value.

According to my Elementary Statistics textbook (Larson & Farber, 2006), the p-value indicates whether a result can be attributed to chance or whether it is statistically significant enough to support the hypothesis. But what does that actually mean?

Most scientists look at the p-value hoping it will confirm their test hypothesis as true (Greenland et al., 2016) and that there is less than a 5% chance the test results are false. In reality, the p-value cannot confirm the truth of a hypothesis; it can only summarize the results of the test. It is the probability of seeing data at least as extreme as those observed if the null hypothesis were true, not the probability that the hypothesis itself is true. The p-value was never meant to be the single most powerful measure of significance. Instead, it was intended as an informal part of a scientific process that combines data and theory to reach scientific conclusions (Nuzzo, 2014).
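To see what a p-value actually summarizes, a small simulation helps. The sketch below is my own illustration, not something taken from the cited articles. It estimates a p-value with a permutation test: it asks how often randomly relabeling two groups produces a difference in means at least as large as the one observed. The function name `permutation_p_value`, the number of shuffles, and the example scores are all assumptions made up for this example.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means.

    Hypothetical helper for illustration: it shuffles the group labels
    and counts how often chance alone gives a difference at least as
    large as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def abs_mean_diff(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = abs_mean_diff(pooled)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Small tolerance so rounding error does not hide exact ties.
        if abs_mean_diff(pooled) >= observed - 1e-12:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up scores for two conditions (not real data).
control = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3]
treatment = [12.9, 13.4, 12.2, 13.8, 13.1, 12.6]
print(permutation_p_value(control, treatment))
```

Whatever number this prints, it only says how surprising the observed difference would be if the group labels did not matter. It says nothing about how likely the hypothesis is to be true.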

So, it turns out that the p-value reveals nothing about the strength of the evidence, yet it is the main technique used to assess evidence. Here is where scientists become creative. With so much time and effort invested in their research, they do not give up easily, and some resort to unethical strategies. Determined to find significance, they begin to manipulate the data. They explore it in different ways: adding and dropping conditions; stopping data collection once P reaches .05; collecting and analyzing many conditions but reporting only those with P < .05; using covariates to reach the desired p-value (Simmons et al., 2011). This data manipulation is known as “p-hacking”, and it leads to “false-positive psychology”: reporting only the statistically significant results, which does not properly explain the observed phenomena.
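One of these strategies, stopping data collection as soon as P drops below .05, is easy to simulate. The sketch below is a rough illustration under stated assumptions, not a reproduction of the simulations in Simmons et al. (2011). Every experiment draws data from a standard normal distribution, so there is no real effect to find. The one-sample z-test with known standard deviation, the batch size of 10, the maximum of 100 observations, and the names `z_test_p` and `false_positive_rate` are all choices made for this example.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_test_p(samples):
    """Two-sided p-value for H0: mean = 0, assuming known sigma = 1."""
    n = len(samples)
    z = (sum(samples) / n) * n ** 0.5
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def false_positive_rate(peek, n_experiments=2000, max_n=100, step=10,
                        alpha=0.05, seed=0):
    """Share of null experiments that end up 'significant'.

    peek=False: test once, after all max_n observations are collected.
    peek=True:  test after every batch and stop as soon as p < alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        data = []
        while len(data) < max_n:
            # The null hypothesis is true: the real mean is 0.
            data.extend(rng.gauss(0.0, 1.0) for _ in range(step))
            is_final_look = len(data) >= max_n
            if (peek or is_final_look) and z_test_p(data) < alpha:
                hits += 1
                break
    return hits / n_experiments


print("fixed sample size:", false_positive_rate(peek=False))
print("stop once p < .05:", false_positive_rate(peek=True))
```

With a fixed sample size the false-positive rate should land near the nominal 5%, while checking after every batch should push it noticeably higher, even though nothing about the data has changed. The only difference is when the researcher decides to stop.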

In recent years, a disturbing pattern of data manipulation has surfaced (Gelman, 2016), and many scientists and scholars previously regarded as outstanding have fallen victim to the “P”. Many promising careers ended prematurely because of misunderstanding and misuse of “researcher degrees of freedom”. After all, it is a personal choice to report fraudulent results.

But as Gelman & Loken say in their article “The Garden of Forking Paths” (2013), “Criticism is easy, doing research is hard.” They propose an approach to data analysis called “preregistered replication”. In this approach, the claim of statistical significance is eliminated and a Bayesian method of data analysis is applied. This allows the observed comparison to be accepted as evidence of an underlying effect. Applying this method, scientists still have the “researcher degrees of freedom” to decide how to confirm the observed effect.

Knowing that the p-value causes more problems than it is worth, the question is: are we ready to move on to a different method for finding significance in research? Are we open to discussing the limitations of conventional statistics and seeking answers somewhere else, or are we willing to remain hostages of the p-value?

Gelman, A. (2016). What has happened down here is the winds have changed. Statistical Modeling, Causal Inference, and Social Science. Retrieved from http://andrewgelman.com/2016/09/21/what-has-happened-down-here-is-the-winds-have-changed/

Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University.

Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350.

Larson, R., & Farber, B. (2006). Elementary statistics. Pearson Custom Pub.

Nuzzo, R. (2014). Statistical errors. Nature, 506(7487), 150.

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.
