To P value or not to P value, that is the question

Sainjeev Srikantha
Mar 14, 2017


To start off, a p value is the probability of obtaining results equal to or more extreme than what was actually observed, assuming the null hypothesis is true. P values are used in statistical hypothesis testing, most often to test the null hypothesis. The null hypothesis is the opposite of the alternative hypothesis (Rumsey, 2016). For example, if an alternative hypothesis states that a drug has a significant effect on the reaction time of rats, the null hypothesis would be that the drug has no effect on the reaction time of rats (Khan Academy, 2017). To run the test, a p value threshold (commonly 5%) is established, and based on the p value the null hypothesis is either rejected or not rejected. A small p value of less than or equal to 5% means there is strong evidence against the null hypothesis, a p value greater than 5% means there is weak evidence against the null hypothesis, and p values close to the 5% cutoff could go either way and would call for more research (Rumsey, 2016).
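To make the definition concrete, here is a small sketch (not from any of the sources above) that estimates a p value for the rat example with a permutation test: pool the two groups, reshuffle the labels many times, and count how often the shuffled difference in means is at least as extreme as the one actually observed. The reaction-time numbers are made up purely for illustration.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test: the probability, assuming the null
    hypothesis of no group difference, of a mean difference at least
    as extreme as the one actually observed."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel the pooled data at random
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    return count / n_permutations

# Hypothetical reaction times (seconds) for drugged vs. control rats
drug = [1.9, 2.1, 2.4, 2.2, 2.0, 2.3]
control = [1.7, 1.8, 1.9, 1.6, 1.8, 1.7]
p = permutation_p_value(drug, control)
```

With these made-up data the groups barely overlap, so the estimated p value comes out well below the 5% threshold and the null hypothesis would be rejected.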

Greenland et al. (2016) bring up the key problem with p values, which is the widespread misinterpretation and abuse of statistical tests, confidence intervals, and statistical power. Abuse of statistical tests has run rampant for years, and there have been many misinterpretations of p values. According to Greenland et al. (2016), the general definition of a p value can help us understand why a statistical test tells us much less than many think it does. There are many examples of these misinterpretations. Two of them are: 1) the p value is the probability that the test hypothesis is true, and 2) a significant test result of p < 0.05 means that the test hypothesis is false or should be rejected (Greenland et al., 2016). These two misinterpretations jumped out at me, because they were things I thought were true prior to reading this article. I learned that the p value is computed assuming the test hypothesis is true, so it cannot be the probability of that hypothesis. Another concept I learned was that a small p value merely marks the data as unusual under the test hypothesis; one reason for a small p value could simply be a large random error (Greenland et al., 2016).
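That last point can be checked by simulation. In the sketch below (my own illustration, not from Greenland et al.), data are generated with the null hypothesis exactly true and tested with an ordinary one-sample z test; even so, roughly 5% of experiments produce p < 0.05 purely from random error. The sample size, number of trials, and seed are arbitrary choices.

```python
import math
import random

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p value for a one-sample z test with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(42)
trials = 2000
false_positives = 0
for _ in range(trials):
    sample = [rng.gauss(0.0, 1.0) for _ in range(20)]  # null hypothesis is true
    if z_test_p_value(sample) < 0.05:
        false_positives += 1
rate = false_positives / trials  # should hover around 0.05
```

So a "significant" result does not prove the test hypothesis false; it is exactly what the 5% threshold permits to happen by chance.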

Greenland et al.'s (2016) discussion of the abuse of p values within the scientific community made me think of a topic I had learned in a previous statistics class taught by Dr. Robert Gray at Arizona State University: p hacking. P hacking is the manipulation of the process of statistical analysis to get a significant value. There are three types of p hacking. The first is when a researcher assesses more than one dependent variable but only reports those for which significant effects are obtained. The second is when a researcher assesses more than two conditions and leaves out the conditions that are not significantly different. The third is when a researcher collects the planned amount of data, analyzes the results, then adds participants and reanalyzes until significance is found (Gray, 2016).

P hacking has been prevalent in the scientific and psychological world of statistics. A study by John, Loewenstein, and Prelec had participants anonymously indicate whether they had personally engaged in ten different questionable research practices and, if they had, whether they thought their actions had been defensible. The order in which the questionable research practices were presented was randomized between subjects. The survey drew 2,155 respondents, a response rate of 36%. Of the respondents who began the survey, 33.4% did not complete it; however, data from all respondents, even those who did not finish, were included in the analysis (John, Loewenstein, & Prelec, 2012). A meta-analysis of surveys had found that, among scientists from a variety of disciplines, 9.5% of respondents admitted to having engaged in questionable research practices other than data falsification, with an upper-boundary estimate of 33.7%. In the study presented, the mean self-admission rate was 36.6%, higher than both meta-analysis estimates. Among participants who completed the survey, 94.0% admitted to having engaged in at least one questionable research practice (compared with 91.4% in the control condition) (John, Loewenstein, & Prelec, 2012). The study by John, Loewenstein, and Prelec shows that p hacking is very common in the scientific community and is an issue at large.
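The third practice, optional stopping, is easy to simulate. The sketch below is my own illustration under made-up settings (start with 20 participants, add 10 at a time up to 100, arbitrary seed), not a reproduction of any cited study. Even though the null hypothesis is true in every simulated experiment, re-testing after each batch of added participants drives the false-positive rate well above the nominal 5%.

```python
import math
import random

def z_p(sample):
    """Two-sided p value, one-sample z test against mean 0 with sigma 1."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))

def hacked_experiment(rng, start_n=20, add=10, max_n=100):
    """Collect start_n points, then keep adding `add` more and re-testing
    until p < .05 or max_n is reached -- the third p-hacking practice."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(start_n)]  # null is true
    while z_p(sample) >= 0.05 and len(sample) < max_n:
        sample.extend(rng.gauss(0.0, 1.0) for _ in range(add))
    return z_p(sample) < 0.05

rng = random.Random(7)
trials = 1000
hacked_rate = sum(hacked_experiment(rng) for _ in range(trials)) / trials
```

The resulting rate of "significant" findings lands well above 0.05, which is exactly why this practice counts as p hacking: the nominal threshold no longer means what it claims.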

In conclusion, there are still many issues with p values, and many scientific journals are now questioning the significance of studies because of them. It will be interesting to see how the scientific community handles the future of p values.

Works Cited

Gray, R. (2016). Lecture 9 [PowerPoint slides]. PSY 530, Arizona State University.

Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350.

John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 524–531. Retrieved 2016, from https://www.cmu.edu/dietrich/sds/docs/loewenstein/MeasPrevalQuestTruthTelling.pdf

Khan Academy. (n.d.). One-tailed and two-tailed tests. Retrieved March 14, 2017, from https://www.khanacademy.org/math/statistics-probability/significance-tests-one-sample/tests-about-population-mean/v/one-tailed-and-two-tailed-tests

Rumsey, D. J. (2016). Statistics for dummies. Hoboken: Wiley Publishing Inc.
