The Replication Crisis Is Not Only a p-Value Problem

Maxine · Published in Human Systems Data · Mar 15, 2017

The replication crisis is a methodological phenomenon in which the results of many experiments are difficult to reproduce in subsequent studies (Schooler, 2014). It has several possible causes. In this post, I focus on two of them: the p-value problem and cognitive bias.

To understand the p-value, we first need hypotheses. H0 is the null hypothesis (no effect), and H1, the opposing claim, is the alternative hypothesis. Because the sample distribution in an experiment may differ from that of the population, random error can creep in. Retaining H0 amounts to rejecting H1: if the effect exists and causes a difference, H0 is rejected; otherwise it is retained. However, there is a small chance that a true H0 gets rejected, which is called a Type I error. The p-value is the probability of obtaining data at least as extreme as those observed, assuming H0 is true. For example, p = 0.05 means that if there were truly no effect, data this extreme would occur in about 5 out of 100 repetitions of the experiment. If the p-value is very small, the observed effect is unlikely to be due to random error alone, and H0 is rejected.
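As a rough illustration, here is a minimal Python sketch that computes a p-value with a two-sample t-test. Everything here is invented for the example: the group sizes, the 0.5 SD effect, and the random seed.

```python
# A minimal sketch: two-sample t-test on simulated data (SciPy).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=0.0, scale=1.0, size=30)    # no effect
treatment = rng.normal(loc=0.5, scale=1.0, size=30)  # true 0.5 SD effect

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# If p < 0.05 we reject H0. Note: p is the probability of data at least
# this extreme *given that H0 is true*, not the probability that H0 is true.
```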
A Type I error is the rejection of a true null hypothesis, and a Type II error is the retention of a false null hypothesis. That is to say, a Type I error is detecting an effect that is not present, while a Type II error is failing to detect an effect that is present. Since researchers are motivated to dig meaningful results out of the experiments they design, Type II errors seem to be less common in the published record than Type I errors.
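A small simulation makes the two error types concrete. This is only a sketch; the sample size, effect size, and trial count are arbitrary assumptions.

```python
# Estimate Type I and Type II error rates by repeatedly testing data
# drawn under H0 (no effect) and under H1 (a real 0.5 SD effect).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 30, 5000
type_i = type_ii = 0

for _ in range(trials):
    a, b = rng.normal(0, 1, n), rng.normal(0, 1, n)    # H0 world
    if stats.ttest_ind(a, b).pvalue < alpha:
        type_i += 1                                    # rejected a true H0
    a, b = rng.normal(0, 1, n), rng.normal(0.5, 1, n)  # H1 world
    if stats.ttest_ind(a, b).pvalue >= alpha:
        type_ii += 1                                   # missed a real effect

print(f"Type I rate  ~ {type_i / trials:.3f}")   # close to alpha by design
print(f"Type II rate ~ {type_ii / trials:.3f}")  # 1 - power at n = 30
```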
Recently, the p-value has lost some of its standing because of the replication crisis in psychology. A p-value can mean very little, and there are many techniques for manipulating it. P-hacking, enabled by "researcher degrees of freedom", means that statistical significance can be reached even when nothing is actually happening, as long as the researcher digs hard enough (Gelman & Loken, 2013). According to Nelson (2014), there are at least six ways to reach the p < 0.05 threshold, ranging from how data are collected to how they are analyzed. With this kind of manipulation, almost any experiment can produce significant findings if the researcher intends it to.
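One of those routes, optional stopping, is easy to demonstrate. The sketch below assumes a hypothetical researcher who re-runs the test after every small batch of participants and stops as soon as p < .05; even though H0 is true throughout, the false-positive rate climbs far above 5%.

```python
# P-hacking by optional stopping: peek at the p-value after each batch
# and stop the moment it looks "significant". Data are pure noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
false_positives, trials = 0, 2000

for _ in range(trials):
    a = list(rng.normal(0, 1, 10))
    b = list(rng.normal(0, 1, 10))
    for _ in range(10):                      # up to 10 peeks
        if stats.ttest_ind(a, b).pvalue < 0.05:
            false_positives += 1             # "significant" noise
            break
        a.extend(rng.normal(0, 1, 5))        # collect 5 more per group
        b.extend(rng.normal(0, 1, 5))

print(f"False-positive rate with peeking: {false_positives / trials:.2f}")
# Well above the nominal 0.05, even though no effect exists.
```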
Another important indicator, statistical power, is the probability that a test will detect an effect when one really exists (Meera), and it has begun to be introduced into academic research. Reporting power improves the quality of data analysis, and high power arguably makes a study's conclusions more persuasive. Power depends on both the effect size and the number of participants. When a replication does not reach the power of the original experiment, it is hard to say whether the replication could have shown the tested effect at all, and many of the experiments in the OSC replication project had fewer participants than the originals (Gilbert, King, Pettigrew, & Wilson, 2016).
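To make the sample-size point concrete, here is a minimal power-calculation sketch using statsmodels. The effect sizes are Cohen's conventional benchmarks, not values taken from the OSC data.

```python
# How many participants per group does a two-sample t-test need
# to reach 80% power at various effect sizes?
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.2, 0.5, 0.8):   # small, medium, large (Cohen's d)
    n = analysis.solve_power(effect_size=d, power=0.8, alpha=0.05)
    print(f"d = {d}: ~{n:.0f} participants per group")
# Small effects demand large samples; an underpowered replication
# can miss an effect that is really there.
```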

When it comes to the replication crisis, I also think about cognitive bias.
Cognitive bias is a systematic deviation from norm or rationality in judgment, which can lead people to illogical conclusions (Haselton, Nettle, & Murray, 2005). One common cognitive bias in research is confirmation bias: the tendency to search for, interpret, and recall information in ways that confirm one's preexisting beliefs or hypotheses (Plous, 1993). This bias has been found in many experiments in which people test their hypotheses one-sidedly, selectively searching for evidence consistent with them (Nickerson, 1998; Kunda, 1999; Emerson et al., 2010). Confirmation bias is closely involved in everyday life, including how we search for information, interpret it, and remember it (Plous, 1993). In psychological research, this bias often shows up in how questionnaire items are worded. For example, people asked "Are you happy with your social life?" report being happier than those asked "Are you unhappy with your social life?" (Kunda, Fong, Sanitioso, & Reber, 1993). A separate issue is the accuracy of the measurement instruments: measurement in social psychology is usually less precise than in cognitive psychology.
In behavioral research, because confirmation bias is built into the hypothesis-testing model itself, any author faces this potential danger. The data they report can be affected, and when the actual data conflict with their expectations, the result may be the file-drawer effect. In the OSC replication project, social psychology findings proved harder to replicate than those of cognitive psychology (Stewart-Williams, 2015), which may partly reflect confirmation bias, given the more subjective nature of social psychology. This bias operates on both the original experiment and the replication, so it can be predicted that confirmation bias may also interfere with a replication if the replicating researcher lets it.
One way to counter confirmation bias is to use randomized controlled trials (Shadish, 2007). Another, more debatable, way is peer review, which may itself be vulnerable to confirmation bias, as a 2010 study of the peer-review process suggests (Emerson et al., 2010).
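For illustration only, here is a minimal sketch of the randomization step at the heart of an RCT: group membership is decided by chance rather than by the experimenter's expectations. The participant IDs and group sizes are hypothetical.

```python
# Randomly assign participants to treatment and control groups.
import random

participants = [f"P{i:02d}" for i in range(1, 21)]
random.seed(7)                 # fixed seed so the example is reproducible
random.shuffle(participants)

half = len(participants) // 2
treatment, control = participants[:half], participants[half:]
print("Treatment:", treatment)
print("Control:  ", control)
```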

In summary, countering the p-value problem requires larger sample sizes. Beyond that, we need to reconsider experimental design methodology and the value of replication itself.

References

Emerson, G. B., Warme, W. J., Wolf, F. M., Heckman, J. D., Brand, R. A., & Leopold, S. S. (2010). Testing for the presence of positive-outcome bias in peer review: a randomized controlled trial. Archives of Internal Medicine, 170(21), 1934–1939.

Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University.

Gilbert, D. T., King, G., Pettigrew, S., & Wilson, T. D. (2016). More on "Estimating the Reproducibility of Psychological Science". Retrieved from projects.iq.harvard.edu/files/psychology-replications/files/gkpw_post_publication_response.pdf

Haselton, M. G., Nettle, D., & Murray, D. R. (2005). The evolution of cognitive bias. In The handbook of evolutionary psychology.

Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Med, 2(8), e124.

Kunda, Z., Fong, G. T., Sanitioso, R., & Reber, E. (1993). Directional questions direct self-conceptions. Journal of Experimental Social Psychology, 29(1), 63–86.

Kunda, Z. (1999). Social cognition: Making sense of people. MIT Press.

Nelson, L. D. (2014). False-positives, p-hacking, statistical power, and evidential value [PDF document]. Retrieved from http://www.bitss.org/wp-content/uploads/2015/12/False-Positives-p-Hacking-Statistical-Power-and-Evidential-Value-Leif-Nelson.pdf

Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175.

Plous, S. (1993). The psychology of judgment and decision making. p. 233.

Schooler, J. W. (2014). Metascience could rescue the 'replication crisis'. Nature, 515, 9.

Stewart-Williams, S. (2015, September 6). A quick guide to the replication crisis in psychology. Psychology Today. Retrieved from https://www.psychologytoday.com/blog/the-nature-nurture-nietzsche-blog/201509/quick-guide-the-replication-crisis-in-psychology
