How to Make a Significant Study

Mikey B.
Published in Human Systems Data
Mar 15, 2017

Often in modern science, the importance of research is determined by a very small piece of a very large puzzle. This small piece is called a P value, or a probability value. It is common practice in research to hinge the significance of results on the P value. The general rule of thumb is that if your study has p < .05, then you have achieved statistical significance. Understanding what that means is a bit complicated, but to put it simply (and admittedly oversimplified): if there were really no effect at all, results like yours would show up less than 5% of the time by random chance alone. In other words, you feel 95% certain that your study has found something! That sounds great, right?

Well hold on a minute. How do we figure out this P value?

Well YOU hold on a minute. Let’s first take a deeper look at what it is. Before you can really understand what a P value is, you need to understand hypothesis testing. Let’s make up an awesome study. You want to look into the aerodynamics of unladen swallows, and you believe that where they come from will make a difference in their airspeed velocity. You gather up 50 swallows from Europe and 50 swallows from Africa, and you are ready to put them to the test. In case you haven’t realized it yet, you have already formed a test hypothesis!

Your test hypothesis looks something like this:

H_1: Unladen European swallows have a different airspeed velocity from unladen African swallows.

In statistical testing, however, it is really a test of what is called the null hypothesis, which in this case would be something like this:

H_0: There is no difference in unladen airspeed velocity between African or European swallows.

Ok, so you have your hypotheses; now you put them to the test. You put your horde of swallows into your testing chamber and measure their airspeed velocity. Now you put all your stored-up math wizardry to use and run some statistical analyses to find the means (averages) of the two groups and compare them. Plot each group’s measurements and your results will likely fall onto a bell curve: most birds clustered around the middle, with fewer and fewer out toward the extremes.

That is a fairly typical distribution, though yours could certainly look a little different. In fact, what you probably want (note: letting what you want affect how you perceive this experiment would bias your results, so don’t do that) is something more like two distinct bells: one curve for the European swallows and another for the African swallows, sitting apart with little overlap.

You have two groups, right? Your testing hypothesis says that they will be different, like that second scenario: two separate curves. But if the null hypothesis is true, the data will look more like the first: one big shared curve, because the swallows are really all part of the same group. If you have done a good job of randomly selecting your swallows, your groups will be representative of the populations they came from, and the difference between your groups will be representative of any difference in those populations as a whole. Well, how do you know whether the null hypothesis is true or false? For the ordinary researcher, that’s where the P value comes into play.

Using your math wizardry, you can calculate an effect size (in this case, the size of the difference in airspeed velocity between the two groups) with an attached P value. Here’s a great example of how this wizardry is done if you are interested:

https://www.youtube.com/watch?v=-FtlH4svqx4
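If you’d rather see the arithmetic than watch the video, here is a minimal sketch of that comparison in Python. Everything in it is made up for illustration: the airspeed means, the spread, and the choice of a plain two-sample t-test (via scipy) are my assumptions, standing in for whatever analysis your real study would use.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical airspeed velocities (m/s), 50 birds per group.
# The means and spread here are invented purely for illustration.
european = rng.normal(loc=11.0, scale=1.5, size=50)
african = rng.normal(loc=10.4, scale=1.5, size=50)

# Compare the two group means with a standard two-sample t-test.
t_stat, p_value = stats.ttest_ind(european, african)

# One common effect size: Cohen's d (mean difference over pooled SD).
pooled_sd = np.sqrt((european.var(ddof=1) + african.var(ddof=1)) / 2)
cohens_d = (european.mean() - african.mean()) / pooled_sd

print(f"European mean: {european.mean():.2f} m/s")
print(f"African mean:  {african.mean():.2f} m/s")
print(f"Cohen's d = {cohens_d:.2f}, p = {p_value:.3f}")
```

Run it and you get the two group means, a Cohen’s d describing how big the difference is, and the P value that everyone fixates on.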

For now, let’s say you’ve calculated p = .05. If you remember from earlier, you will get all excited and think, “That means the results are significant!” Well, ok. Hold on. What that actually means is that if there were really no difference between the groups, a gap this big would only show up about 5% of the time. Remember that bell curve? Your group of swallows may have just happened to come from the edge of that bell. Well, 5% is pretty small, so let’s say you’re right. You can now reject the null hypothesis and feel pretty confident about it, because there does appear to be a difference between the two groups. Now you can get your research published and share your work with swallow lovers the world ‘round!
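Before we move on, it’s worth seeing what “just happened to come from the edge of that bell” looks like in practice. Here’s a rough sketch (again, the airspeed numbers, group sizes, and the use of a plain t-test are invented for illustration): both groups are drawn from the very same population, so the null hypothesis is true by construction, yet roughly 5% of the simulated studies still sneak under p < .05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 10_000
false_alarms = 0

for _ in range(n_experiments):
    # Both "groups" are drawn from the same population: no real difference.
    group_a = rng.normal(loc=10.7, scale=1.5, size=50)
    group_b = rng.normal(loc=10.7, scale=1.5, size=50)
    if stats.ttest_ind(group_a, group_b).pvalue < 0.05:
        false_alarms += 1

# Roughly 5% of these no-difference studies still look "significant".
print(f"Significant by chance alone: {false_alarms / n_experiments:.1%}")
```

That 5% is exactly the false-alarm rate you signed up for when you chose .05 as your cutoff.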

Great! Time to poke holes in the fun balloon. First: what if you had found p = .15? Time to give up? Nah, just get more birds! Yup. That’s often all you need to do to make that P value shrink. Say you grabbed 200 birds instead of your 100: with the very same difference between the groups, a bigger sample drives the P value down, and your odds of slipping under that p < .05 line go way up. Still excited about your P value? What if you just dumped some of your outliers? Ok, hang on. I’m going to stop now, because there are plenty of ways to manipulate your P value, and I think you’re starting to get the idea.
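To watch the “just get more birds” trick in action, here is one more hedged sketch using the same invented numbers as before: the underlying difference between the groups never changes, but doubling the total number of birds reliably pulls the typical P value down.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def typical_p(n_per_group, n_sims=2_000):
    """Median P value across many simulated studies of the same fixed difference."""
    p_values = []
    for _ in range(n_sims):
        # The underlying (invented) group difference is identical in every run.
        european = rng.normal(loc=11.0, scale=1.5, size=n_per_group)
        african = rng.normal(loc=10.4, scale=1.5, size=n_per_group)
        p_values.append(stats.ttest_ind(european, african).pvalue)
    return float(np.median(p_values))

print(f"Typical p with 100 birds total: {typical_p(50):.3f}")
print(f"Typical p with 200 birds total: {typical_p(100):.3f}")
```

Nothing about the swallows got more interesting; only the sample got bigger.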

Now that you know the P value is a little bit fallible, you may be asking yourself: why do we rely on it so much, then? Well, that is the question, isn’t it? Greenland et al. (2016) wrote a great paper on this issue. I will leave it to you to dive deeper into the specifics, but let’s conclude our discussion here with a little pondering of their conclusion:

In closing, we note that no statistical method is immune to misinterpretation and misuse, but prudent users of statistics will avoid approaches especially prone to serious abuse. In this regard, we join others in singling out the degradation of P values into “significant” and “nonsignificant” as an especially pernicious statistical practice.

Rather than leaning on that little probability value, so readily open to manipulation, to determine the importance of our research, let’s find more reliable and trustworthy ways to show how awesome our swallows are.

Reference:

Greenland, S., Senn, S., Rothman, K., Carlin, J., Poole, C., Goodman, S., & Altman, D. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31, 337–350.
