Not-so Critical Analysis

What’s the significance of ‘significance’?

--

If you’ve ever been told to ‘critically appraise’ or ‘critically analyse’ a source, you’d be forgiven for wondering what exactly that might involve. Well, it turns out the professionals aren’t necessarily all that great at doing it either (or, at least, they’re creative when it comes to avoiding their own shortcomings).

If it’s a scientific paper, one of the first places to look for some critical analysis fodder is the data section. Do the authors present numerical measurements or recordings? Are they trying to say something about a hypothesis? If so, there’s a good chance they’ll have done some statistical analysis.

Broadly speaking, the point of a statistical test is to determine whether there’s a REAL difference between two or more things that have been measured. Perhaps a coin was flipped 100 times and the authors found that it landed heads-up 48 times and tails 52 times. It’s a difference, definitely, but is it really outwith the range of results you’d expect?
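For a flavour of what such a test actually does, here’s a minimal sketch in Python (using SciPy’s binomial test; the numbers and the choice of test are purely illustrative, not taken from any particular paper):

```python
from scipy.stats import binomtest

# 48 heads out of 100 flips, tested against a fair coin (heads probability 0.5)
result = binomtest(k=48, n=100, p=0.5)

# The p value comes out at roughly 0.76: a 48/52 split is entirely
# unremarkable for a fair coin, so there's no evidence of bias here.
print(result.pvalue)
```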

It’s very uncommon to find a system without random fluctuations, and for this reason we need to distinguish between statistically significant and non-significant results.

Traditionally, researchers have judged significance using a rule of thumb built around the number 20. If you feed all of the information about your experiment into a stats test and the test says that your measured result would have appeared just by random chance 10 times out of 20 (i.e. half the time), we say your result is not statistically significant. Bad news for a scientist trying to prove that their treatment works.

If, however, the test says that you’d need to repeat the experiment 1,000,000 times in order to get your result by random chance, you can be pretty certain that you haven’t just been (un)lucky and stumbled on that 1-in-a-million result, but rather that things weren’t happening at random. Technically speaking you would “reject the null hypothesis”. A clearer way of putting it is to say that your result was actually caused by your actions (e.g. a drug treatment).

Stick with me; the lesson is almost over.

1 in a million is quite a high bar for statistical ‘proof’. Instead, we’ve generally settled on this: if your stats test shows that you’d need to repeat an experiment 20 times in order to have found your result at random, it usually gets the scientists’ seal of approval.

The way a researcher would talk about this in a report would be to say that the probability (‘p’, or the ‘p value’) of such a result would have to be less than 1 in 20. 1 divided by 20 equals 0.05, and the symbol ‘<’ means ‘less than’, so the shorthand for this cut-off is ‘p < 0.05’.

So what does this all mean for our critical analysis of a results section?

It means that statistical tests that churn out p values of 0.06, 0.07, 0.08 and so on (i.e. numbers that are larger than 0.05) are bad news for the researchers, and that p values of 0.049, 0.045, 0.04, 0.03 and so on (which are smaller) are what every scientist hopes to find.
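If it helps, the decision itself is nothing more complicated than a comparison against the cut-off (a toy sketch, with made-up p values):

```python
ALPHA = 0.05  # the conventional 1-in-20 cut-off

def is_significant(p_value, alpha=ALPHA):
    """Return True if the p value clears the conventional threshold."""
    return p_value < alpha

print(is_significant(0.03))   # True  -- what every scientist hopes to see
print(is_significant(0.06))   # False -- just the wrong side of the line
```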

But what do you do if you’ve spent months and months doing your meticulous research, spending lots of research money and sleeping under the desk in your office… just to find you’ve got a result with a p value just very slightly on the wrong side of the line?

Well, it turns out there are 509 (and counting) different phrases in the scientific literature that scientists can use to cheat: have a peek at the Further Reading link before you critically analyse your next source.

With this many variations on a cheat’s phrase to be aware of, there’s a significant probability you might just spot one in the wild yourself…

(Classroom by eflon is licensed under CC BY 2.0)

--