Should you even do a hypothesis test?

Should you do hypothesis tests? That may seem like a silly question, especially if you are new to statistics and data science. The answer seems like it has to be yes. So much of what you learn is about hypothesis tests. But, often, the answer is “no”. Or, to elaborate slightly, “no, you should look at effect sizes and the precision of your estimate”.

Let’s look at what a hypothesis test is, exactly. You formulate a null hypothesis. This is usually (but not always) a statement that nothing is going on.

  • There’s no difference between groups
  • There’s no relationship among variables
  • The mean is 0

or something like that.

Then you do some analysis and try to reject the null. You decide to reject the null if you get a p-value below some conventional (but arbitrary) threshold (often 5% or 1%, but why not 4% or 10% or 0.1%?). The p-value answers a very specific question:

If, in the population from which this sample was randomly drawn, the null hypothesis were true, what would be the chance of getting a test statistic at least as extreme as the one we got, in a sample the size of the one we have?
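
That question can be made concrete with a small simulation. The following is only a sketch under loud assumptions: the null is "the mean is 0", the population is taken to be normal, and its standard deviation is simply set equal to the sample's (which ignores the uncertainty in that estimate). The function name `simulated_p_value` and the data are made up for illustration.

```python
import random
from statistics import mean, stdev

def simulated_p_value(sample, null_mean=0.0, n_sims=5_000, seed=0):
    """Estimate a two-sided p-value by simulating samples under the null.

    Assumptions (not from any real study): a normal population whose
    standard deviation equals the sample standard deviation.
    """
    rng = random.Random(seed)
    n = len(sample)
    sd = stdev(sample)
    # Test statistic: how far the sample mean is from the null value.
    observed = abs(mean(sample) - null_mean)
    hits = 0
    for _ in range(n_sims):
        # Draw a sample of the same size from a world where the null is true.
        fake = [rng.gauss(null_mean, sd) for _ in range(n)]
        if abs(mean(fake) - null_mean) >= observed:
            hits += 1
    # Share of simulated samples at least as extreme as the real one.
    return hits / n_sims

# Made-up measurements, purely for illustration.
sample = [0.8, 1.3, 0.2, 1.9, 1.1, 0.6, 1.4, 0.9]
p_value = simulated_p_value(sample)
```

Note what the result does and does not say: it is the share of null-world samples that look at least as extreme as ours, not the probability that the null is true and not a measure of how large the effect is.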

I posit that this is rarely interesting. First, very often, your sample is not randomly drawn from a population. Second, the null is never exactly true in the population. Third, the question of whether it is exactly true is almost never interesting.

What are you actually interested in? If it’s not whether the null is true, then what? In almost all cases, you are interested in effect size.

  • Not “Is one advertising strategy better than another?” but “How much better is one strategy, and how sure are we about how much better?”
  • Not “Is there any relationship among these variables?” but “What is the relationship among these variables, and how reliable is our estimate?”

and so on.
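
For the advertising example, "how much better, and how sure are we" could be sketched as a difference in conversion rates with an approximate confidence interval. This is a minimal sketch, not a recommended analysis: the counts are invented, the function name `diff_in_proportions` is hypothetical, and the interval is the simple normal-approximation (Wald) interval, which can behave poorly with small samples or rates near 0 or 1.

```python
from math import sqrt
from statistics import NormalDist

def diff_in_proportions(successes_a, n_a, successes_b, n_b, confidence=0.95):
    """Difference in two proportions with a normal-approximation interval."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    diff = p_a - p_b
    # Standard error of the difference between two independent proportions.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, diff - z * se, diff + z * se

# Invented counts: conversions out of visitors for strategies A and B.
estimate, low, high = diff_in_proportions(120, 1000, 100, 1000)
```

With these invented counts the estimate is 2 percentage points in favour of A, with an interval running from roughly -0.7 to 4.7 points. That is far more informative than "not significant at 5%": A may be somewhat worse, or it may be a good deal better, and the data cannot yet tell which.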

So, what is an effect size? There are many. Some of the more common ones are:

  • Difference between two or more means: measured by the difference (but maybe you should look at the ratio)
  • The relationship between a set of predictor (independent) variables and a predicted (dependent) variable: often measured by R² for a linear regression (the F statistic often reported alongside it is a test statistic rather than an effect size), but there are also effect sizes for other forms of regression, such as odds ratios for logistic regression
  • Differences in medians: measured by the difference (but maybe you should look at other quantiles)
  • Wikipedia (naturally) has a longer list of effect sizes.
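
A few of these are simple enough to compute by hand. The sketch below uses made-up numbers and hypothetical helper names (`simple_effect_sizes`, `odds_ratio`); the odds ratio here is the unadjusted one from a 2×2 table of counts, which corresponds to a logistic regression with a single binary predictor.

```python
from statistics import mean, median

def simple_effect_sizes(group_a, group_b):
    """A few common two-group effect sizes, on the original measurement scale."""
    return {
        "mean_difference": mean(group_a) - mean(group_b),
        "mean_ratio": mean(group_a) / mean(group_b),
        "median_difference": median(group_a) - median(group_b),
    }

def odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Unadjusted odds ratio from a 2x2 table of counts."""
    return (events_a / non_events_a) / (events_b / non_events_b)

# Made-up data, purely for illustration.
group_a = [12, 15, 11, 19, 13]
group_b = [10, 9, 12, 8, 11]
sizes = simple_effect_sizes(group_a, group_b)

# Made-up counts: 30 of 100 in group A had the event, 15 of 100 in group B.
ratio = odds_ratio(30, 70, 15, 85)
```

None of these numbers says anything about precision on its own; each would normally be reported with an interval around it.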

How big does the effect size have to be for it to be interesting? That’s entirely dependent on what you are studying. If you are studying airplane safety, then a change that adds 1 crash per 1,000 flights is absolutely huge. But if you are studying the common cold, an increase of 1 case per 1,000 people is negligible. You’re going to have to think.