Want to know about Statistics?

B HARSHA VARDHAN · Published in featurepreneur · 4 min read · Jul 8, 2021

“I can prove anything by Statistics except the Truth.”

A statistic is any old way of mushing up our data. Yup. That's a 100% technically correct definition. Now let’s see what the discipline of statistics is all about.

Statistics is the science of changing your mind.

Making decisions based on facts (parameters) is hard enough as it is, but (curses!) sometimes we don’t even have the facts we need. Instead, what we know (our sample) is different from what we wish we knew (our population). That’s what it means to have uncertainty.
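To make the sample/population distinction concrete, here is a minimal Python sketch. The “population” of the numbers 1 to 1000 is a made-up stand-in for the facts we wish we had; the sample mean, median and maximum are all statistics in the “any old way of mushing up our data” sense, and none of them is guaranteed to match the parameter.

```python
import random
import statistics

# Hypothetical population: everything we wish we knew.
population = list(range(1, 1001))
parameter = statistics.mean(population)  # the fact we wish we had (500.5)

# What we actually get to see: a random sample of 20 values.
random.seed(42)
sample = random.sample(population, 20)

# Each of these is a statistic: just one way of mushing up the sample.
sample_mean = statistics.mean(sample)
sample_median = statistics.median(sample)
sample_max = max(sample)

print(f"population mean (parameter): {parameter}")
print(f"sample mean (statistic):     {sample_mean}")
print(f"sample median (statistic):   {sample_median}")
print(f"sample max (statistic):      {sample_max}")
```

Rerun it with a different seed and the statistics change while the parameter stays put; that gap is the uncertainty the rest of this post is about.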

Statistics is the science of changing your mind under uncertainty. What might your mind be set to? A default action or a prior belief. What if your mind’s a blank slate?

Bayesian statistics is the school of thought that deals with incorporating data to update your beliefs. Bayesians like to report results using credible intervals (two numbers which are interpreted as, “I believe the answer lives between here and here”).
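A minimal sketch of Bayesian updating, assuming a simple made-up setup: we want a success rate, start from a uniform Beta(1, 1) prior, and observe 7 successes in 20 trials. The conjugate Beta-Binomial update gives the posterior, and we read off an equal-tailed 95% credible interval from Monte Carlo draws using Python's standard `random.betavariate`. The function name `credible_interval` and all the numbers are illustrative, not a prescribed method.

```python
import random

def credible_interval(successes, trials, prior_a=1.0, prior_b=1.0,
                      level=0.95, draws=100_000, seed=0):
    """Equal-tailed credible interval for a success rate (Beta-Binomial model)."""
    # Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + s, b + f).
    post_a = prior_a + successes
    post_b = prior_b + (trials - successes)
    # Approximate the posterior with sorted Monte Carlo draws.
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(post_a, post_b) for _ in range(draws))
    tail = (1 - level) / 2
    lo = samples[round(tail * draws)]
    hi = samples[round((1 - tail) * draws) - 1]
    return lo, hi

lo, hi = credible_interval(successes=7, trials=20)
print(f"I believe the success rate lives between {lo:.3f} and {hi:.3f}")
```

The prior is the belief you start with and the data shift it. A more informative prior (larger `prior_a` and `prior_b`) makes the interval move less for the same data.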

Frequentist statistics deals with changing your mind about actions. You don’t need to have a belief to have a default action; it’s simply what you’re committed to doing if you don’t analyze any data.

Hypotheses are descriptions of what the world might look like.

The null hypothesis describes all worlds where doing the default action is a happy choice; the alternative hypothesis is all other worlds. If I convince you (with data!) that you don’t live in a null hypothesis world, then you had better change your mind and take the alternative action.

All of hypothesis testing comes down to asking: does our evidence make the null hypothesis look ridiculous? Rejecting the null hypothesis means we learned something and should change our minds. Not rejecting the null means we learned nothing interesting, just as going for a hike in the woods and seeing no humans doesn’t prove that there are no humans on the planet. It just means we didn’t learn anything interesting about whether humans exist. Does it make you sad to learn nothing? It shouldn’t, because you have a lovely insurance policy: you know exactly what action to take. If you learned nothing, you have no reason to change your mind, so keep doing the default action.

The p-value’s on the periodic table: it’s the element of surprise.

The p-value says, “If I’m living in a world where I should be taking that default action, how unsurprising is my evidence?” The lower the p-value, the more the data are yelling, “Whoa, that’s surprising, maybe you should change your mind!”
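Here is one way to compute a p-value, sketched as an exact one-sided binomial test in Python. The scenario is hypothetical: the default action is to treat a coin as fair (success rate 0.5), and we observe 16 heads in 20 flips. The p-value is the probability of seeing evidence at least this extreme if we really lived in the default world.

```python
from math import comb

def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided p-value: P(X >= successes) if the null (rate = p_null) is true."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

p = binomial_p_value(successes=16, trials=20)
print(f"p-value: {p:.4f}")  # prints: p-value: 0.0059
```

A p-value this small means 16 heads would be quite surprising for a fair coin. Try `successes=11` instead and the p-value climbs well above 0.25: nothing surprising, nothing learned.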

To perform the test, compare that p-value with a threshold called the significance level. This is a knob you use to control how much risk you want to tolerate. It’s your maximum probability of stupidly leaving your cozy comfy default action. If you set the significance level to 0, that means you refuse to make the mistake of leaving your default incorrectly. Pens down! Don’t analyze any data, just take your default action. (But that means you might end up stupidly NOT leaving a bad default action.)
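To see the significance level acting as a ceiling on the risk of leaving a good default, this sketch computes exactly how often the one-sided binomial test would reject a true null for a few choices of alpha. It reuses the same hypothetical fair-coin setup with 20 flips; the helper names are made up for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def p_value(successes, n, p_null=0.5):
    """One-sided p-value: probability of at least this many successes under the null."""
    return sum(binom_pmf(k, n, p_null) for k in range(successes, n + 1))

def false_alarm_rate(alpha, n=20, p_null=0.5):
    """Exact probability of rejecting the null when the null is actually true."""
    return sum(
        binom_pmf(k, n, p_null)
        for k in range(n + 1)
        if p_value(k, n, p_null) <= alpha
    )

for alpha in (0.0, 0.01, 0.05, 0.10):
    rate = false_alarm_rate(alpha)
    print(f"alpha={alpha:.2f}  P(leave default | default is right) = {rate:.4f}")
```

Because the binomial test is discrete, the actual rate sits at or below alpha rather than exactly on it. With alpha set to 0 the test never rejects at all: pens down.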

A confidence interval is simply a way to report your hypothesis test results. To use it, check whether it overlaps with your null hypothesis. If it does overlap, learn nothing. If it doesn’t, change your mind.

Only change your mind if the confidence interval doesn’t overlap with your null hypothesis.
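The link between confidence intervals and tests can be checked numerically. This is a sketch assuming a two-sided z-test for a mean (normal approximation, which is reasonable only for fairly large samples) on made-up data; the function `z_test_and_interval` and the candidate null values are hypothetical.

```python
import random
import statistics
from statistics import NormalDist

def z_test_and_interval(data, mu_null, alpha=0.05):
    """Two-sided z-test for a mean plus the matching (1 - alpha) confidence interval."""
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / n**0.5          # standard error of the mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    ci = (mean - z_crit * se, mean + z_crit * se)
    z = (mean - mu_null) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided p-value
    return ci, p

rng = random.Random(7)
data = [rng.gauss(10.5, 2.0) for _ in range(100)]  # made-up measurements

for mu_null in (10.0, 10.5, 11.0):
    (lo, hi), p = z_test_and_interval(data, mu_null)
    overlaps = lo <= mu_null <= hi
    decision = "learn nothing, keep default" if overlaps else "change your mind"
    print(f"null={mu_null}: CI=({lo:.2f}, {hi:.2f}), p={p:.3f} -> {decision}")
```

The two rules agree here because the interval and the test share the same standard error and critical value; an interval built another way may agree with a given test only approximately.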

While a confidence interval’s technical meaning is a little bit weird (I’ll tell you all about it in a future post; it’s definitely not as simple as the credible interval we met earlier, and wishing does not make it so), it has two useful properties that analysts find helpful when describing their data:

(1) the best guess is always in there, and

(2) it’s narrower when there’s more data.

Beware that neither it nor the p-value was designed to be nice to talk about, so don’t expect pithy definitions. They’re just ways to summarise test results. (If you took a class and found the definitions impossible to remember, that’s why. On behalf of statistics: it’s not you, it’s me.)
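Both properties of the confidence interval can be illustrated with a normal-approximation interval for a mean. This sketch holds a hypothetical best guess and sample standard deviation fixed (real samples would wobble from one to the next) and shows the interval staying centered on the best guess while shrinking as the sample grows. The helper `ci_half_width` is made up for illustration.

```python
from statistics import NormalDist

def ci_half_width(sd, n, level=0.95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sd / n**0.5

best_guess = 10.5  # hypothetical sample mean: the best guess
sd = 2.0           # hypothetical sample standard deviation, held fixed

for n in (25, 100, 400, 1600):
    h = ci_half_width(sd, n)
    print(f"n={n:5d}: CI = ({best_guess - h:.3f}, {best_guess + h:.3f}), "
          f"width = {2 * h:.3f}")
```

Since the width scales with 1/sqrt(n), quadrupling the sample size halves the interval.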

Don’t waste your time rigorously answering the wrong question. Apply statistics intelligently (and only where needed).

What’s a Type III error? It’s kind of a statistics joke: it refers to correctly rejecting the wrong null hypothesis. In other words, using all the right math to answer the wrong question.

A cure for asking and answering the wrong question can be found in Decision Intelligence, the new discipline that looks at applying data science to solving business problems and making decisions well. By mastering decision intelligence, you’ll build up your immunity to Type III error and useless analytics.

Thanks for reading!
