Probability vs. Statistics

Why do we assume they are the same thing?

There is a common misconception that nearly all of us fall prey to. If you are looking for someone to blame, you might point fingers at our early education, but our own intuition hasn’t helped us make the distinction either. Here is the thing: probability and statistics are not the same thing. Sorry to break the news to you, but it’s true.

prob·a·bil·i·ty

noun

the extent to which something is probable; the likelihood of something happening or being the case.

“the rain will make the probability of their arrival even greater”

sta·tis·tics

noun

the practice or science of collecting and analyzing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.

“recent statistics show an increase in allergic reactions”


OK, so first off, of course they aren’t the same thing. If they meant the same thing, they would be the same word. That’s our first clue. More important, though, is how they relate to one another.

Let me give you an example first: “all Jacuzzis are hot tubs, but not all hot tubs are Jacuzzis.” You may have heard this before. Another is “all squares are rectangles, but not all rectangles are squares.”

Now

I wouldn’t go so far as to say the exact same format applies to statistics and probability, but I’d change it just a bit, like so: “statistics often uses probability, but using probability is not, by itself, statistical inference.”

So when we were youngsters in school learning about the likelihood of drawing a red marble out of a bag based on the number of blue and red marbles, we were slightly misled into believing that was a statistical method, but it’s not. Really, that is just a simple way to quantify a guess when you already know what is in the bag. It’s a more mathematical way to think about something like fate or, in better terms, probability. It’s just what it says: how likely is XX to occur when I do YY?
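Here is a minimal sketch of the marble exercise, just to show how little is going on. The bag contents (3 red, 7 blue) and the function names are made up for illustration. Notice that nothing is being inferred: we already know what is in the bag, and we simply reason forward to the chance of a red draw.

```python
from fractions import Fraction
import random


def probability_of_red(red: int, blue: int) -> Fraction:
    """Exact probability of drawing a red marble from a bag with known contents."""
    return Fraction(red, red + blue)


def simulate_red_share(red: int, blue: int, draws: int, seed: int = 0) -> float:
    """Draw with replacement many times and report the share of red draws."""
    rng = random.Random(seed)
    bag = ["red"] * red + ["blue"] * blue
    hits = sum(rng.choice(bag) == "red" for _ in range(draws))
    return hits / draws


# Hypothetical bag: 3 red marbles and 7 blue marbles.
p_red = probability_of_red(red=3, blue=7)
print(p_red)  # 3/10

# The long-run share of red draws should land close to 0.3.
print(simulate_red_share(3, 7, draws=100_000))
```

The simulated share only approaches 3/10 over many draws; a handful of draws can wander quite far from it.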

Statistics, however, is a far broader, farther-reaching effort to predict and determine results. We don’t run polls or analyze a census with probability alone; that’s the job of statistics. We use probability as a tool within statistics, but that doesn’t make it stats.

Things like regression and inferential analysis come from the proper use of statistics, where we infer something about a population based on the data in a sample.
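To see the direction statistics works in, here is a rough sketch of a poll. Everything in it is an assumption for illustration: the 42% "true" share, the sample size of 1,000, and the helper name estimate_share. In a real poll you would never know the true share; the point is that we start from a sample and work backward to a statement about the population.

```python
import math
import random


def estimate_share(sample: list[bool]) -> tuple[float, float]:
    """Return the sample proportion and its estimated standard error."""
    n = len(sample)
    p_hat = sum(sample) / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se


# Hypothetical population in which 42% would answer "yes".
# In a real poll this number is the unknown we are trying to learn.
TRUE_SHARE = 0.42
rng = random.Random(1)
sample = [rng.random() < TRUE_SHARE for _ in range(1_000)]

p_hat, se = estimate_share(sample)
print(f"estimated share: {p_hat:.3f}, standard error: {se:.3f}")
```

The estimate comes from the data, and probability only shows up in the standard error, which describes how much that estimate would bounce around from one sample to the next.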

So

Probability: Measure of the expectation that an event will occur or a statement is true.

Statistics: Study of the collection, organization, analysis, interpretation, and presentation of data.

You use statistics to gather and analyze a data set, and probability to measure, perhaps, how likely it is that you got it right.

So your confidence level might be 90%, but that’s not the same thing as saying the probability of something happening is 90%. If your confidence interval is set at 90%, you are really saying that the method you used to set the upper and lower bounds would capture the true value in about 90% of repeated samples. But if the probability you assign to drawing a red marble is 90%, that means that over many draws about 9 out of every 10 marbles will come out red and about 1 will be blue. Any particular 10 draws can easily give a different split.
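To make the two ideas concrete, here is a small sketch under some assumptions of my own: the function names are hypothetical, and the interval is the simple normal-approximation interval for a proportion, which is only one of several ways to build one. The first function computes the chance of getting exactly 9 red marbles in 10 draws when each draw is red with probability 90%. The second repeats a made-up poll many times and counts how often a 90% interval contains the true share.

```python
import math
import random
from statistics import NormalDist


def prob_exact_red(n: int, k: int, p: float) -> float:
    """Binomial probability of exactly k red marbles in n independent draws."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def interval_coverage(true_share: float, n: int, trials: int,
                      level: float = 0.90, seed: int = 0) -> float:
    """Share of repeated samples whose interval contains the true value."""
    # Two-sided critical value, about 1.645 for a 90% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rng = random.Random(seed)
    covered = 0
    for _ in range(trials):
        p_hat = sum(rng.random() < true_share for _ in range(n)) / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        covered += (p_hat - margin <= true_share <= p_hat + margin)
    return covered / trials


# A 90% chance of red does not guarantee 9 red marbles in every 10 draws.
print(round(prob_exact_red(10, 9, 0.9), 3))  # 0.387

# Roughly 90% of the repeated intervals should contain the true share.
print(interval_coverage(true_share=0.5, n=400, trials=2_000))
```

So the 90% attached to a confidence interval describes how often the procedure works across many samples, while the 90% attached to a marble describes a single draw.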

If you want to dive way deeper into this type of idea, read “How To Measure Anything” and “How Not To Be Wrong.”

I’m not saying this is some cardinal sin that everyone is committing, but I bet you’ve never thought about it before. I hope you enjoyed a bit of abstract thinking. Getting it right is important, and it will make me happier.
