Knowledge Distribution on Probability Distributions

You know it, you love it: here comes the normal distribution

tanta base
6 min read · Dec 8, 2023

Probability distributions help visualize and describe the possible values that a random variable can take. Statisticians commonly use them in hypothesis testing, and some machine learning models use them to make assumptions about the probability structure of a dataset in order to make new predictions.

In this article we’ll cover some Probability Density Functions for continuous data and Probability Mass Functions for discrete data.

bell representing the bell curve
A statistician’s favorite

TL;DR

Normal distribution is for continuous data and shows where values fall in relation to the mean

Poisson distribution is for discrete data and gives the probability of a number of events occurring within a time period

Binomial distribution is for discrete data and lets you evaluate success or failure states over a number of trials

Bernoulli distribution is for discrete data and assesses the success or failure state of a single trial

Data Types

Let’s first do a quick review of the data types and some possible machine learning tasks you can do with them:

Probability Density Function

So, in the chart above I mention that continuous data is measurable. If you take the measurement between two points, there could be an infinite range of numbers between those two points. You may be asking ‘Wait, what does that mean?’ Let me explain…

For example, if you look at a ruler, you’ll see the numbers 1 and 2, and marks that indicate 1.25, 1.5, 1.75, etc. However, between those markings are even more numbers not indicated on the ruler. So we know that between the points 1.25 and 1.5 exist the numbers 1.301, 1.353465, 1.455, 1.4901, etc. You can say that between any two points, there is an infinite set of numbers. Okay, we are almost done, just hang on a little longer!

robot holding a ruler in their hand with a galaxy behind it
who knew rulers could be so limitless

So, now we are going to weave probability into the example above. Let’s say, starting at 0″ on the ruler, you close your eyes, slide your finger down the ruler, and stop at a point. What is the probability that the point is exactly 3″? To land on exactly 3″ would require an infinite number of digits of precision in our measurement, making the probability exactly 0. Whew! Okay, so now what?

So, this brings us to continuous random variables, which have a probability of 0 of producing any one specific outcome. To get a useful probability we have to change our framing: instead of thinking of probabilities at exact values, we have to think of probabilities over ranges or intervals. That’s it, the hard part is over!
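We can see this in action with a quick sketch. Here the ruler is modeled (purely as an assumption for illustration) as a uniform continuous variable over 0″ to 12″, using SciPy: the probability of any exact point is 0, while an interval has a real probability.

```python
from scipy import stats

# Model a hypothetical 12-inch ruler as a uniform continuous variable
ruler = stats.uniform(loc=0, scale=12)

# Probability of landing on exactly 3": zero for any continuous variable
p_exact = ruler.cdf(3) - ruler.cdf(3)
print(p_exact)  # 0.0

# Probability of landing in the interval [2.9", 3.1"]: a meaningful number
p_interval = ruler.cdf(3.1) - ruler.cdf(2.9)
print(round(p_interval, 4))  # 0.0167 (i.e. 0.2 / 12)
```

Subtracting the CDF at the two endpoints is exactly the "ranges or intervals" framing described above.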

abstract drawing of person with lots of colors coming from center of it
what is the probability of understanding probability density functions?

Normal distribution

This is a distribution for continuous variables. It gives the probability of a data point falling within a given range, and it is a way of visualizing your continuous data to see where it is located in relation to the mean.

The center, µ, is the mean of your data and on either side are the standard deviations, σ. So for example, -2σ is two standard deviations left of the mean.

normal distribution
you know it, you love it, it’s the normal distribution source
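As a small sketch of the "ranges in relation to the mean" idea, here is the standard normal (µ = 0, σ = 1, chosen only as an example — swap in your own data’s mean and standard deviation) evaluated with SciPy:

```python
from scipy import stats

mu, sigma = 0, 1  # standard normal, as an example
norm = stats.norm(mu, sigma)

# Probability of a value falling within one standard deviation of the mean
p_1sigma = norm.cdf(mu + sigma) - norm.cdf(mu - sigma)
print(round(p_1sigma, 4))  # 0.6827

# Probability of a value falling within two standard deviations (-2σ to +2σ)
p_2sigma = norm.cdf(mu + 2 * sigma) - norm.cdf(mu - 2 * sigma)
print(round(p_2sigma, 4))  # 0.9545
```

These are the familiar 68% and 95% bands of the 68–95–99.7 rule you see drawn on the bell curve.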

Probability Mass Function

So, now we move on to something more straightforward. Now we can dive into discrete random variables, and just as a reminder, discrete variables take countable values.

robot holding up fingers
if you can count it on your hand then it’s a discrete variable

So let’s say you have a recommender system and someone can either like the recommendation by clicking a thumbs up icon or dislike the recommendation by clicking the thumbs down icon.

Now, you want to know how many times someone liked a recommendation when three different recommendations were given to them, and the probability of each outcome. We can break this down in the table below:

table of possible likes and dislikes when someone is recommended three things
probability of likes when three things are recommended

Now from this, we can create our probability mass function:

probability mass function of likes when someone is recommended three things
probability mass function of likes when someone is recommended three things

Okay, so to wrap this up, we can visualize discrete random variables with a probability mass function, which allows us to visualize the probability of an outcome. Now on to the different data distributions for discrete data.
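The like/dislike table can be reproduced in a few lines. This sketch assumes (as the table does) that each recommendation is equally likely to be liked or disliked; it enumerates all 2³ = 8 outcomes and counts the likes to build the PMF:

```python
from itertools import product
from collections import Counter

# All 2^3 like/dislike outcomes, each assumed equally likely
outcomes = list(product(["like", "dislike"], repeat=3))
counts = Counter(o.count("like") for o in outcomes)

# PMF: P(X = k likes) for k = 0..3
pmf = {k: counts[k] / len(outcomes) for k in range(4)}
print(pmf)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```

Note the probabilities sum to 1, as every PMF must.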

Poisson Distribution

This is a type of distribution for discrete variables. It gives you the probability of a given number of events occurring within a fixed period of time. You can use this to visualize counts of events, for example the number of houses sold in a day. Below is an example of what a Poisson distribution could look like for a dataset.

poisson distribution
poisson distribution source
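Sticking with the houses-sold example, here is a minimal sketch with SciPy, assuming a made-up average rate of 2 sales per day (λ = 2):

```python
from scipy import stats

lam = 2  # assumed average: 2 houses sold per day
houses = stats.poisson(lam)

# Probability of selling exactly 3 houses in a day
print(round(houses.pmf(3), 4))  # 0.1804

# Probability of selling at most 3 houses in a day
print(round(houses.cdf(3), 4))  # 0.8571
```

Changing λ shifts the whole distribution, which is why the Poisson is described by its rate alone.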

Binomial

This is a type of distribution for discrete variables. Binomial accounts for only two states, success or failure, over a given number of trials. This could show up in machine learning when the performance of a binary classification model is analyzed over many trials.

This distribution represents the probability of a number of successes across the trials, given the success probability for each trial. For example, this could be used to visualize results from a two-party election. Below is an example of what a Binomial distribution could look like for a dataset.

binomial distribution source
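To make the two-party election example concrete, here is a sketch assuming a hypothetical 10 voters who each pick party A with probability 0.5:

```python
from scipy import stats

n, p = 10, 0.5  # assumed: 10 voters, each voting for party A with probability 0.5
votes = stats.binom(n, p)

# Probability that exactly 5 of the 10 vote for party A
print(round(votes.pmf(5), 4))  # 0.2461

# Probability that party A wins a majority (6 or more votes)
print(round(1 - votes.cdf(5), 4))
```

The same pattern fits the binary classifier case: n predictions, each correct with probability p.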

Bernoulli

This is a type of distribution for discrete variables and is a special case of the Binomial distribution. This distribution represents the success or failure of a single trial. A common example of this in machine learning is binary classification models.

Bernoulli represents probabilities within a single trial, whereas Binomial represents probabilities across multiple trials. For example, a single coin toss follows a Bernoulli distribution, but if you toss the coin multiple times and record the number of heads, the count follows a Binomial distribution. Below is an example of what a Bernoulli distribution could look like for a dataset:

Bernoulli distribution
bernoulli distribution source
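The coin-toss example, and the "special case of Binomial" relationship, can both be shown in a few lines with SciPy:

```python
from scipy import stats

p = 0.5  # one fair coin toss: success probability 0.5
toss = stats.bernoulli(p)

print(toss.pmf(1))  # P(heads) = 0.5
print(toss.pmf(0))  # P(tails) = 0.5

# A Bernoulli is just a Binomial with n = 1 trial:
print(stats.binom(1, p).pmf(1))  # 0.5
```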

Final Word

To wrap this all up, I hope I was able to give some insight into why probability distributions behave the way they do, why they look the way they look, and when to use them.

two robots holding hands and jumping
PMF and PDF

Want more? Check out this article on feature engineering and this one on data imputation.


tanta base

I am a data and machine learning engineer. I specialize in all things natural language, recommendation systems, information retrieval, chatbots and bioinformatics