A Beginner’s Guide to Understanding Discrete Probability Distributions

Part I: Expectation, variance, Binary and Bernoulli distributions

Neeraj Chavan
Analytics Vidhya
8 min read · Oct 8, 2019


Photo by Brett Jordan on Unsplash

“It is probably better to realize that the probability concept is in a sense subjective, that it is always based on uncertain knowledge, and that its quantitative evaluation is subject to change as we obtain more information.”

~Richard P. Feynman

The concept of probability has always seemed mysteriously fascinating to me. The idea of quantifying uncertainty of this random universe is in itself audacious! Anyway, I recently started to explore probability and it has captured my curiosity.

So, today, I’m going to discuss Discrete Probability Distributions. So, what exactly does the term probability distribution mean?

To answer this, let me give you an example;

Since I’ve recently moved to Dublin from Mumbai, let me give you an example that I have experienced. I’m getting some of my stuff couriered from India; just some of my clothes that I couldn’t fit into my checked-in luggage, and some packets of Maggi instant noodles. So, I contact an international courier service which gets stuff delivered across the globe. However, I’m concerned about whether my stuff will be delivered to me as is, without any damage during the transit, especially the noodle packets. No one likes their noodles crushed to pulp. Now, I call up the courier guy again and let him know my concerns.

The courier guy tells me that there’s nothing to worry about and that at most 5 out of 50 packets may get damaged during transit handling.

So, what does this mean? Should I go ahead with the deal and order my stuff through this service? Or should I look for some other courier service that will promise me a better safety of my Maggi packets?

Well, this is where probability comes in.

So, imagine I order around 20 boxes of Maggi and each box has 50 packets. The boxes get delivered to me and I start inspecting them. Suppose I open the first box and pick out 1 packet at random. What is the chance that the packet I pick is a damaged one?

The courier guy had told me that at most 5 packets may get damaged during transit. So, if I pick a random packet from the box, there’s a 5/50, i.e. 1/10, chance that it’s a damaged packet.

In layman’s terms, it means that there is a 1 in 10 chance that the packet you pick is a damaged one. This is probability.

Therefore, the probability that you pick up a damaged Maggi packet in a box is 0.1 or 1/10.

i.e. there’s a 10% chance of you picking up a damaged packet.

Probability Distributions:

Moving on, I hope the above example gave you a fair idea about what probability is. Since this topic is about discrete probability distributions, let’s delve into it.

Probability distributions are of two types:

  1. Discrete
  2. Continuous

Now, in this article, I’ll be exploring only discrete probability distributions (a few popular models).

Before talking about Probability Distribution Models, let me try to explain to you what exactly is a probability distribution.

According to Wikipedia,

In probability theory and statistics, a probability distribution is a mathematical function that provides the probabilities of occurrence of different possible outcomes in an experiment. In more technical terms, the probability distribution is a description of a random phenomenon in terms of the probabilities of events.

Well, this is how Wiki describes a probability distribution. Quite lucid and brief, right?

I’ll try to explain this using the classic dice example-

Suppose you roll a die. Now, consider the following table:

X    | 1   | 2   | 3   | 4   | 5   | 6
P(X) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6

In the above table, X denotes the number on the die and P(X) denotes the probability of that number appearing when the die is rolled once.

X can be any discrete random variable and P(X) is its associated probability.

So basically, when it comes to a probability distribution, only two things matter: the values the random variable can take and the probability with which each of those values occurs.

To define it in more technical terms, if X is any discrete random variable and each value of X has an associated probability p(x), then p(x) is called the probability distribution if the following conditions are satisfied:

  1. p(x) ≥ 0 for all values of x
  2. The sum of all the probabilities equals 1 (or 100%), i.e. ∑p(x) = 1

p(x) is referred to as the probability function or probability mass function (PMF).

Now consider the earlier example of rolling a die. From the table, we can see that p(x) ≥ 0 for all values of x and the summation of all p(x) is equal to 1.

This is probability distribution in a nutshell.
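The two conditions above are easy to check for the die example in code. Here is a minimal Python sketch (the later examples in this article use R; this is just an illustrative check, with variable names of my own choosing):

```python
from fractions import Fraction

# PMF of a single fair die: each face 1..6 has probability 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Condition 1: p(x) >= 0 for all values of x
assert all(p >= 0 for p in pmf.values())

# Condition 2: the probabilities sum to 1
assert sum(pmf.values()) == 1
```

Using exact fractions avoids floating-point rounding when summing the probabilities.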

Just one last thing before I move on to discussing some of the discrete probability distribution models.

Expectation or Expected Value:

The expectation of a discrete random variable X with probability function p(x) is given by -

E(X) = ∑ xp(x)

So, the expected value is the sum, over all possible values of the random variable, of each value multiplied by its associated probability.

Or, to put it more simply, it’s the value you will get on average over a large number of trials.

Using the above dice experiment again, suppose that you roll two dice. Now consider the following diagram:

Picture Source: https://en.wikipedia.org/wiki/Probability_distribution

It is quite evident from the above diagram that the most likely value you can get is 7.

So, if someone asks you what’s the “expected” outcome out of this experiment, it’ll be 7.
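Both claims can be checked with a short computation: the expectation of a single die is 3.5, and for the sum of two dice the most likely value is 7 (which here also equals the expectation). A small Python sketch, with names of my own choosing:

```python
from fractions import Fraction
from collections import Counter

faces = range(1, 7)

# E(X) = sum of x * p(x) for one fair die
expectation = sum(x * Fraction(1, 6) for x in faces)
print(expectation)  # 7/2, i.e. 3.5

# Distribution of the sum of two dice: 36 equally likely outcomes
sums = Counter(a + b for a in faces for b in faces)
most_likely = max(sums, key=sums.get)
print(most_likely, Fraction(sums[most_likely], 36))  # 7 1/6
```

Note that 3.5 is not a value a die can actually show; the expectation is a long-run average, not a guaranteed outcome.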

P.S. Just like the summation of all p(x) should be 1 or 100%, for a continuous distribution the area under the density curve has to be 1.

Moving on, there’s one final concept I need to shed some light on: variance.

Variance:

Also denoted by Var(X), it is the expected value of the squared deviation from the mean.

Var(X) = E[(X−μ)²] = ∑(x-μ)²p(x)

Standard deviation generally describes the spread or distribution of values about the mean and is the square root of variance.

A low standard deviation signifies that the values are all closer to the mean and a high standard deviation means that the values are spread out from the mean.
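As a worked example, the variance of a single fair die comes out to 35/12 ≈ 2.92, with standard deviation ≈ 1.71. A quick Python check of the formula Var(X) = ∑(x − μ)²p(x), with variable names of my own:

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # each face is equally likely

mu = sum(x * p for x in faces)               # E(X) = 7/2
var = sum((x - mu) ** 2 * p for x in faces)  # sum of (x - mu)^2 * p(x)
sd = float(var) ** 0.5                       # standard deviation = sqrt(variance)

print(mu, var, round(sd, 2))  # 7/2 35/12 1.71
```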

Okay. Now that we have covered the necessary groundwork, let’s move on and discuss some discrete probability distribution models!

1. Bernoulli Model

This model is used in random experiments where the outcome is either 0 or 1, i.e. ‘yes’ or ‘no’, or ‘success’ or ‘failure’; only two outcomes are possible.

For example, if you toss a coin, the outcome will be either heads or tails.

Its probability mass function, expectation and variance are given by the following formulas -

P(X = x) = pˣ (1-p)¹⁻ˣ, for x ∈ {0, 1}

E(X) = p

Var(X) = p(1-p)

where p is the probability of success.

The thing about the Bernoulli model is that every trial has to have the same probability of success throughout, and every trial must be independent of the others. The trials should not depend on the outcomes of other trials, as with coin tosses or rolling two dice.

If the trial events are connected, the Bernoulli model won’t work, since the probabilities will change during the trials.

The Bernoulli model is only for n=1, where n is the number of trials.

For n > 1, we have the Binomial model, which is closely related to the Bernoulli model.

Example:

Suppose we flip a coin 6 times and get the following outcomes:

HTTHTT

i.e. (011011), coding tails as 1 and heads as 0

How do we calculate the probability mass function for the above data set?

Now, mathematically, we can estimate p = P(X = 1) from the data as 4/6 = 2/3 ≈ 0.67.

From the formula, we can determine the P.M.F. (probability mass function) as P(X = j) = (2/3)ʲ × (1/3)¹⁻ʲ, where j ∈ {0, 1}.

Now, similarly we can calculate the expectation as-

E(X) = p = 2/3 ≈ 0.67

And variance by-

Var(X) = p(1-p) = (2/3)(1 - 2/3) = 2/9 ≈ 0.22
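These numbers are easy to reproduce. Here is a minimal Python sketch of the same calculation (the article does this in R below; the helper name bern_pmf is my own):

```python
from fractions import Fraction

# Coin flips HTTHTT, coding tails as 1 (success) and heads as 0
flips = [0, 1, 1, 0, 1, 1]

# Estimate p as the observed fraction of successes
p = Fraction(sum(flips), len(flips))

# Bernoulli PMF: P(X = j) = p^j * (1 - p)^(1 - j), j in {0, 1}
def bern_pmf(j, p):
    return p**j * (1 - p)**(1 - j)

print(p)               # 2/3  (estimated p, which is also the expectation)
print(bern_pmf(0, p))  # 1/3
print(p * (1 - p))     # 2/9  (variance)
```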

Implementation in R:

There’s a Bernoulli distribution function in R. However, it’s in the Rlab package. So, you need to first install that package.

We can do the above calculations in R using the Bernoulli Distribution Function

Please refer to the snapshot below -

Bernoulli Distribution Function in R

Now let’s see what’s a Binomial distribution model.

2. Binomial Model

As I said earlier, the Binomial model is closely related to Bernoulli model.

A Binomial distribution model will tell you the likelihood of getting a given number of ‘successes’ in a series of independent trials or events.

Suppose we toss a coin 50 times. Now we’re repeating the same event 50 times in a row, i.e. n = 50, with probabilities 0.5 for heads and 0.5 for tails on each toss.

In the aforementioned coin toss experiment, if getting a head is a ‘success’, then we could expect 25 successes, as 25 is the mean of the distribution.

Binomial is basically multiple independent Bernoulli events.

In a Binomial distribution model, the probability mass function, expectation and variance are given by the following formulas -

P(X = j) = nCj × pʲ (1-p)ⁿ⁻ʲ, for j = 0, 1, …, n

E(X) = np

Var(X) = np(1-p)

The Binomial model is very similar to the Bernoulli model. It can be said that the Bernoulli model is just a simple Binomial model with n=1, where n is the number of trials.
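You can verify this relationship numerically: plugging n = 1 into the binomial PMF recovers the Bernoulli PMF, since 1C0 = 1C1 = 1. A small Python sketch (function names are mine):

```python
from math import comb

def binom_pmf(j, n, p):
    # P(X = j) = nCj * p^j * (1 - p)^(n - j)
    return comb(n, j) * p**j * (1 - p)**(n - j)

def bern_pmf(j, p):
    # Bernoulli: P(X = j) = p^j * (1 - p)^(1 - j)
    return p**j * (1 - p)**(1 - j)

p = 0.3
for j in (0, 1):
    # With n = 1 the two models agree exactly
    assert binom_pmf(j, 1, p) == bern_pmf(j, p)
```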

Example:

Suppose we flip a coin 20 times. What is the probability that we get exactly 9 heads?

Now, we have here n=20

We want P(X = 9), since we need exactly 9 heads.

The probability p will be 0.5.

By using the mathematical formula, we can calculate it as-

P(X = 9) = 20C9 × p⁹ × (1-p)¹¹ = 167960 × 0.5⁹ × 0.5¹¹ ≈ 0.160

Similarly we can calculate the expectation as-

E = n*p = 20*0.5 = 10

And finally variance as-

Var(X) = np(1-p) = 20 × 0.5 × 0.5 = 5

Now, we can calculate P(X ≥ 9) as 1 - P(X ≤ 8), where P(X ≤ 8) can be calculated as P(0) + P(1) + … + P(8).
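Both the exact probability and the cumulative one can be checked with a few lines of Python (the article solves this in R below; binom_pmf is my own helper):

```python
from math import comb

def binom_pmf(j, n, p):
    # P(X = j) = nCj * p^j * (1 - p)^(n - j)
    return comb(n, j) * p**j * (1 - p)**(n - j)

n, p = 20, 0.5

# P(X = 9): exactly 9 heads in 20 tosses
print(round(binom_pmf(9, n, p), 3))  # 0.16

# P(X >= 9) = 1 - P(X <= 8), where P(X <= 8) = P(0) + ... + P(8)
p_at_most_8 = sum(binom_pmf(j, n, p) for j in range(9))
print(round(1 - p_at_most_8, 3))  # 0.748
```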

Implementation in R:

The best thing about R is that there’s a package for everything.

Anyway, there’s an inbuilt Binomial distribution function in R called ‘dbinom’

We can solve the above problem in R as follows-

Binomial Distribution Function in R

Well, that’s it for today! I’ll be covering more distribution models in this series. Probably in the upcoming blog posts.

Also, please do let me know if you found this article resourceful and please do share it with your friends/peers/colleagues.

If not, I’m always up for constructive feedback on how I could improve further.

Feel free to reach out to me on LinkedIn or perhaps on Twitter (I mostly RT jokes, and doggo gifs)

https://www.skepticalthoughts.com/ <- That’s my personal blog, in case you’re bored with this mathematics and probability and need something refreshing to read. I mostly write short stories, satires and sometimes poetry. Do check it out.

Until next time, adios!

— — — — — — — — — — — —

A Beginner’s Guide to Understanding Discrete Probability Distributions — Part II
