Discrete Probability Distributions with Python

Sneha Bajaj
5 min read · Oct 17, 2023


Understanding Bernoulli and Binomial Distributions (Probability Mass Functions) with simulation using Python

If you are not familiar with the concept of a probability distribution, or would like a refresher on the types of probability distributions, you may want to start with the following 5-min read:

Index
Bernoulli Distribution
Binomial Distribution
PMF v/s CDF
Summary

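Bernoulli and Binomial Distributions are two of the most commonly encountered discrete probability distributions. Each is described by a probability mass function (PMF), which assigns a probability to every possible outcome.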

Bernoulli Distribution

It is a discrete probability distribution which represents a single trial of a random experiment having only two outcomes — success and failure.

A random variable X with a Bernoulli distribution takes the value 1 with probability p (success) and the value 0 with probability 1-p (failure). Therefore, p is the only parameter required to plot a Bernoulli Distribution.

Let's assume that the probability of clearing an interview is 70%. We can plot the corresponding Bernoulli distribution using Python:

import matplotlib.pyplot as plt

# Probability of success (p)
p = 0.7
outcomes = [0,1]
probability = [1-p,p]

#Plotting the distribution
plt.bar(outcomes, probability)
plt.xticks([0,1])
plt.xlabel('Outcome')
plt.ylabel('Probability')
plt.title('Bernoulli Distribution with 70% probability of Success')
plt.show()

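The probability mass function (PMF) of a Bernoulli distribution is given by:

P(X = x) = p^x (1 - p)^{1 - x}, for x ∈ {0, 1}

Setting x = 1 gives p and setting x = 0 gives 1 - p, which matches the two bars in the plot above.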
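As a rough check of the Bernoulli PMF, the sketch below writes it as a small function and compares it with the share of successes in simulated trials. The function name bernoulli_pmf, the seed and the number of trials are illustrative choices, not part of the original example.

```python
import random

def bernoulli_pmf(x, p):
    """Return P(X = x) = p^x * (1 - p)^(1 - x) for x in {0, 1}, else 0."""
    if x not in (0, 1):
        return 0.0
    return p**x * (1 - p)**(1 - x)

p = 0.7                   # probability of clearing the interview
rng = random.Random(42)   # fixed seed (an arbitrary choice) for repeatable runs

# Each trial is 1 (success) with probability p, otherwise 0 (failure)
trials = [1 if rng.random() < p else 0 for _ in range(10_000)]
empirical_p = sum(trials) / len(trials)

print(f"P(X=1) = {bernoulli_pmf(1, p):.2f}, P(X=0) = {bernoulli_pmf(0, p):.2f}")
print(f"Share of successes in {len(trials)} simulated trials: {empirical_p:.3f}")
```

With enough trials, the simulated share of successes should settle close to p.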

Binomial Distribution

A Bernoulli distribution represents an experiment that has two outcomes but only a single trial. What if we have multiple trials? Say we give not one but many interviews. This is where the Binomial Distribution, an extension of the Bernoulli distribution, is used.

An experiment must meet the following criteria to follow a Binomial distribution:

  • Each trial has only two possible outcomes: success or failure.
  • The probability of success is the same for all trials.
  • The number of trials is fixed at a total of n identical trials.
  • Each trial is independent: the outcome of one trial has no effect on the probability of success in any other trial.

So p (probability of success) and n (number of trials) are the two parameters of a binomial distribution.
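To make the link between the two distributions concrete, here is a minimal sketch that builds one Binomial outcome by counting successes across n independent Bernoulli trials. The helper name interviews_cleared, the seed and the values n=15, p=0.5 are assumptions chosen for illustration.

```python
import random

def interviews_cleared(n, p, rng):
    """Count the successes across n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

n, p = 15, 0.5
rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatable runs

# Repeat the n-trial experiment 1000 times; each entry is one Binomial outcome
sample = [interviews_cleared(n, p, rng) for _ in range(1000)]
mean_successes = sum(sample) / len(sample)

print(f"First 10 outcomes: {sample[:10]}")
print(f"Average number of successes: {mean_successes:.2f} (n * p = {n * p})")
```

Every outcome is a whole number between 0 and n, and the average should land near n * p.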

Simulation of a Binomial Distribution using Python:

Giving 15 interviews with 50% chance of success in each interview is a random experiment with n=15 and p=0.5. The number of interviews cleared out of 15 is the random variable (outcome) here.

In the following code, we simulate this random experiment 1000 times and plot probabilities of all possible outcomes using the sample generated.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#Simulate the random experiment 1000 times
sample = np.random.binomial(n = 15, p = 0.5, size = 1000)

# Plot the share of each outcome in the sample as a percentage
# (the stat parameter of countplot requires seaborn 0.13 or later)
sns.countplot(x=sample, stat="percent")
plt.xlabel('Number of Successes')
plt.ylabel('Probability Percentage')
plt.title('Binomial Distribution for Number of Successes in 15 Trials')
plt.show()

Every time you run this code, the estimated probabilities will vary slightly, but they will stay close to the theoretical probabilities given by:

P(X = k) = C(n, k) p^k (1 - p)^{n - k}, where C(n, k) = n! / (k! (n - k)!)

  • n : total number of trials
  • p : probability of success in each trial
  • k : number of successes
  • P(X=k) : probability of exactly k successes in n trials, where X is the random variable and k is the value it takes
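The theoretical probabilities can also be computed directly from the standard Binomial formula, C(n, k) p^k (1 - p)^(n - k). The sketch below does this with only the standard library; the function name binomial_pmf is an illustrative choice, and n=15, p=0.5 match the interview example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Return P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 15, 0.5

# Theoretical probability of every possible outcome, k = 0 .. n
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"P(X={k:2d}) = {prob:.4f}")
print(f"Sum of all probabilities: {sum(pmf):.4f}")
```

Multiplying these values by 100 gives the percentages that the bars of the simulated plot should approximate.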

PMF v/s CDF

While a binomial probability mass function (PMF) gives the probability of exactly k successes in n trials, a cumulative distribution function (CDF) gives the probability of at most k successes in n trials. It is simply the sum of the individual probabilities: P(X<=k) = P(X=0) + P(X=1) + P(X=2) + … + P(X=k-1) + P(X=k)
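The running-sum relationship between the two can be sketched without any external library. The variable names and the use of itertools.accumulate below are illustrative choices; n=15 and p=0.5 follow the interview example.

```python
from itertools import accumulate
from math import comb

n, p = 15, 0.5

# PMF: probability of exactly k successes, for k = 0 .. n
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# CDF: running sum of the PMF, so cdf[k] = P(X <= k)
cdf = list(accumulate(pmf))

print(f"P(X = 5)  = {pmf[5]:.4f}")
print(f"P(X <= 5) = {cdf[5]:.4f}")
print(f"P(X > 5)  = {1 - cdf[5]:.4f}")
```

The last entry of cdf covers every possible outcome, so it should equal 1 up to floating-point rounding.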

We can use functions like binom.pmf() and binom.cdf() from the scipy.stats package in Python to calculate the individual or cumulative probabilities, given the value of n, p and k:

from scipy.stats import binom

# Calculate P(X=5): Probability of clearing exactly 5 interviews
a = binom.pmf(n=15, p=0.5, k=5)
print(f"Probability of clearing exactly 5 interviews: {a:.2f}")

# Calculate P(X<=5): Probability of clearing 5 or fewer interviews
b = binom.cdf(n=15, p=0.5, k=5)
print(f"Probability of clearing 5 or fewer interviews: {b:.2f}")

# Calculate P(X>5): Probability of clearing more than 5 interviews
c = 1 - binom.cdf(n=15, p=0.5, k=5)
print(f"Probability of clearing more than 5 interviews: {c:.2f}")

Summary

In this article, we looked at two discrete probability distributions: the Bernoulli and the Binomial Distribution. Both apply only to random experiments in which each trial has 2 outcomes (success or failure).

While Bernoulli distribution represents the probability of success in a single trial, Binomial distribution represents the probability of k successes in n trials.

We plotted a Bernoulli distribution and simulated a Binomial distribution using Python.

Finally, we saw two functions from scipy.stats, binom.pmf() and binom.cdf(), which can be used to calculate the probability of individual or cumulative outcomes in a Binomial Distribution.

