Probability Theory: 🎲 Discrete Probability Distribution Functions 🎲

Ashish Arora
8 min read · Aug 4, 2023

In our statistics series, we last discussed the continuous probability distribution. Today we will take you on a journey through the Discrete Probability Distribution function.

In the case of a continuous probability distribution function, we utilize the Probability Density Function (PDF) because continuous variables can take an infinite number of values, making the probability at any specific point infinitesimally small. Hence, we relied on the PDF to estimate the probability within a range.

Discrete Probability Distribution

In the case of a Discrete Probability Distribution function, we use Probability Mass Function.

The Probability Mass Function is simply the probability that a discrete random variable takes a particular value. The PMF assigns a probability to each possible outcome of the random variable.

PMF(X = x) = P(X = x)

where PMF(X = x) represents the probability of the random variable X taking the specific value x, and P(X = x) is the probability of the event X = x occurring.
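Before moving to the named distributions, here is a minimal sketch of a PMF in Python, using a fair six-sided die as an assumed example (not taken from the discussion above):

```python
# A toy PMF: a fair six-sided die, each outcome with probability 1/6
from fractions import Fraction

pmf = {face: Fraction(1, 6) for face in range(1, 7)}

print(pmf[3])             # P(X = 3) = 1/6
print(sum(pmf.values()))  # 1 — a valid PMF sums to 1 over all outcomes
```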

Different types of discrete probability distributions serve specific purposes in solving various problems.

  1. Some distributions focus on determining the number of successes among a fixed number of trials. (Binomial Distribution)
  2. Others are concerned with the number of failed trials that occur before a specific number of successes is achieved. (Negative Binomial Distribution)
  3. Other distributions are designed to calculate the number of failed trials to go through before the first success in a series of independent trials. (Geometric Distribution)
  4. Additionally, certain distributions are designed to express the number of events occurring within a specific time interval or spatial region, given the average rate. (Poisson Distribution)

Bernoulli Probability Distribution

The Bernoulli distribution is a discrete probability distribution that models a random variable that can take only two possible outcomes, typically labeled as “success” and “failure” or 1 and 0.

The probability mass function (PMF) of the Bernoulli distribution is defined as:

P(X = x) = pˣ * (1 − p)¹⁻ˣ, for x ∈ {0, 1}

that is, P(X = 1) = p and P(X = 0) = 1 − p, where p is the probability of success.
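As a quick illustration, here is a small sketch of the Bernoulli PMF using scipy (assuming scipy is installed; p = 0.7 is an arbitrary illustrative value):

```python
# Bernoulli PMF: P(X = 1) = p, P(X = 0) = 1 - p
from scipy.stats import bernoulli

p = 0.7  # assumed probability of success, for illustration only
print(bernoulli.pmf(1, p))  # 0.7
print(bernoulli.pmf(0, p))  # 0.3

# Mean = p and variance = p(1 - p) for a Bernoulli variable
print(bernoulli.mean(p), bernoulli.var(p))  # 0.7, ~0.21
```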

Binomial Probability Distribution

The binomial distribution is a generalization of the Bernoulli distribution and models the number of successes in a fixed number of independent Bernoulli trials.

It is used to calculate the probability of getting a specific number of successes in a given number of trials, where each trial has only two possible outcomes; in simple terms, it models how many successes to expect given the number of trials.

The key characteristics of a binomial distribution are as follows:

  1. Bernoulli Trials: The binomial distribution has a sequence of multiple independent Bernoulli trials. Each Bernoulli trial is independent and has only two possible outcomes, typically labeled as success (S) and failure (F). The probability of success, denoted as p, remains the same for each trial.
  2. Fixed Number of Trials: The number of trials, denoted as n, is fixed in advance, and the outcome of one trial does not affect the others.

Hypothesis tests based on the binomial distribution evaluate whether a certain proportion or success rate differs significantly from a hypothesized value. The binomial distribution is the basis for the popular binomial test of statistical significance.

PMF for the Binomial Distribution:

P(X = k) = n C k * pᵏ * (1 − p)ⁿ⁻ᵏ

where n = number of trials, k = number of successes, and p = probability of success in a single trial.

Statistics for the Binomial Distribution:

Mean = n * p, Variance = n * p * (1 − p)

Shape of Binomial distribution:

The shape of a binomial distribution depends on the values of n and p. As the number of trials increases, the distribution becomes more symmetric and bell-shaped, resembling a normal distribution. However, for small sample sizes or extreme values of p, the distribution may be skewed.

Hence, it is often said that as the number of trials increases, we can use the normal distribution as an approximation for finding probabilities or for hypothesis testing.
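To make this concrete, here is a rough sketch of the normal approximation with scipy; the values n = 1000 and p = 0.4 are assumed purely for demonstration:

```python
# For large n, Binomial(n, p) is well approximated by Normal(np, np(1-p)).
from scipy.stats import binom, norm

n, p = 1000, 0.4
mu = n * p                        # mean of the binomial
sigma = (n * p * (1 - p)) ** 0.5  # standard deviation of the binomial

# P(X <= 420): exact binomial vs. normal approximation (continuity correction)
print(binom.cdf(420, n, p))        # ~0.907
print(norm.cdf(420.5, mu, sigma))  # ~0.907
```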

Remember: Binomial distribution is used when there are only two categories such as success and failure. However, if the category is more than that and you are looking for the success rate of each category, then multinomial distribution will be used.

Example:

There is a bulb manufacturing factory. For each production batch, they manufacture 100,000 bulbs. From long-term experience, the manufacturer knows that 1% of the bulbs are defective from a batch of 100,000 bulbs. A random sample of 200 bulbs is taken from a batch. Let X be the number of defective bulbs.

  1. What is the probability that exactly 2 bulbs are defective?
  2. What is the probability that at least 1 bulb is defective?
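One possible way to answer both questions, sketched with scipy's binomial distribution (X ~ Binomial(n = 200, p = 0.01)):

```python
# X = number of defective bulbs in a random sample of 200, with p = 0.01
from scipy.stats import binom

n, p = 200, 0.01

# 1. P(exactly 2 defective)
print(binom.pmf(2, n, p))      # ~0.272

# 2. P(at least 1 defective) = 1 - P(no defectives)
print(1 - binom.pmf(0, n, p))  # ~0.866 (equivalently, binom.sf(0, n, p))
```

So there is roughly a 27.2% chance of exactly 2 defective bulbs and an 86.6% chance of at least one.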

Small Request:

Kindly follow me, Ashish Arora, and give this post a clap if you found it helpful! I dedicate extensive effort to curating informative and well-researched material just for you. Your support motivates me and helps me reach more people. 🙏

Negative Binomial Distribution

The negative binomial distribution describes the distribution of the number of trials (failed and successful combined) needed to reach a defined number of successes.

The negative binomial distribution is almost the same as the binomial distribution, with one difference: in a binomial distribution we have a fixed number of trials, whereas in a negative binomial distribution we have a fixed number of successes and the number of trials can vary.

Some people think of them as opposites because of this property, but they simply answer different questions.

Both are concerned with successes, but one (binomial) looks at how many successes occur in a given number of independent trials, while the other (negative binomial) looks at how many trials are needed to obtain a certain number of successes.

PMF for the Negative Binomial Distribution:

P(X = k) = (k + r − 1) C (r − 1) * pʳ * (1 − p)ᵏ

where k = number of failures before the final success, r = number of successes, and p = probability of success in a single trial.

For example:

Suppose you are flipping a biased coin with a 0.6 probability of landing heads (success) and a 0.4 probability of landing tails (failure). You continue flipping the coin until you get 5 heads (r = 5). We want to find the probability of getting exactly 10 tails (k = 10) before the 5th head occurs.

P(X = 10) = (10 + 5 − 1) C (5 − 1) * 0.6⁵ * (1 − 0.6)¹⁰ ≈ 0.0082

So, the probability of getting exactly 10 tails before the 5th head occurs is approximately 0.0082, or about 0.82%.
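We can sanity-check this with scipy's nbinom, which follows the same convention (counting failures before the r-th success):

```python
# Negative binomial check: 10 tails (failures) before the 5th head (success)
from scipy.stats import nbinom

r, p = 5, 0.6  # stop at the 5th head; p = P(head)
k = 10         # number of tails seen before the 5th head
print(nbinom.pmf(k, r, p))  # ~0.0082
```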

Geometric Distribution

The geometric distribution describes the probability distribution of the number of trials needed to achieve the first success in a sequence of independent Bernoulli trials, each with the same probability of success denoted as ‘p.’

In other words, since the first success happens on trial k, it equivalently describes the number of failures (k − 1) that occur before the first success.

The memorylessness property means that the probability of achieving success on the next trial does not depend on the outcome of any previous trials. Each trial is independent and unaffected by prior results. Hence, the probability of success on any single trial stays at p, while the probability that the first success occurs on exactly trial k decreases geometrically as k grows.

Probability Mass Function (PMF): The probability mass function of the geometric distribution is given by:

P(X = k) = (1 - p)^(k-1) * p

Where:

  • P(X = k) is the probability of needing exactly k trials to achieve the first success.
  • p is the probability of success in a single trial, and 1 - p is the probability of failure.
  • k is the number of trials required (k = 1, 2, 3, ...).

Example: Suppose we have an unbiased coin, and we want to calculate the probability of getting the first heads on the third flip (k = 3). As the coin is unbiased, the probability of getting heads (success) on any single flip is p = 0.5.

Using the geometric distribution formula:

P(X = 3) = (1 − 0.5)^(3−1) * 0.5
P(X = 3) = 0.25 * 0.5
P(X = 3) = 0.125

The probability of getting the first heads on the third flip is 0.125, or 12.5%.
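The same result can be reproduced with scipy's geom, whose pmf(k, p) uses the same trials-until-first-success convention:

```python
# Geometric PMF: P(first success on trial k) = (1 - p)^(k - 1) * p
from scipy.stats import geom

print(geom.pmf(3, 0.5))  # 0.125
```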

Poisson Distribution

The Poisson distribution is a discrete probability distribution used to model the number of rare events that occur in a fixed time or space interval. The Poisson distribution is characterized by lambda, λ, the mean number of occurrences in the interval, and its PMF is:

P(X = k) = (e^(−λ) * λᵏ) / k!, for k = 0, 1, 2, ...

Characteristics of Poisson distribution:

  1. The events are independent.
  2. The average rate of events is constant for every interval.
  3. The distribution is centered around λ, which is also its mean and variance.
  4. The Poisson distribution is positively skewed when λ is small, but it becomes more symmetrical as λ increases.
  5. As λ increases, the distribution shifts to the right, and as λ decreases, it shifts back toward zero.

Example: Suppose you manage a call center, and on average, you receive 20 customer service calls per hour. You want to use the Poisson distribution to model the number of calls you can expect to receive in a specific time frame, such as during a 10-minute interval.

Step 1: Convert the average rate from calls per hour to calls per 10 minutes.

Average rate per 10 minutes = 20 calls per hour / 6 = 3.33 calls per 10 minutes

Step 2: Apply the Poisson distribution formula to find the probability of receiving exactly 4 calls in a 10-minute interval (k = 4).

P(X = 4) = (e^(−3.33) * 3.33⁴) / 4!
P(X = 4) ≈ 0.1834

So, the probability of receiving exactly 4 calls in a 10-minute interval is approximately 0.1834, or 18.34%.
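A quick check of this computation with scipy's poisson (a sketch, assuming scipy is installed):

```python
# Poisson PMF: P(X = 4) with rate lambda = 20/6 calls per 10 minutes
from scipy.stats import poisson

lam = 20 / 6  # average calls per 10-minute interval
print(poisson.pmf(4, lam))  # ~0.1835 (using the unrounded rate 20/6)
```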

— — — — — — — — — — — — —

So, that is all for this post; in the next post, we will discuss the Sampling Distribution.

Happy learning!

Feel free to find me on LinkedIn and GitHub.
