Key Statistics Terms #19: Poisson Distribution

Rajiv Gopinath
7 min read · Sep 22, 2024


A Poisson distribution is a discrete probability distribution. It gives the probability of an event happening a certain number of times (k) within a given interval of time or space.

The Poisson distribution has only one parameter, λ (lambda), which is the mean number of events. The graph below shows examples of Poisson distributions with different values of λ.

Table of contents:

1. What is a Poisson distribution?

2. Applications of Poisson Distribution:

3. Significance of Poisson Distribution:

4. Implementation of Poisson Distribution in Python

5. Conclusion

What is a Poisson distribution?

A Poisson distribution is a discrete probability distribution, meaning that it gives the probability of a discrete (i.e., countable) outcome. For Poisson distributions, the discrete outcome is the number of times an event occurs, represented by k.

You can use a Poisson distribution to predict or explain the number of events occurring within a given interval of time or space. “Events” could be anything from disease cases to customer purchases to meteor strikes. The interval can be any specific amount of time or space, such as 10 days or 5 square inches.

You can use a Poisson distribution if:

  1. Individual events happen at random and independently. That is, the occurrence of one event doesn’t affect the probability of another event.
  2. You know the mean number of events occurring within a given interval of time or space. This number is called λ (lambda), and it is assumed to be constant.

When events follow a Poisson distribution, λ is the only thing you need to know to calculate the probability of an event occurring a certain number of times.

Examples of Poisson distributions

A Poisson distribution could be used to explain or predict:

  • Text messages per hour
  • Male grizzly bears per hectare
  • Machine malfunctions per year
  • Website visitors per month
  • Influenza cases per year

Mean and variance of a Poisson distribution

The Poisson distribution has only one parameter, called λ.

  • The mean of a Poisson distribution is λ.
  • The variance of a Poisson distribution is also λ.
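
The equality of mean and variance can be checked by simulation. The sketch below is only an illustration: the rate of 4 and the sample size are arbitrary choices, not values from the article, and the sample moments will only approximately match λ.

```python
import numpy as np

# Hypothetical rate, chosen only for illustration.
lam = 4.0

# Draw a large sample so the sample moments settle near their theoretical values.
rng = np.random.default_rng(seed=0)
sample = rng.poisson(lam=lam, size=100_000)

sample_mean = sample.mean()
sample_var = sample.var(ddof=1)  # unbiased sample variance

print(f"sample mean     = {sample_mean:.3f}")
print(f"sample variance = {sample_var:.3f}")
print(f"variance / mean = {sample_var / sample_mean:.3f}")
```

A variance-to-mean ratio far from 1 in real count data is a hint that a Poisson model may not fit well.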

Poisson distribution formula

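The probability mass function of the Poisson distribution is:

$$P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}$$

where X is the number of events in the interval, k is a non-negative integer (0, 1, 2, …), λ is the mean number of events per interval, e is Euler’s number (approximately 2.718), and ! denotes the factorial.

Example: Applying the Poisson distribution formula

An average of 0.61 soldiers died by horse kicks per year in each Prussian army corps. You want to calculate the probability that exactly two soldiers died in the VII Army Corps in 1898, assuming that the number of horse kick deaths per year follows a Poisson distribution.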

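Calculation

With λ = 0.61 and k = 2:

$$P(X = 2) = \frac{e^{-0.61} \cdot 0.61^{2}}{2!} = \frac{0.5434 \times 0.3721}{2} \approx 0.101$$

The specific army corps (VII Army Corps) and year (1898) don’t matter because λ is assumed to be the same for every corps and every year.

The probability that exactly two soldiers died in the VII Army Corps in 1898 is about 0.101.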
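
The same result can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the helper name `poisson_pmf` is introduced here for illustration and is not part of any library.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 0.61  # mean horse-kick deaths per corps per year, from the example

p_two = poisson_pmf(2, lam)
print(round(p_two, 3))  # 0.101

# Probabilities for other death counts under the same rate.
for k in range(5):
    print(k, round(poisson_pmf(k, lam), 4))
```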

Applications of Poisson Distribution:

The Poisson distribution is widely used in various fields to model the probability of a given number of events happening in a fixed interval of time or space, under the assumption that the events occur independently and at a constant average rate. Below are some common applications:

  1. Telecommunications and Networking:

· Call Arrivals: Poisson distribution is often used to model the number of calls received at a call center within a specific period of time.

· Network Traffic: It is used to model the arrival of data packets in a network and helps optimize bandwidth usage and network resource allocation.

2. Healthcare and Epidemiology:

· Disease Incidences: In epidemiology, Poisson distribution is used to model the number of new disease cases in a region over a certain period. It helps track the spread and outbreak of diseases.

· Hospital Emergencies: The distribution can model the number of patients arriving at an emergency room within an hour or a day.

3. Quality Control in Manufacturing:

· Defects in Products: Poisson distribution is useful for modeling the number of defects found in a given length of material or batch of items, helping businesses maintain quality standards and predict faulty products.

4. Traffic and Queuing Theory:

· Vehicle Arrivals at Toll Booths: The distribution models the number of vehicles arriving at a toll booth or intersection in a fixed time period.

· Customer Service: It helps predict the number of customers arriving at service centers, enabling proper staffing and resource planning.

5. Insurance and Risk Management:

· Claim Occurrences: Insurance companies use the Poisson distribution to model the frequency of claim occurrences in a specific time period, aiding in pricing policies and risk assessment.
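
As an illustration of how such a model might feed a staffing decision, the sketch below takes a call center and asks two questions: how likely is an unusually busy hour, and what capacity covers almost all hours? The rate of 12 calls per hour and the threshold of 18 calls are hypothetical numbers, not data from the article.

```python
from scipy.stats import poisson

# Hypothetical average arrival rate (calls per hour).
calls_per_hour = 12

# Survival function: P(X > 18), the chance of more than 18 calls in one hour.
p_over = poisson.sf(18, mu=calls_per_hour)

# Percent point function: the smallest k with P(X <= k) >= 0.99.
capacity_99 = int(poisson.ppf(0.99, mu=calls_per_hour))

print(f"P(more than 18 calls in an hour) = {p_over:.4f}")
print(f"Capacity covering 99% of hours   = {capacity_99} calls")
```

The same two calls apply to the other examples in this list (patients per hour, vehicles per minute, claims per month) once a suitable λ has been estimated.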

Significance of Poisson Distribution:

The significance of the Poisson distribution lies in its ability to model rare events and occurrences over fixed intervals, offering practical insights in several fields. Here are key points highlighting its significance:

  1. Rare Event Modeling:

The Poisson distribution is ideal for modeling rare or infrequent events that happen over a fixed interval. Examples include natural disasters, accidents, or system failures, helping businesses and governments plan for such contingencies.

2. Time or Space Intervals:

Poisson distribution is valuable when events occur randomly and independently in a fixed period of time or area. Its simplicity and versatility make it useful across different industries, such as telecommunications, insurance, healthcare, and traffic systems.

3. Independence of Events:

One of the core assumptions of the Poisson distribution is that events are independent of each other, making it significant for processes where each occurrence is random and not influenced by previous events, such as incoming calls to a service centre or customer arrivals.

4. Flexibility for Discrete Data:

It is particularly useful for discrete data, where the occurrences of events can only be counted in whole numbers. This makes it effective for scenarios like counting the number of machine breakdowns, defects, or arrivals.
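
The rare-event point above connects to a standard result: when there are many independent opportunities n, each with a small probability p, the binomial count of events is well approximated by a Poisson distribution with λ = n·p. The sketch below compares the two; the values of n and p are hypothetical, and the helper functions are written here for illustration.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical setting: many opportunities, small chance on each one.
n, p = 10_000, 0.0003
lam = n * p  # about 3 expected events

for k in range(7):
    print(f"k={k}: binomial={binomial_pmf(k, n, p):.5f}  poisson={poisson_pmf(k, lam):.5f}")

# Largest pointwise difference between the two distributions for k = 0..19.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(20))
print(f"largest gap: {max_gap:.2e}")
```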

Implementation of Poisson Distribution in Python

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

# Define the parameter (lambda, or average rate of occurrence)
lambda_rate = 4 # Average number of events in an interval

# Generate a range of values (k) for which we will calculate the probability
k_values = np.arange(0, 15) # Values of k (number of events)

# Calculate the probability mass function (PMF) for each k
pmf_values = poisson.pmf(k_values, mu=lambda_rate)

# Plot the Poisson distribution
plt.bar(k_values, pmf_values, color='blue', alpha=0.7)
plt.title(f'Poisson Distribution (lambda = {lambda_rate})')
plt.xlabel('Number of events (k)')
plt.ylabel('Probability')
plt.grid(True)
plt.show()

Explanation:

* Lambda (λ): This is the average number of events in a fixed interval
(e.g., time, space). In this example, we set lambda_rate = 4.

* k_values: These are the discrete values for which we want to calculate
the Poisson probabilities. In this case, we are considering values of
k from 0 to 14.

* PMF: The poisson.pmf function computes the probability mass
function, which gives us the probability of observing
exactly k events when the average rate of occurrence is λ.

* Graph Explanation:
The bar chart represents the probability of different numbers of events
(k) occurring based on a Poisson distribution with a given λ (lambda rate).

The height of each bar corresponds to the probability of observing that
number of events.

Simulating Poisson-Distributed Data:

# Simulate 1000 random variables from a Poisson distribution
poisson_data = poisson.rvs(mu=lambda_rate, size=1000)

# Plot a histogram of the simulated data, with one bin centered on each integer count
bins = np.arange(poisson_data.max() + 2) - 0.5
plt.hist(poisson_data, bins=bins, density=True, alpha=0.6, color='g')
plt.title(f'Histogram of Poisson-Distributed Data (lambda = {lambda_rate})')
plt.xlabel('Number of events (k)')
plt.ylabel('Relative frequency')
plt.grid(True)
plt.show()
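
To see how closely simulated data tracks the theoretical distribution, the observed relative frequencies can be compared with the PMF directly. The sketch below is self-contained and repeats the setup with a fixed seed; the seed value is an arbitrary assumption made only so the run is reproducible.

```python
import numpy as np
from scipy.stats import poisson

lambda_rate = 4

# Fixed seed (arbitrary choice) so the simulated sample is reproducible.
poisson_data = poisson.rvs(mu=lambda_rate, size=1000, random_state=42)

# Count how often each value 0, 1, 2, ... occurs and convert to relative frequencies.
counts = np.bincount(poisson_data)
empirical = counts / counts.sum()

# Theoretical probabilities for the same values of k.
k_values = np.arange(len(counts))
theoretical = poisson.pmf(k_values, mu=lambda_rate)

for k, emp, theo in zip(k_values, empirical, theoretical):
    print(f"k={k:2d}  simulated={emp:.3f}  theoretical={theo:.3f}")
```

With 1000 draws the two columns should agree to within a few percentage points; a larger sample narrows the gap.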


Conclusion:

In conclusion, the Poisson distribution is a robust tool for modelling the probability of discrete events occurring over a fixed interval of time or space. Its applications span various domains, making it a valuable distribution for statistical inference. The key points are:

  1. Ideal for Rare and Random Events:

The Poisson distribution is well-suited for situations where events occur rarely and randomly. It is frequently used in contexts where the probability of a single event happening is small but the opportunity for its occurrence is large, such as disease outbreaks or accidents.

  2. Constant Average Rate:

The Poisson distribution assumes a constant average rate of occurrences, making it ideal for processes like customer arrivals at a service center or defects in production lines. This helps organizations manage resources and anticipate demand more effectively.

  3. Independence and Simplicity:

A major strength of the Poisson distribution is its simplicity and the assumption of independence between events. This enables analysts and decision-makers to model processes where the occurrence of one event does not affect the probability of another event occurring.

  4. Predictive Power:

By allowing for the prediction of event occurrences over time, the Poisson distribution provides a powerful tool for risk management, operational planning, and decision-making in sectors like healthcare, manufacturing, and transportation.

In summary, the Poisson distribution offers a simple yet effective means of modelling and analyzing random, discrete events, providing valuable insights for optimizing processes, managing risk, and making data-driven decisions. Researchers and professionals should consider the assumptions of the Poisson distribution and apply it appropriately based on the nature of the data and events being analyzed.
