Binomial And Poisson Distribution

akhil anand
Published in Analytics Vidhya
Sep 26, 2020

Binomial Distribution

What is a Binomial Distribution?

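It is a discrete distribution that describes the number of successes in a fixed number n of independent trials, where each trial has only two possible outcomes (success or failure) and the same probability of success p. For example, in an examination a student can either pass or fail, and a tossed coin lands either heads or tails. In other words, each trial in a binomial setting has only two possible outcomes.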

What is the difference between the binomial and normal distributions?

The binomial distribution is discrete, whereas the normal distribution is continuous. However, for a large number of trials n (with p not too close to 0 or 1), the binomial distribution is well approximated by a normal distribution with mean np and variance np(1 - p).

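Pictorial representation

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Binomial distribution plot: 1000 samples with n=10 trials and p=0.3
binomial_data = np.random.binomial(n=10, p=0.3, size=1000)
# distplot is deprecated in recent seaborn; histplot with discrete=True draws one bar per integer count
sns.histplot(binomial_data, discrete=True, kde=True, color="green")
plt.show()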

Comparison between binomial and normal distribution

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

n, p = 100, 0.3
# Binomial(n=100, p=0.3) has mean n*p = 30 and standard deviation sqrt(n*p*(1-p)) ≈ 4.58
sns.kdeplot(np.random.binomial(n=n, p=p, size=1000), color="green", label="Binomial")
sns.kdeplot(np.random.normal(loc=n*p, scale=np.sqrt(n*p*(1-p)), size=1000), color="red", label="Normal")
plt.legend()
plt.show()
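The plot compares the two shapes visually. As a rough numerical check, here is a minimal sketch that uses only Python's standard library (rather than scipy, a choice made to keep it dependency-free) to compare the exact binomial probability P(X <= 35) for n = 100, p = 0.3 with the normal approximation that uses mean np, standard deviation sqrt(np(1 - p)) and a continuity correction of 0.5. The helpers binom_cdf and normal_cdf are hypothetical names written for this example.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p), summing the probability mass function."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(Y <= x) for Y ~ Normal(mu, sigma), computed with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p, k = 100, 0.3, 35
mu = n * p                           # 30
sigma = math.sqrt(n * p * (1 - p))   # about 4.58

exact = binom_cdf(k, n, p)
# Continuity correction: the discrete event X <= 35 corresponds to Y <= 35.5
approx = normal_cdf(k + 0.5, mu, sigma)
print(f"exact binomial P(X <= {k}) = {exact:.4f}")
print(f"normal approximation       = {approx:.4f}")
```

The two numbers should be close; the approximation gets better as n grows.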

Mathematical formulation and parameters of the binomial distribution (n, p, k, size)

Source: onlinemathlearning
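In the notation used in the example below (p for the probability of success, q = 1 - p, n trials, k successes), the standard binomial probability mass function is:

```latex
\begin{aligned}
P(X = k) &= \binom{n}{k}\, p^{k} q^{\,n-k}, \qquad k = 0, 1, \dots, n \\
\binom{n}{k} &= \frac{n!}{k!\,(n-k)!}, \qquad q = 1 - p
\end{aligned}
```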

The parameters p, q, n and k are defined in the next subsection with the help of an example.

Q 1.> A company manufactures LED bulbs with a fault rate of 30%. If I randomly select 6 LEDs, what is the probability of having exactly 2 faulty LEDs in my sample? Calculate the mean of this process and evaluate the standard deviation associated with it.

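Defining the parameters:

p = probability that a single LED is faulty (a "success") = 0.3   ;   q = 1 - p = 0.7
n = total number of trials = 6
k = number of successes (faulty LEDs) we are interested in = 2
size = total number of random samples to draw = 1000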

Mathematical calculation:
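Substituting the values from Q 1 into the binomial formula gives the probability of exactly 2 faulty LEDs and the cumulative probability of at most 2 faulty LEDs; these agree with the scipy outputs shown later (0.324135 and 0.74431):

```latex
\begin{aligned}
P(X = 2) &= \binom{6}{2}(0.3)^{2}(0.7)^{4} = 15 \times 0.09 \times 0.2401 = 0.324135 \\
P(X \le 2) &= (0.7)^{6} + \binom{6}{1}(0.3)(0.7)^{5} + \binom{6}{2}(0.3)^{2}(0.7)^{4} \\
&= 0.117649 + 0.302526 + 0.324135 = 0.74431
\end{aligned}
```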

Why do we draw random samples?

When we analyse data as ML engineers, we must understand how the uncertainty introduced by random sampling affects our datasets. We also try to evaluate how the data would be affected by random error.
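To see what random sampling does in practice, here is a minimal sketch (standard library only; the function name simulate_faulty_counts and the fixed seed are illustrative assumptions) that simulates batches of n = 6 LEDs, each faulty with probability p = 0.3, and compares the observed fraction of batches with exactly k = 2 faulty LEDs against the exact binomial probability. The gap between the two is the random error, and it shrinks as the number of samples grows.

```python
import math
import random

def simulate_faulty_counts(n: int, p: float, size: int, seed: int = 0) -> list[int]:
    """Return `size` random counts; each is the number of successes in n Bernoulli(p) trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(size)]

n, p, k = 6, 0.3, 2
exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)   # about 0.324135

for size in (100, 1000, 100_000):
    counts = simulate_faulty_counts(n, p, size)
    empirical = counts.count(k) / size
    print(f"size={size:>6}: empirical P(X=2) = {empirical:.4f}, "
          f"exact = {exact:.4f}, error = {abs(empirical - exact):.4f}")
```

Each simulated count is literally a sum of 6 independent yes/no trials, which is exactly the setting the binomial distribution describes.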

Python implementation and plotting

from scipy.stats import binom
import matplotlib.pyplot as plt
import seaborn as sns

# Draw 1000 random samples: number of faulty LEDs in each batch of 6
binomial_data = binom.rvs(n=6, p=0.3, size=1000)
sns.histplot(binomial_data, discrete=True, kde=True, color="red")
plt.show()

# Probability of getting exactly 2 faulty LEDs out of 6 trials: P(X = 2)
probab = binom.pmf(k=2, n=6, p=0.3)
print("Probability will be :", probab)
# Cumulative probability of at most 2 faulty LEDs: P(X <= 2)
cdf = binom.cdf(k=2, n=6, p=0.3)
print("CDF will be :", cdf)
[out]>> Probability will be : 0.32413499999999995
CDF will be : 0.74431

Now I will calculate the mean and standard deviation:

mean = np   and   standard deviation = sqrt(npq)
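The formulas mean = np and standard deviation = sqrt(npq) can be checked directly against the definition of the mean and variance of a discrete distribution. Here is a minimal standard-library sketch (binom_pmf is a hypothetical helper written for this example) for n = 6, p = 0.3; the results should agree with np = 1.8 and sqrt(npq) = sqrt(1.26) ≈ 1.1225 up to floating-point rounding.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 6, 0.3
support = range(n + 1)

# E[X] = sum of k * P(X = k);  Var[X] = sum of (k - mean)^2 * P(X = k)
mean = sum(k * binom_pmf(k, n, p) for k in support)
var = sum((k - mean) ** 2 * binom_pmf(k, n, p) for k in support)

print("mean from pmf :", mean)
print("np            :", n * p)
print("std from pmf  :", math.sqrt(var))
print("sqrt(npq)     :", math.sqrt(n * p * (1 - p)))
```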

from scipy.stats import binom
import math

# binom.stats returns the mean and variance by default (moments="mv")
mean, var = binom.stats(n=6, p=0.3)
print("mean :=", mean)
print("standard deviation :=", math.sqrt(var))
[out]>>mean := 1.7999999999999998
standard deviation := 1.1224972160321822

Poisson Distribution

What is a Poisson distribution?

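It describes the number of times an event occurs in a fixed interval of time or space when events happen independently at a constant average rate, and it is often used for rare events in a given population. It is mainly used for forecasting counts, e.g. how many pilgrims visited Vaishno Devi during the COVID-19 pandemic.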

How do we decide whether to use the binomial or the Poisson distribution?

i.> If you are given the average number of occurrences per unit time (a rate) and have to find the probability of a certain number of occurrences in a particular time interval, use the Poisson distribution.

ii.> If you are given the exact probability of success in a single trial and need the probability of it happening a certain number of times out of a fixed number of trials (10 times, 100 times, etc.), use the binomial distribution.
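The two rules are connected: when the number of trials n is large and p is small, Binomial(n, p) is closely approximated by a Poisson distribution with λ = np. Here is a minimal standard-library sketch (helper names are hypothetical) that uses λ = 4.8 from the shop example below, with an assumed n = 1000 trials and p = λ / n.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 4.8
n = 1000        # assumed large number of trials
p = lam / n     # small success probability, so that n * p = lam

for k in range(11):
    print(f"k={k:2d}  binomial={binom_pmf(k, n, p):.4f}  poisson={poisson_pmf(k, lam):.4f}")

# Largest absolute difference over k = 0..30, where almost all of the probability lies
max_gap = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(31))
print("largest difference:", max_gap)
```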

Mathematical formulation and parameters

lambda (λ): mean number of occurrences in the interval.

k (or x): number of occurrences (successes) we are interested in.
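With λ and k defined as above, the standard Poisson probability mass function is as follows; its mean and variance are both equal to λ:

```latex
P(X = k) = \frac{e^{-\lambda}\,\lambda^{k}}{k!}, \qquad k = 0, 1, 2, \dots, \qquad E[X] = \operatorname{Var}[X] = \lambda
```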

Q 2.> Customers arrive at my shop at a rate of 72 per hour. What is the probability of k customers arriving in 4 minutes, for a) exactly 5 customers and b) not more than 3 customers?

Solution: customers arriving per minute = 72/60 = 1.2, so the mean number of customers arriving in 4 minutes is λ = 1.2 × 4 = 4.8.

Now, applying the Poisson formula, we get:
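Substituting λ = 4.8 into the Poisson formula gives the following (values rounded to four decimal places). Note that "not more than 3" includes k = 0, 1, 2 and 3:

```latex
\begin{aligned}
\text{a)}\quad P(X = 5) &= \frac{e^{-4.8}\,(4.8)^{5}}{5!} = \frac{0.0082297 \times 2548.0397}{120} \approx 0.1747 \\
\text{b)}\quad P(X \le 3) &= e^{-4.8}\left(1 + 4.8 + \frac{4.8^{2}}{2!} + \frac{4.8^{3}}{3!}\right) \\
&= 0.0082297 \times (1 + 4.8 + 11.52 + 18.432) \\
&= 0.0082297 \times 35.752 \approx 0.2942
\end{aligned}
```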

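Pictorial representation using Python

from scipy.stats import poisson
import matplotlib.pyplot as plt
import seaborn as sns

# Draw 1000 random samples from a Poisson distribution with mean mu = lambda = 4.8
poisson_data = poisson.rvs(mu=4.8, size=1000)
sns.histplot(poisson_data, discrete=True, kde=True, color="red")
plt.show()

# a) P(X = 5): exactly 5 customers in 4 minutes
probab1 = poisson.pmf(k=5, mu=4.8)
# b) P(X <= 3): "not more than 3" includes 0, 1, 2 and 3 customers
probab2 = poisson.cdf(k=3, mu=4.8)
print(f"P(X = 5) = {probab1:.4f}")
print(f"P(X <= 3) = {probab2:.4f}")
[out]>> P(X = 5) = 0.1747
P(X <= 3) = 0.2942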
