# Understanding Probability Distributions

What is a probability distribution? What are the different types of probability distributions? How do they help in framing data science solutions?

Let me try to explain it in very simple words.

Definition: *A **probability distribution** is a mathematical function that gives the probabilities of occurrence of different possible **outcomes** of an experiment.*

Rolling a die gives me a set of outcomes distributed in a particular way, whereas the marks of a class in a particular subject give me another distribution, and the occurrence of car accidents in a particular year follows an entirely different distribution, and so on. Different distributions help us learn more about the data and its characteristics, and to understand what the possible outcomes could be if the data follows a particular distribution.

Probability distributions are broadly classified into two types: **Discrete Probability Distributions** and **Continuous Probability Distributions**.

**Discrete Probability Distributions**

When you toss a coin you will get either heads or tails. Your outcome is discrete; you can't have any value in between. Here the counts of events are discrete. The same is the case with rolling a die: the likelihood of rolling a specific number is discrete, and your outcome will be 1, 2, 3, 4, 5, or 6. Discrete, as the name suggests, means the outcomes are separate, countable values, unlike a continuous variable: a die can show 1 or 2, but never 1.5 or 1.22345. Every possible outcome in a discrete probability distribution has a probability of occurrence. The binomial distribution, Poisson distribution, and discrete uniform distribution are examples of discrete probability distributions.

**Continuous Probability Distributions.**

In the case of measuring the height or weight of a class, the values we observe are not discrete numbers; they are continuous. One person's weight could be 48.050 kg and another's 46.063 kg, and it can be measured to as many decimals as the scale of the measuring machine allows. Continuous probability distributions are described by probability density functions. The total probability in a continuous distribution is still one, but the probability of any single exact value is zero; instead, we calculate the probability of an outcome falling between two values. Equivalently, the area under the curve of a continuous probability density function is one. The normal distribution, Weibull distribution, and log-normal distribution are examples of continuous probability distributions.

Now let's look at the distributions in detail.

**Uniform Distribution**

A uniform distribution is a distribution where all the outcomes (between the minimum and maximum values) are equally likely. A uniform distribution can be either continuous or discrete.

*Continuous uniform distribution.*

The continuous uniform distribution, also called the rectangular distribution, describes an experiment with an arbitrary outcome that lies between certain bounds, *a* and *b*, which are the minimum and maximum values.

You have entered your apartment building and are about to take an elevator to your floor. After you press the button, the elevator takes anywhere between 0 and 40 seconds to reach you. This is a classic example of a continuous uniform distribution with minimum value 0 and maximum value 40 seconds.

The probability density function of the continuous uniform distribution is f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise.

The expected value of the continuous uniform distribution is E(X) = (a + b)/2.

The variance of the continuous uniform distribution is V(X) = (b − a)²/12.

In the elevator example, with a = 0 and b = 40, E(X) = (0 + 40)/2 = 20 s and V(X) = (40 − 0)²/12 = 400/3 ≈ 133.3.
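As a quick sanity check, the density, mean, and variance above can be sketched in plain Python (the function names here are my own, not from any library):

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_mean(a, b):
    """Expected value (a + b) / 2."""
    return (a + b) / 2

def uniform_var(a, b):
    """Variance (b - a)^2 / 12."""
    return (b - a) ** 2 / 12

# The elevator example: wait time uniform between 0 and 40 seconds.
print(uniform_mean(0, 40))     # 20.0
print(uniform_var(0, 40))      # 133.33... (= 400/3)
print(uniform_pdf(10, 0, 40))  # 0.025 (= 1/40, the same for any x in [0, 40])
```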

*Discrete Uniform distribution.*

The discrete uniform distribution is a symmetric probability distribution wherein a finite number of values are equally likely to be observed; every one of *n* values has equal probability 1/*n*.

Drawing a heart, a club, a diamond, or a spade from a pack of cards is an example of a discrete uniform distribution.

The probability mass function of the discrete uniform distribution is P(X = x) = 1/n for each of the n possible values.

The expected value of the discrete uniform distribution over the integers a, a + 1, …, b is E(X) = (a + b)/2.

The variance of the discrete uniform distribution is V(X) = ((b − a + 1)² − 1)/12 = (n² − 1)/12.

For rolling a fair die (a = 1, b = 6), E(X) = (1 + 6)/2 = 3.5 and V(X) = ((6 − 1 + 1)² − 1)/12 = 35/12 ≈ 2.92.
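The die example can be checked the same way, again with hypothetical helper names of my own:

```python
def discrete_uniform_pmf(n):
    """Each of the n outcomes has probability 1/n."""
    return 1 / n

def discrete_uniform_mean(a, b):
    """Mean of the equally likely integers a, a+1, ..., b."""
    return (a + b) / 2

def discrete_uniform_var(a, b):
    """Variance ((b - a + 1)^2 - 1) / 12."""
    n = b - a + 1
    return (n ** 2 - 1) / 12

# A fair die: outcomes 1..6, each with probability 1/6.
print(discrete_uniform_pmf(6))      # 0.1666...
print(discrete_uniform_mean(1, 6))  # 3.5
print(discrete_uniform_var(1, 6))   # 2.9166... (= 35/12)
```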

**Bernoulli Distribution**

The Bernoulli distribution is the discrete probability distribution of a random variable that takes a binary outcome: 1 with probability p, and 0 with probability (1 − p). For example, when you toss a coin you will get either heads or tails; if you associate getting heads with winning or success and getting tails with losing, then it follows a Bernoulli distribution. The same goes for any binary outcome: success or failure, male or female, passing or failing an exam, and so on.

Imagine that getting a head while flipping a fair coin is considered a success; then heads is coded as 1 and tails as 0. The probability of getting a head is 1/2, so the probability of success is p = 1/2.

The expected value of a Bernoulli random variable is E(X) = p.

The variance of a Bernoulli-distributed X is V(X) = p(1 − p).
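A small sketch of these two formulas, plus a simulation showing that the sample mean of many coin flips settles near p (the seed and sample size are arbitrary choices of mine):

```python
import random

def bernoulli_mean(p):
    """E(X) = p for X ~ Bernoulli(p)."""
    return p

def bernoulli_var(p):
    """V(X) = p * (1 - p)."""
    return p * (1 - p)

# A fair coin: success (heads) = 1 with p = 1/2.
print(bernoulli_mean(0.5))  # 0.5
print(bernoulli_var(0.5))   # 0.25

# Simulating many flips gives a sample mean close to p.
random.seed(42)
flips = [1 if random.random() < 0.5 else 0 for _ in range(100_000)]
print(sum(flips) / len(flips))  # close to 0.5
```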

The Bernoulli distribution applies when we perform the experiment only once, like flipping a coin once or throwing a die once. So what if the experiment is repeated multiple times? The distribution for that is the binomial distribution. The Bernoulli distribution is a special case of the binomial distribution where a single trial is conducted (so *n* = 1 for such a binomial distribution).

**Binomial Distribution**

The binomial distribution is used when there are exactly two mutually exclusive outcomes of a trial, like heads or tails while flipping a coin, raining or not raining tomorrow, or winning or losing a match. These outcomes are labeled "success" and "failure". The binomial distribution gives the probability of observing *x* successes in *n* trials, with the probability of success on a single trial denoted by *p*. The binomial distribution assumes that *p* is fixed for all trials.

The probability of getting exactly *x* successes in *n* independent Bernoulli trials is given by the probability mass function P(X = x) = C(n, x) · p^x · (1 − p)^(n − x).

Suppose we flip a coin 6 times, where getting a head is considered a success and getting a tail a failure. Each trial has a probability of success of 1/2. What is the probability of getting heads exactly 4 times?

Each trial is a Bernoulli trial with probability of success p. The number of ways to get 4 heads in 6 flips is C(6, 4) = 15, and each specific sequence of 6 flips has probability (1/2)^6 = 1/64, so P(X = 4) = 15/64 ≈ 0.234.

If *X* ~ *B*(*n*, *p*), that is, *X* is a binomially distributed random variable, with n the total number of trials and p the probability of each trial yielding a success, then the expected value of *X* is E(X) = np.

The variance of the binomial distribution is V(X) = np(1 − p).
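The coin-flip calculation above can be verified in a few lines of Python using the standard library's `math.comb`:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# The coin example: 6 flips, p = 1/2, exactly 4 heads.
print(binomial_pmf(4, 6, 0.5))  # 0.234375 (= 15/64)

# Mean and variance of Binomial(n, p).
n, p = 6, 0.5
print(n * p)            # 3.0  (E(X) = np)
print(n * p * (1 - p))  # 1.5  (V(X) = np(1-p))
```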

**Poisson Distribution**

We now know how the binomial distribution works. Imagine a case where the number of trials is very large (approaching infinity) and the probability of the event is very small (approaching zero), like the number of emails you receive in a year, or the number of accident claims that an insurance company receives. In such cases you may not have exact values for n and p, but you know that n is very large and p is very small. This is where the Poisson distribution applies. In the Poisson distribution we use a single parameter, λ, the average rate of events.

A Poisson process is a model for a series of discrete events where the *average time* between events is known, but the exact timing of events is random. The arrival of an event is independent of the events before it.

Poisson distribution is the distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, if these events occur with a known constant mean rate and independently of the time since the last event.

When **λ** becomes larger, the graph looks more like a normal distribution.

A discrete random variable *X* is said to have a Poisson distribution with parameter *λ* > 0 if, for *k* = 0, 1, 2, …, the probability mass function of *X* is given by P(X = k) = (λ^k · e^(−λ)) / k!

The Poisson distribution can be applied to systems with a large number of possible events, each of which is rare.

The positive real number *λ* is equal to both the expected value of *X* and its variance: E(X) = V(X) = λ.
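Here is a minimal sketch of the Poisson pmf (the rate of 3 events per interval is an arbitrary example of mine), confirming that the probabilities sum to one and that the mean equals λ:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): lam^k * e^(-lam) / k!."""
    return lam ** k * exp(-lam) / factorial(k)

# Hypothetical rate: on average 3 events per interval.
lam = 3
print(poisson_pmf(0, lam))  # e^-3 ≈ 0.0498, the chance of no events
print(poisson_pmf(3, lam))  # ≈ 0.224, the chance of exactly 3 events

# The pmf sums to 1, and the mean equals lam.
print(sum(poisson_pmf(k, lam) for k in range(100)))      # ≈ 1.0
print(sum(k * poisson_pmf(k, lam) for k in range(100)))  # ≈ 3.0
```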

**Normal Distribution**

The normal distribution is the most common distribution we see in daily life. There are many examples of this distribution, otherwise called the bell curve, around us: the distribution of heights of students in a class, blood pressure, measurement error, and IQ scores.

The probability density function of the normal distribution is f(x) = (1/(σ√(2π))) · e^(−(x − μ)²/(2σ²)).

The parameter μ is the mean or expectation of the distribution (and also its median and mode), while the parameter σ is its standard deviation. The variance of the distribution is σ².

The empirical rule of the normal distribution is that about 68.27% of values lie within one standard deviation of the mean, about 95.45% within two standard deviations, and about 99.73% within three standard deviations.
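The empirical rule can be verified directly from the density, since the normal CDF can be written with the standard library's error function `math.erf`:

```python
from math import sqrt, pi, exp, erf

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2)."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x), expressed via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Empirical rule: probability mass within k standard deviations of the mean.
for k in (1, 2, 3):
    print(k, normal_cdf(k) - normal_cdf(-k))
# 1 ≈ 0.6827, 2 ≈ 0.9545, 3 ≈ 0.9973
```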

**Gamma Distribution**

What is the gamma distribution? Where is it used? The gamma distribution helps to predict the wait time until the nth event occurs. It has two parameters: alpha, which represents the *shape*, and beta, which represents the *scale*. The shape parameter, as the name suggests, defines the shape of the distribution, and the scale parameter defines the statistical dispersion: if the scale is large, the distribution will be more spread out; if it is small, the distribution will be more concentrated. A few examples of the gamma distribution are the amount of rainfall accumulated in a reservoir, the size of loan defaults or aggregate insurance claims, and the load on web servers.

The probability density function of the gamma distribution (shape α, scale β) is f(x) = x^(α − 1) · e^(−x/β) / (Γ(α) · β^α) for x > 0.

The mean of the gamma distribution is E(X) = αβ.

The variance of the gamma distribution is V(X) = αβ².
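A sketch of the gamma density under the shape/scale parameterization used above (note that some references instead use a rate parameter, the reciprocal of the scale; the helper names here are my own):

```python
from math import gamma as gamma_func, exp

def gamma_pdf(x, alpha, beta):
    """Density of the gamma distribution with shape alpha and SCALE beta."""
    return x ** (alpha - 1) * exp(-x / beta) / (gamma_func(alpha) * beta ** alpha)

def gamma_mean(alpha, beta):
    """E(X) = alpha * beta."""
    return alpha * beta

def gamma_var(alpha, beta):
    """V(X) = alpha * beta^2."""
    return alpha * beta ** 2

# With shape alpha = 1, the gamma reduces to the exponential distribution.
print(gamma_pdf(1.0, 1, 1))  # e^-1 ≈ 0.3679
print(gamma_mean(2, 3))      # 6
print(gamma_var(2, 3))       # 18
```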

**Summary**

I have covered the main distributions that are useful for data scientists in their analysis and modelling. Please share your thoughts and questions in the comments section; I will be happy to respond.
