An Introduction to Probability Distributions

Neelam Tyagi
Published in Analytics Steps
May 11, 2020
Here you will learn why probability distributions matter and get to know three common types: the Binomial, Poisson, and Normal distributions.

As Josh Wills once said,

“Data Scientist is a person who is better at statistics than any programmer and better at programming than any statistician.”

Wikipedia defines statistics as “the study of the collection, analysis, interpretation, presentation, and organization of data”. Data scientists therefore need to know statistics alongside machine learning and deep learning concepts.

Even a simple data analysis requires at least descriptive statistics and probability theory to turn data into better business decisions. The core concepts include probability distributions, statistical significance, hypothesis testing, and regression.

In this blog, we introduce probability distributions and three of their types: the Binomial and Poisson distributions for discrete random variables, and the Normal distribution for continuous random variables.

In brief: “Probability distributions are of integral attention in complex systems of research, especially in the scrutiny of the properties of financial markets. They are something that permits us to double-check the internal functioning of complex systems, to uncover their consistencies and elements of their structure.”

Probability Distributions and Their Types

Probability can be used to measure the likelihood of individual events, and it can also describe the pattern across all possible outcomes.

Of particular interest is the random variable: the relationship between each possible outcome of a random variable and its corresponding probability is called a probability distribution. Both terms are described below.

Random Variables:

In layman's terms, a random variable is a numerical description of the outcome of a statistical experiment. A random variable that can take only a finite number of values, or an infinite sequence of values such as 0, 1, 2, …, is called discrete; one that can take any value in some interval of the real number line is called continuous.

For example, a random variable describing the number of automobiles sold by a dealership in a day is discrete, while one describing a person's weight in kilograms (or pounds) is continuous.

Probability Distribution:

A probability distribution is a statistical function that describes all the possible values a random variable can take within a given range, together with how likely each of them is.

This range lies between the minimum and maximum possible values, and where those values fall when plotted on the probability distribution depends on a few key factors: the mean, standard deviation, skewness, and kurtosis.

Knowing a variable's probability distribution offers two benefits:

  1. It lets us summarize and interpret data using just a few numbers.
  2. It helps us place the outcomes of an experiment in context, i.e., it lets us judge whether the observed outcomes are consistent with a stated hypothesis.

For a discrete random variable, the probability distribution is a list of the probabilities of each possible outcome, or a formula for computing those probabilities.
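As a small illustration of such a list, the Python sketch below tabulates the distribution of a hypothetical random variable X, the number of heads in two tosses of a fair coin, and checks that the probabilities sum to one. The coin example is our own, chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: X = number of heads in two tosses of a fair coin.
# Enumerate the four equally likely outcomes and count the heads in each.
outcomes = list(product("HT", repeat=2))
pmf = {}
for outcome in outcomes:
    heads = outcome.count("H")
    pmf[heads] = pmf.get(heads, Fraction(0)) + Fraction(1, len(outcomes))

# The distribution is simply a list of (value, probability) pairs.
for value in sorted(pmf):
    print(value, pmf[value])

# The probabilities of all possible outcomes must sum to one.
assert sum(pmf.values()) == 1
```

Using `Fraction` keeps the probabilities exact (1/4, 1/2, 1/4) instead of floating-point approximations.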

For a continuous random variable, the probability distribution is given by a formula (a probability density function) from which we compute the probability that the variable falls within a specified interval.
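To illustrate, the sketch below approximates P(a ≤ X ≤ b) by numerically integrating a density with the midpoint rule. The density f(x) = 2x on [0, 1] and the helper name `prob_in_interval` are hypothetical choices made only for this example.

```python
def density(x: float) -> float:
    """Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def prob_in_interval(pdf, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate P(a <= X <= b) as the area under pdf from a to b (midpoint rule)."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# For a continuous variable, probabilities attach to intervals, not to single points.
print(prob_in_interval(density, 0.0, 0.5))  # exact answer: 0.5**2 = 0.25
print(prob_in_interval(density, 0.0, 1.0))  # total area under the density: 1
```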

Let’s begin with the Binomial distribution.

Binomial Distribution

Two of the most widely used discrete probability distributions are the Binomial and the Poisson.

The binomial probability can be expressed as:

$$ P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \quad x = 0, 1, \dots, n $$

where x is the number of successes that occur in n trials of a binomial experiment and p is the probability of success on each trial.
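A direct translation of the formula into Python is sketched below. `binomial_pmf` is our own helper name; it relies only on the standard library's `math.comb` (Python 3.8+).

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p**x * (1 - p)**(n - x)."""
    if not 0 <= x <= n:
        return 0.0  # impossible numbers of successes have probability zero
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Example: probability of exactly 5 heads in 10 fair coin tosses.
print(binomial_pmf(5, 10, 0.5))  # 252 / 1024 = 0.24609375
```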

When the probabilities of success and failure are equal (p = 0.5), the graph of the binomial distribution looks like this:

[Figure: Binomial distribution when the probabilities of success and failure are equal]

Facts and Features

The mean of a binomial distribution is the number of trials multiplied by the probability of success, i.e., np, and its variance is np(1 − p).

When p = 0.5, the distribution is symmetric about its mean; when p > 0.5, it is skewed to the left; and when p < 0.5, it is skewed to the right.
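The sketch below is a numerical check with hypothetical values n = 20 and p = 0.3: it computes the mean and variance directly from the binomial probabilities and compares them with np and np(1 − p). The helper names are our own.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def mean_and_variance(n: int, p: float) -> tuple:
    """Compute E[X] and Var[X] directly from the binomial probabilities."""
    probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(probs))
    variance = sum((x - mean) ** 2 * px for x, px in enumerate(probs))
    return mean, variance

n, p = 20, 0.3
mean, variance = mean_and_variance(n, p)
print(mean, n * p)                # both approximately 6
print(variance, n * p * (1 - p))  # both approximately 4.2
```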
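The direction of skew can be checked numerically. The sketch below computes the skewness of a binomial distribution from its probabilities and compares it with the standard closed form (1 − 2p)/√(np(1 − p)); a negative value means skewed to the left and a positive value means skewed to the right. The choice n = 10 and the helper name `binomial_skewness` are our own.

```python
from math import comb, sqrt

def binomial_skewness(n: int, p: float) -> float:
    """Skewness of Binomial(n, p) from its probabilities: E[(X - mean)^3] / sd^3."""
    probs = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(probs))
    var = sum((x - mean) ** 2 * px for x, px in enumerate(probs))
    third = sum((x - mean) ** 3 * px for x, px in enumerate(probs))
    return third / var ** 1.5

for p in (0.5, 0.8, 0.2):
    # Closed form for comparison: (1 - 2p) / sqrt(n p (1 - p))
    print(p, binomial_skewness(10, p), (1 - 2 * p) / sqrt(10 * p * (1 - p)))
```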

A binomial experiment has the following properties:

  1. It consists of a sequence of n identical trials.
  2. The trials are independent: the outcome of one trial does not affect the outcome of any other.
  3. Each trial has exactly two possible outcomes, such as “success or failure”, “win or lose”, or “gain or loss”.
  4. The probability of success on each trial, denoted by p, does not change from trial to trial.
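These four properties translate directly into a simulation. The sketch below assumes Python's `random` module as the source of randomness and uses hypothetical values n = 10 and p = 0.3; it repeats a binomial experiment many times and compares the average number of successes with np.

```python
import random

def run_binomial_experiment(n: int, p: float, rng: random.Random) -> int:
    """Perform n independent trials, each a success with the same probability p,
    and return the number of successes."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(42)  # fixed seed so the run is reproducible
n, p, repetitions = 10, 0.3, 20_000
counts = [run_binomial_experiment(n, p, rng) for _ in range(repetitions)]

# The average number of successes should be close to the theoretical mean n * p = 3.
print(sum(counts) / repetitions)
```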

Poisson Distribution

A Poisson random variable is the number of successes that result from a Poisson experiment, and its probability distribution, known as the Poisson distribution, can be expressed as:

$$ P(X = x) = \frac{e^{-\mu} \mu^{x}}{x!}, \quad x = 0, 1, 2, \dots $$

where X is the Poisson random variable, x is the number of successes, and the mean µ is the single parameter of this distribution (defined below).
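A minimal Python version of this formula is sketched below; `poisson_pmf` is our own helper name, and µ = 2 is an arbitrary example value.

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) for a Poisson random variable with mean mu: e^(-mu) * mu^x / x!."""
    if x < 0:
        return 0.0  # negative counts are impossible
    return exp(-mu) * mu**x / factorial(x)

# Probabilities of 0 to 4 successes when mu = 2.
for x in range(5):
    print(x, round(poisson_pmf(x, 2.0), 4))
```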

The graph of the Poisson distribution is shown below:

[Figure: Poisson distribution, showing the probability of each value x of the random variable]

Facts and Features

The Poisson distribution is suited only to situations where events occur at random points in time or space and the quantity of interest is the total number of occurrences of the event.

The following symbols are used with the Poisson distribution: λ is the rate at which an event occurs, t is the length of a time interval, and X is the number of events in that interval. The mean, denoted µ, is then the rate times the length of the interval: µ = λt.
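As a worked example with hypothetical numbers, suppose events arrive at a rate of λ = 3 per hour and we observe for t = 2 hours. The sketch below computes µ = λt and then the probability of seeing exactly 4 events.

```python
from math import exp, factorial

# Hypothetical numbers: events arrive at lam = 3 per hour, observed for t = 2 hours.
lam, t = 3.0, 2.0
mu = lam * t  # mean number of events in the interval: 6

def poisson_pmf(x: int, mu: float) -> float:
    return exp(-mu) * mu**x / factorial(x)

# Probability of observing exactly 4 events in those 2 hours.
print(mu, poisson_pmf(4, mu))  # 6.0, approximately 0.1339
```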

Some features of the Poisson distribution are the following:

  1. The occurrence of one event does not affect the occurrence of another event.
  2. Events occur at a constant average rate, so the probability of an event in a short interval is proportional to the interval's length and is the same for any two intervals of equal length.
  3. As the interval becomes smaller, the probability of an event occurring within it approaches zero.
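One way to see features 2 and 3 together is to split the interval into n tiny sub-intervals, each holding at most one event with probability µ/n. The count of events is then Binomial(n, µ/n), and as n grows it approaches the Poisson distribution. This limit is a standard result rather than something stated in the text above; µ = 3 and the helper names are our own choices.

```python
from math import comb, exp, factorial

def binomial_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x: int, mu: float) -> float:
    return exp(-mu) * mu**x / factorial(x)

def max_gap(n: int, mu: float, x_max: int = 20) -> float:
    """Largest difference between Binomial(n, mu/n) and Poisson(mu) over x = 0..x_max."""
    return max(abs(binomial_pmf(x, n, mu / n) - poisson_pmf(x, mu)) for x in range(x_max + 1))

mu = 3.0
for n in (10, 100, 10_000):
    # The gap shrinks as the sub-intervals get smaller (n grows).
    print(n, max_gap(n, mu))
```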

Normal (or Gaussian) Distribution

The normal distribution is the most common distribution in probability and statistics and is used frequently in finance, investing, science, and engineering. Its probability density function is defined as:

$$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^{2}}{2\sigma^{2}} \right) $$

where μ and σ are the population mean (the center of the distribution) and standard deviation (how spread out the distribution is), respectively.
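The density can be written in a few lines of Python. The sketch below defines a hypothetical helper `normal_pdf` and cross-checks it against `statistics.NormalDist` from the standard library (Python 3.8+).

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

# Cross-check against the standard library's implementation.
print(normal_pdf(1.0, 0.0, 1.0), NormalDist(0.0, 1.0).pdf(1.0))
```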

[Figure: Standard normal distribution, μ = 0 and σ = 1]

The distribution looks like this when the mean is set to zero (μ = 0) and the standard deviation to one (σ = 1); it has a skewness of zero and a kurtosis of 3. In this case, the distribution is called the “standard normal distribution”.
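The claim about skewness and kurtosis can be checked by simulation. The sketch below draws samples from the standard normal distribution with Python's `random.gauss` (the fixed seed and the sample size of 100,000 are our own choices) and computes the sample mean, standard deviation, skewness, and kurtosis; because of sampling noise, the values are only close to 0, 1, 0, and 3.

```python
import random

rng = random.Random(0)  # fixed seed for reproducibility
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # draws from N(0, 1)

n = len(samples)
mean = sum(samples) / n
m2 = sum((s - mean) ** 2 for s in samples) / n  # second central moment (variance)
m3 = sum((s - mean) ** 3 for s in samples) / n
m4 = sum((s - mean) ** 4 for s in samples) / n

skewness = m3 / m2 ** 1.5  # close to 0
kurtosis = m4 / m2 ** 2    # close to 3 ("excess" kurtosis would be close to 0)
print(round(mean, 3), round(m2 ** 0.5, 3), round(skewness, 3), round(kurtosis, 3))
```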

Facts and Features

The normal distribution is completely described by its mean and standard deviation. It has no skew and a kurtosis of 3, which is why it is symmetric and appears as a “bell-shaped curve” when plotted.

Under a normal distribution, about 68% of the data lie within ±1σ of the mean, about 95% within ±2σ, and about 99.7% within ±3σ.
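These percentages follow from the identity P(|X − μ| < kσ) = erf(k/√2), which holds for any normal distribution. The short sketch below evaluates it with the standard library's `math.erf`.

```python
from math import erf, sqrt

# Probability of falling within k standard deviations of the mean, for k = 1, 2, 3.
coverage = {k: erf(k / sqrt(2)) for k in (1, 2, 3)}
for k, prob in coverage.items():
    print(k, round(prob, 4))  # prints approximately 0.6827, 0.9545, 0.9973
```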

It has the following features:

  1. The mean, median, and mode of the distribution are equal.
  2. The distribution curve is bell-shaped and symmetric about the line x = μ.
  3. Exactly half of the values lie to the left of the center and the other half to the right.
  4. The total area under the curve is one.
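Several of these features can be checked numerically with `statistics.NormalDist` from Python's standard library (Python 3.8+). The parameters μ = 10 and σ = 2 are arbitrary, and integrating only over μ ± 10σ is an approximation that ignores the negligible tails.

```python
from statistics import NormalDist

dist = NormalDist(mu=10.0, sigma=2.0)  # arbitrary example parameters

# Feature 3: half of the probability lies on each side of the mean.
print(dist.cdf(dist.mean))  # 0.5

# Feature 1 (in part): the median, i.e. the 50th percentile, coincides with the mean.
print(dist.inv_cdf(0.5))

# Feature 4: the total area under the density is one (midpoint rule over mean +/- 10 sd).
lo, hi, steps = dist.mean - 10 * dist.stdev, dist.mean + 10 * dist.stdev, 20_000
width = (hi - lo) / steps
area = sum(dist.pdf(lo + (i + 0.5) * width) for i in range(steps)) * width
print(area)
```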

Conclusion

Probability distributions are a fundamental concept in probability, and you are now familiar with the names and shapes of some common ones. You have seen that the form of a probability distribution depends on whether the random variable is discrete or continuous. This, in turn, determines how we summarize the distribution and how we estimate the most likely outcomes and their probabilities.

There are several other well-known probability distributions, such as the geometric, hypergeometric, and negative binomial distributions for discrete variables, and the uniform, exponential, gamma, chi-square, beta, t, and F distributions for continuous variables, which will be covered in our upcoming articles.

