Data Analytics Using Python (Part_2)

Teena Mary
Budding Data Scientist
15 min read · Apr 10, 2020

This is the second post in a series of 12 posts in which we will learn about Data Analytics using Python. In this post, we will discuss probability concepts and probability distributions.

Index

  1. Terminologies in Probability
  2. Few Rules and Laws
  3. Probability using Contingency Table
  4. Conditional Probability
  5. Bayes’ Theorem
  6. Probability Distributions
  7. Some Special Distributions
  • Binomial Distribution
  • Poisson Distribution
  • Hypergeometric Distribution
  • Uniform Distribution
  • Exponential Distribution
  • Normal Distribution
  • Standard Normal Distribution
  • Normal Probabilities

  8. Assessing Normality

  • Probability is a numerical description of how likely an event is to occur or how likely it is that a proposition is true.
  • Probability is a number between 0 and 1, where, roughly speaking, 0 indicates impossibility and 1 indicates certainty, i.e., 0 ≤ P(A) ≤ 1 for any event A.
  • The sum of the probabilities of all mutually exclusive and collectively exhaustive events is 1, i.e., P(A) + P(B) + P(C) = 1, where A, B, and C are mutually exclusive and collectively exhaustive.

Terminologies in Probability

There are various terminologies that are used when we study probability. Let’s look at them.

  • Experiment: It is a process that produces outcomes. There can be more than one possible outcome, but only one outcome per trial.
  • Trial: It is one repetition of the process.
  • Elementary Event: It is an event that cannot be decomposed or broken down into other events.
  • Event: It is an outcome of an experiment, which can be an elementary event or an aggregate of elementary events. It is usually represented by an uppercase letter, e.g., A, E1, etc.
  • Sample Space: It is the set of all elementary events for an experiment.
  • Union of Sets: The union of two sets contains an instance of each element of the two sets. For example, for the sets X = {1,2,4,6} and Y = {3,4,5,6}, the union of the sets is given by X ∪ Y = {1,2,3,4,5,6}.
  • Intersection of Sets: The intersection of two sets contains only those elements common to both sets. For example, for the sets X = {1,2,4,6} and Y = {3,4,5,6}, the intersection of the sets is given by X ∩ Y = {4,6}.
  • Mutually Exclusive Events: These are events with no common outcomes. The occurrence of one event precludes the occurrence of the other, i.e., for two events X and Y, P(X∩Y) = 0.
  • Independent Events: Those events for which the occurrence of one event does not affect the occurrence or non-occurrence of the other. For two events X and Y, the conditional probability of X given Y equals the marginal probability of X, and likewise for Y given X, i.e., P(X|Y) = P(X) and P(Y|X) = P(Y).
  • Collectively Exhaustive Events: This contains all elementary events for an experiment. Collectively exhaustive means that the events together make up everything that can possibly happen.
  • Complementary Events: For an event A, all elementary events that are not in event A are in its complementary event. The probability of the complementary event is given by P(A′) = 1 − P(A).

Few Rules and Laws

The mn rule for counting possibilities: If an operation can be done in m ways and a second operation can be done in n ways, then there are mn ways for the two operations to occur in order. This rule extends easily to k stages, with the number of ways equal to n1·n2·…·nk.

Sampling from a population of size ‘N’ with sample size ‘n’ is given by:

  • With replacement: N^n
  • Without replacement: N!/(n!·(N−n)!)
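As a quick sanity check, both counting formulas can be evaluated with the Python standard library (N = 5 and n = 2 here are arbitrary illustrative values):

```python
from math import comb

# Number of possible samples of size n from a population of size N.
N, n = 5, 2

# With replacement: N^n ordered samples.
with_replacement = N ** n        # 5^2 = 25

# Without replacement: N! / (n! * (N - n)!) = C(N, n) samples.
without_replacement = comb(N, n) # C(5, 2) = 10

print(with_replacement, without_replacement)  # 25 10
```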

There are mainly four kinds of probability: marginal, union, joint, and conditional probability.

The general law of addition: For two events X and Y, P(X∪Y) = P(X) + P(Y) − P(X∩Y).

The general law of multiplication: For two events X and Y, P(X∩Y) = P(X)·P(Y|X) = P(Y)·P(X|Y).

Special Law of Multiplication for Independent Events: If the events X and Y are independent, then P(X) = P(X|Y) and P(Y) = P(Y|X). Hence, we get P(X∩Y) = P(X)·P(Y).
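The addition law is easy to verify numerically. The sketch below, using one roll of a fair die (an example of my choosing, not from the original post), checks P(X∪Y) = P(X) + P(Y) − P(X∩Y):

```python
# One roll of a fair die. X: roll is even; Y: roll is at most 3.
sample_space = {1, 2, 3, 4, 5, 6}
X = {2, 4, 6}
Y = {1, 2, 3}

def p(event):
    """Probability of an event under equally likely outcomes."""
    return len(event) / len(sample_space)

lhs = p(X | Y)                # P(X ∪ Y)
rhs = p(X) + p(Y) - p(X & Y)  # P(X) + P(Y) − P(X ∩ Y)
print(abs(lhs - rhs) < 1e-12, round(lhs, 4))  # True 0.8333
```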

Equations for Marginal and Joint Probabilities:

Probability using Contingency Table

A contingency table is a table in matrix format that displays the frequency distribution of the variables. Joint as well as marginal probabilities can be easily identified from contingency tables. The following table helps in understanding how to derive probabilities from a contingency table. For events A1, A2, B1, B2, we have the following contingency table:

So, as in the table above, we can get the joint as well as the marginal probabilities using the contingency table.
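A minimal sketch, with counts invented for illustration, shows how joint and marginal probabilities come out of a contingency table:

```python
# Contingency table of counts for events A1/A2 (rows) vs B1/B2 (columns).
# The counts are made up for illustration.
counts = {
    ("A1", "B1"): 20, ("A1", "B2"): 30,
    ("A2", "B1"): 10, ("A2", "B2"): 40,
}
total = sum(counts.values())  # grand total = 100

# Joint probability: cell count over the grand total.
p_a1_b1 = counts[("A1", "B1")] / total

# Marginal probability: row total over the grand total.
p_a1 = sum(v for (a, _), v in counts.items() if a == "A1") / total

print(p_a1_b1, p_a1)  # 0.2 0.5
```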

Conditional Probability

A conditional probability is the probability of one event, given that another event has occurred.

Law of Conditional Probability: The conditional probability of X given Y is the joint probability of X and Y divided by the marginal probability of Y, i.e., P(X|Y) = P(X∩Y)/P(Y).

Bayes’ Theorem

Bayes’ theorem (alternatively Bayes’ law or Bayes’ rule) describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if the probability that someone has cancer is related to their age, using Bayes’ theorem the age can be used to more accurately assess the probability of cancer than can be done without knowledge of the age. It is an extension to the conditional law of probabilities.

Bayes’ Theorem
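A minimal numerical sketch of the theorem, using invented numbers for a diagnostic-test scenario like the cancer example above (a 1% prior, 95% sensitivity, and 5% false-positive rate are assumptions, not figures from the post):

```python
# Bayes' theorem: P(D|+) = P(+|D) * P(D) / P(+), where
# P(+) = P(+|D) * P(D) + P(+|D') * P(D').
p_d = 0.01               # prior probability of disease
p_pos_given_d = 0.95     # sensitivity
p_pos_given_not_d = 0.05 # false-positive rate

p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)
p_d_given_pos = p_pos_given_d * p_d / p_pos

print(round(p_d_given_pos, 4))  # 0.161
```

Even with a positive result, the posterior stays modest because the prior is small, which is exactly the kind of update Bayes' theorem quantifies.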

Probability Distributions

A probability distribution is a mathematical function that provides the probabilities of occurrence of different possible outcomes in an experiment. It describes the ‘shape’ of a batch of numbers. The characteristics of a distribution can sometimes be defined using a small number of numeric descriptors called ‘parameters’ and the parameters change for each distribution.

Distributions can serve as a basis for standardized comparison of empirical distributions. It can help in estimating confidence intervals for inferential statistics. Also, it forms a basis for more advanced statistical methods, like it can be used to understand the ‘fit’ between observed distributions and certain theoretical distributions.

A random variable is a variable which contains the outcomes of a chance experiment. A random variable can take discrete values, as in the case of the number of apples in a basket, or it can be continuous, as with the mass of objects. The probability distribution function or probability density function (PDF) of a random variable X gives the values taken by that random variable and their associated probabilities.

PDF of Discrete Random Variable (also called PMF):

Let the random variable X be the number of heads obtained in two tosses of a coin. The sample space is {HH, HT, TH, TT}, so the pmf is P(X=0) = 1/4, P(X=1) = 1/2, P(X=2) = 1/4.

The requirements for a discrete probability function are that every probability value lies between 0 and 1, inclusive, and that the sum of all the probabilities equals 1.
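The two-coin-toss pmf above can be built by brute-force enumeration of the sample space:

```python
from itertools import product
from collections import Counter

# Enumerate the sample space {HH, HT, TH, TT} and tabulate the pmf of
# X = number of heads in two tosses.
outcomes = list(product("HT", repeat=2))
heads = Counter(o.count("H") for o in outcomes)
pmf = {x: c / len(outcomes) for x, c in sorted(heads.items())}

print(pmf)  # {0: 0.25, 1: 0.5, 2: 0.25}
assert abs(sum(pmf.values()) - 1) < 1e-12  # probabilities sum to 1
```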

Cumulative Distribution Function:

The CDF of a random variable X, denoted F(x), associates each possible value x (or range of values) with P(X ≤ x). The value of a CDF always lies between 0 and 1, i.e., 0 ≤ F(xi) ≤ 1.

Expected Value of X:

Let X be a discrete random variable with set of possible values D and pmf p(x). The expected value or mean value of X, denoted E(X) or μ(x), is: E(X) = Σ x·p(x), where the sum runs over all x in D.

Variance and Standard Deviation:

Let X have pmf p(x) and expected value μ. Then the variance of X, denoted V(X), is: V(X) = Σ (x − μ)²·p(x).

In terms of expectation, the variance can be written as V(X) = E(X²) − [E(X)]².
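Applying both formulas to the two-coin-toss pmf from earlier gives a quick numerical check:

```python
# pmf of X = number of heads in two fair coin tosses.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

mean = sum(x * p for x, p in pmf.items())    # E(X) = Σ x·p(x)
ex2 = sum(x**2 * p for x, p in pmf.items())  # E(X²)
var = ex2 - mean**2                          # V(X) = E(X²) − [E(X)]²

print(mean, var)  # 1.0 0.5
```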

Covariance:

For two discrete random variables X and Y with E(X) = μ(x) and E(Y) = μ(y), the covariance between X and Y is defined as Cov(X,Y) = E[(X − μ(x))·(Y − μ(y))] = E(XY) − μ(x)·μ(y).

In general, the covariance between two random variables can be positive or negative. If two random variables move in the same direction, then the covariance will be positive, if they move in the opposite direction the covariance will be negative.

Properties:

  1. If X and Y are independent random variables, their covariance is zero, since E(XY) = E(X)·E(Y).
  2. Cov(X,X) = Var(X)
  3. Cov(Y,Y) = Var(Y)

Properties of Expected Value: E(c) = c for a constant c, E(cX) = c·E(X), and E(X + Y) = E(X) + E(Y).

Properties of Variance: V(c) = 0 for a constant c, V(cX) = c²·V(X), and, for independent X and Y, V(X + Y) = V(X) + V(Y).

Correlation Coefficient:

The covariance tells us the sign, but not the magnitude, of how strongly the variables are positively or negatively related. The correlation coefficient provides such a measure of how strongly the variables are related to each other.

For two random variables X and Y with standard deviations σ(x) and σ(y), the correlation coefficient is defined as: ρ(X,Y) = Cov(X,Y)/(σ(x)·σ(y)).
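A short sketch computing a population-style covariance and correlation from paired data of my own invention; since y is an exact multiple of x, the correlation should come out as 1:

```python
# Paired observations; y = 2x, so the variables move together perfectly.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n  # Cov(X, Y)
sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5           # σ(x)
sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5           # σ(y)
corr = cov / (sx * sy)                                    # ρ(X, Y)

print(cov, round(corr, 4))  # 4.0 1.0
```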

Continuous Probability Distributions

A continuous random variable is a variable that can assume any value on a continuum (can assume an uncountable number of values) like thickness of an item, time required to complete a task, temperature of a solution, height, etc. These can potentially take on any value, depending only on the ability to measure precisely and accurately. Uniform, Exponential as well as Normal distributions are few of the continuous distributions.

Some Special Distributions

  • Discrete: Binomial, Poisson, Hypergeometric
  • Continuous: Uniform, Exponential, Normal

Binomial Distribution

A binomial distribution can be thought of as simply the probability of a SUCCESS or FAILURE outcome in an experiment or survey that is repeated multiple times. The binomial is a type of distribution that has two possible outcomes (the prefix “bi” means two, or twice).

For an experiment repeated n times, with probability of success p and probability of failure q = 1 − p, the probability of exactly x successes is P(X = x) = C(n, x)·p^x·q^(n−x). The mean of the distribution is np and the variance is npq.

Assumptions of Binomial Distribution:

  • Experiment involves n identical trials.
  • Each trial has exactly two possible outcomes: success and failure.
  • Each trial is independent of the previous trials.
  • p is the probability of a success on any one trial and q = (1 − p) is the probability of a failure on any one trial.
  • p and q are constant throughout the experiment.
  • X is the number of successes in the n trials.

Real life Instances of Binomial Distribution:

Many instances of binomial distributions can be found in real life. For example, if a new drug is introduced to cure a disease, it either cures the disease (it’s successful) or it doesn’t cure the disease (it’s a failure). If you purchase a lottery ticket, you’re either going to win money, or you aren’t. So, basically anything that has a success or failure outcome can be represented by a binomial distribution.
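The binomial pmf P(X = x) = C(n, x)·p^x·q^(n−x) is straightforward to implement; here, for example, the probability of exactly 5 heads in 10 fair coin flips:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Exactly 5 heads in 10 fair coin flips.
print(binom_pmf(5, 10, 0.5))  # 0.24609375

# Sanity check: the pmf sums to 1 over x = 0..n.
assert abs(sum(binom_pmf(x, 10, 0.5) for x in range(11)) - 1) < 1e-12
```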

Poisson Distribution

The Poisson distribution is the discrete probability distribution of the number of events occurring in a given time period, given the average number of times the event occurs over that period. It describes the occurrence of rare events. Each occurrence is independent of any other occurrence, and the number of occurrences in each interval can range from zero to infinity. The expected number of occurrences must remain constant throughout the experiment.

The pmf of the Poisson distribution is given by: P(X = x) = (e^(−λ)·λ^x)/x!, for x = 0, 1, 2, …

The mean and the variance of the Poisson distribution are both λ.

Real life Instances of Poisson Distribution:

The Poisson distribution is applied in situations where there are a large number of independent Bernoulli trials with a very small probability of success in any trial, say p. Commonly encountered situations modeled by the Poisson distribution are:

• Arrivals at queuing systems:

⇒ airports — people, airplanes, automobiles, baggage

⇒ banks — people, automobiles, loan applications

⇒ computer file servers — read and write operations

• Defects in manufactured goods:

⇒ number of defects per 1,000 feet of extruded copper wire

⇒ number of blemishes per square foot of painted surface

⇒ number of errors per typed page
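The Poisson pmf can be sketched as below; the arrival rate λ = 3 per minute is an illustrative value, not one from the post:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-λ) * λ^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

# With an average of λ = 3 arrivals per minute, the probability of
# exactly 2 arrivals in a given minute:
print(round(poisson_pmf(2, 3), 4))  # 0.224

# The mean recovered from the pmf (truncated sum) is λ itself.
mean = sum(x * poisson_pmf(x, 3) for x in range(100))
assert abs(mean - 3) < 1e-9
```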

The Hypergeometric Distribution

The binomial distribution is applicable when selecting from a finite population with replacement or from an infinite population without replacement. The hypergeometric distribution is applicable when selecting from a finite population without replacement.

It is done by sampling without replacement from a finite population. The number of objects in the population is denoted N. Each trial has exactly two possible outcomes, success and failure, and the trials are not independent. X is the number of successes in the n trials. The binomial distribution is an acceptable approximation if n < N/10; otherwise it is not.

The pmf, mean and variance of the hypergeometric distribution, with A successes in the population, are: P(X = x) = C(A, x)·C(N − A, n − x)/C(N, n), μ = nA/N, and σ² = nA(N − A)(N − n)/(N²(N − 1)).

Real life Instances of Hypergeometric Distribution:

  • A deck of cards contains 20 cards: 6 red cards and 14 black cards. 5 cards are drawn randomly without replacement. What is the probability that exactly 4 red cards are drawn?
  • A small voting district has 101 female voters and 95 male voters. A random sample of 10 voters is drawn. What is the probability exactly 7 of the voters will be female?
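The first example above can be answered directly from the hypergeometric pmf; a minimal implementation with the standard library:

```python
from math import comb

def hypergeom_pmf(x, N, A, n):
    """P(X = x): exactly x successes in n draws without replacement
    from a population of N items containing A successes."""
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)

# 20 cards (6 red, 14 black), draw 5 without replacement:
# probability of exactly 4 red cards.
print(round(hypergeom_pmf(4, N=20, A=6, n=5), 4))  # 0.0135
```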

The Uniform Distribution

The uniform distribution is a probability distribution that has equal probabilities for all possible outcomes of the random variable. Since the shape of the graph is rectangular, it is also called a rectangular distribution.

Uniform Distribution Graph

The PDF of the Uniform Distribution on the interval [a, b] is given by: f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise.

The cumulative probability of the Uniform Distribution is F(x) = (x − a)/(b − a) for a ≤ x ≤ b.

The mean and variance of the Uniform Distribution are μ = (a + b)/2 and σ² = (b − a)²/12.

Real life Instances of Uniform Distribution:

  • Consider the random variable x representing the flight time of an airplane traveling from Delhi to Mumbai. Suppose the flight time can be any value in the interval from 120 minutes to 140 minutes. Since the random variable x can assume any value in that interval, x is a continuous rather than a discrete random variable, and if every interval of equal length is equally likely, x follows a uniform distribution on [120, 140].
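For the flight-time example, the uniform formulas give the density, an interval probability, and the mean and variance directly:

```python
# Uniform distribution on [a, b] = [120, 140] minutes of flight time.
a, b = 120, 140

density = 1 / (b - a)                # f(x) = 1/(b − a) = 0.05
p_125_to_135 = (135 - 125) * density # P(125 ≤ x ≤ 135): area of a rectangle
mean = (a + b) / 2                   # μ = (a + b)/2
variance = (b - a) ** 2 / 12         # σ² = (b − a)²/12

print(p_125_to_135, mean, round(variance, 2))  # 0.5 130.0 33.33
```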

Exponential Probability Distribution

The exponential distribution is the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate. The exponential probability distribution is useful in describing the time it takes to complete a task.

Graph of Exponential Distribution

The density function of the exponential distribution with mean μ is given by: f(x) = (1/μ)·e^(−x/μ) for x ≥ 0.

The cumulative probability of the exponential distribution for a value x0 is: P(X ≤ x0) = 1 − e^(−x0/μ).
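That CDF is enough to answer time-to-completion questions; in the sketch below the mean task time μ = 3 minutes is an assumed value:

```python
from math import exp

def exp_cdf(x0, mu):
    """P(X ≤ x0) = 1 − e^(−x0/μ) for an exponential with mean μ."""
    return 1 - exp(-x0 / mu)

# If a task takes 3 minutes on average, the probability it finishes
# within 2 minutes:
print(round(exp_cdf(2, 3), 4))  # 0.4866
```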

Real life instances of the exponential distribution include the time between arrivals at a queue, the time taken to complete a task, and the lifetime of electronic components.

Relationship between the Poisson and Exponential Distributions: if the number of events in an interval follows a Poisson distribution with mean λ, then the time between successive events follows an exponential distribution with mean 1/λ.

Normal Distribution

Normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. In graph form, the normal distribution appears as a bell curve. The location is characterized by the mean, μ, and the spread is characterized by the standard deviation, σ. By varying the parameters μ and σ, we obtain different normal distributions. Changing μ shifts the distribution left or right, and changing σ increases or decreases the spread. The random variable has an infinite theoretical range: −∞ to +∞.

The Standardized Normal Distribution

Any normal distribution (with any mean and standard deviation combination) can be transformed into the standardized normal distribution (Z). Here, we need to transform X units into Z units. The standardized normal distribution has a mean of 0 and a standard deviation of 1. Values above the mean have positive Z-values, values below the mean have negative Z-values.

Translate from X to the standardized normal (the “Z” distribution) by subtracting the mean of X and dividing by its standard deviation: Z = (X − μ)/σ.

The formula for the standardized normal probability density function is: f(Z) = (1/√(2π))·e^(−Z²/2).

The Standardized Normal Distribution: Example

If X is distributed normally with a mean of 100 and a standard deviation of 50, the Z value for X = 200 is: Z = (200 − 100)/50 = 2.0.

This says that X = 200 is two standard deviations (2 increments of 50 units) above the mean of 100.

Normal Probabilities

Probability is measured by the area under the curve. The total area under the curve is 1.0, and the curve is symmetric, so half is above the mean, half is below.

Example: P(Z < 2.00) = .9772

To find P(a < X < b) when X is distributed normally:

  • Draw the normal curve for the problem in terms of X.
  • Translate X-values to Z-values.
  • Use the Standardized Normal Table.

Finding Normal Probability: Example

Let X represent the time it takes (in seconds) to download an image file from the internet. Suppose X is normal with mean 8.0 and standard deviation 5.0. We need to find P(X < 8.6).

Here, we transform the normal distribution into the standard normal distribution: Z = (8.6 − 8.0)/5.0 = 0.12, so P(X < 8.6) = P(Z < 0.12), the shaded region. To find it, we make use of the Z-table.

So the probability P(X < 8.6) is 0.5478. If we need P(X > 8.6), i.e., the probability of the non-shaded region, we simply subtract the above probability from 1: P(X > 8.6) = 1 − P(X < 8.6) = 1 − 0.5478 = 0.4522.

Now, suppose that we need the probability between two values. Suppose X is normal with mean 8.0 and standard deviation 5.0, and we need P(8 < X < 8.6). First we calculate the Z values: Z = (8 − 8)/5 = 0 and Z = (8.6 − 8)/5 = 0.12.

Here, we need to find the probability P(0 < Z < 0.12). This can be found by splitting it into two parts, P(Z < 0.12) − P(Z ≤ 0). We do this conversion because in the Z table, all the probabilities are given in the form P(Z ≤ z).

So, for the above example we get the probability P(8 < X < 8.6) = 0.5478 − 0.5000 = 0.0478. Hence, for an interval, we use this method to find the probability value.
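Both probabilities in this example can be reproduced without a printed Z-table using `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

# X ~ Normal(mean 8.0, sd 5.0): image download time in seconds.
X = NormalDist(mu=8.0, sigma=5.0)

p_less = X.cdf(8.6)                  # P(X < 8.6)
p_between = X.cdf(8.6) - X.cdf(8.0)  # P(8 < X < 8.6)

print(round(p_less, 4), round(p_between, 4))  # 0.5478 0.0478
```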

Given Normal Probability: Find the X Value

Let X represent the time it takes (in seconds) to download an image file from the internet. Suppose X is normal with mean 8.0 and standard deviation 5.0. We have to find X such that 20% of download times are less than X.

So, here we are given with the probability value, which is 20%, ie, 0.2. This is the probability value of the Z-table. Hence, we need to find the corresponding Z-value in the table that gives the probability 0.2.

We can see that the Z-value −0.84 gives 20% probability. Using this, convert the Z value to X units using the formula X = μ + Zσ = 8.0 + (−0.84)(5.0) = 3.80.

So 20% of the download times from the distribution with mean 8.0 and standard deviation 5.0 are less than 3.80 seconds.
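The same lookup can be done in reverse with `NormalDist.inv_cdf`, which avoids rounding the Z-value to two decimals:

```python
from statistics import NormalDist

# Find x such that P(X < x) = 0.20 for X ~ Normal(8.0, 5.0).
x = NormalDist(mu=8.0, sigma=5.0).inv_cdf(0.20)

# The table lookup in the text used Z = -0.84, giving 8 + (-0.84)(5) = 3.80;
# the exact inverse CDF is slightly more precise.
print(round(x, 2))  # 3.79
```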

Assessing Normality

It is important to evaluate how well the data set is approximated by a normal distribution.

Normally distributed data should approximate the theoretical normal distribution:

⇒The normal distribution is bell shaped (symmetrical) where the mean is equal to the median.

⇒The empirical rule applies to the normal distribution.

⇒The interquartile range of a normal distribution is 1.33 standard deviations.

Construct charts or graphs

⇒For small- or moderate-sized data sets, do the stem-and-leaf display and box-and-whisker plot look symmetric?

⇒For large data sets, does the histogram or polygon appear bell-shaped?

Compute descriptive summary measures

⇒Do the mean, median and mode have similar values?

⇒Is the interquartile range approximately 1.33 σ?

⇒Is the range approximately 6 σ?

Observe the distribution of the data set

⇒Do approximately 2/3 of the observations lie within mean ± 1 standard deviation?

⇒Do approximately 80% of the observations lie within mean ± 1.28 standard deviations?

⇒Do approximately 95% of the observations lie within mean ± 2 standard deviations?
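These empirical-rule checks can be scripted; the sketch below simulates normal data (mean 100 and standard deviation 15 are arbitrary choices) and measures the fractions within one and two standard deviations:

```python
import random
from statistics import mean, stdev

# Simulated normal sample; seed fixed for reproducibility.
random.seed(0)
data = [random.gauss(100, 15) for _ in range(10_000)]

m, s = mean(data), stdev(data)
within_1sd = sum(m - s <= v <= m + s for v in data) / len(data)
within_2sd = sum(m - 2 * s <= v <= m + 2 * s for v in data) / len(data)

print(round(within_1sd, 2), round(within_2sd, 2))  # roughly 0.68 and 0.95
```

A real data set whose fractions fall far from 2/3 and 95% is a warning sign that the normal approximation may be poor.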
