Probability & Statistics: Bedrock of Machine Learning

Published in Analytics Vidhya, Jan 6, 2021

Machine learning is an interdisciplinary field that uses statistics, probability, and algorithms to learn from data and provide insights that can be used to build intelligent applications.

Joint Probability Distribution

The probability of events A and B, denoted P(A and B) or P(A ∩ B), is the probability that both A and B occur. If A and B are independent, P(A ∩ B) = P(A) · P(B). Independence means that knowing A occurred doesn't change the probability of B, and vice versa; the product rule only applies in that case.

Notation: Prob(X=x, Y=y), read “probability of X=x and Y=y”, is written p(x, y).
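As a quick check of the product rule, the sketch below enumerates the 36 equally likely outcomes of rolling two fair dice. The dice example, the events chosen, and the helper name `prob` are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely (die1, die2) outcomes of two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event), where event is a predicate over outcomes."""
    hits = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(hits, len(OUTCOMES))

def a(o):  # event A: the first die shows a 6
    return o[0] == 6

def b(o):  # event B: the second die is even
    return o[1] % 2 == 0

p_a = prob(a)                              # 1/6
p_b = prob(b)                              # 1/2
p_a_and_b = prob(lambda o: a(o) and b(o))  # joint probability P(A ∩ B)

# The two dice do not influence each other, so A and B are independent.
print(p_a_and_b, p_a * p_b)  # 1/12 1/12
```

Exact fractions avoid floating-point noise, so the equality P(A ∩ B) = P(A) · P(B) can be checked exactly.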

Conditional Probability Distribution

Now consider events A and B that are not independent: for example, if A occurred, the probability of B is higher. When A and B are not independent, it is often useful to compute the conditional probability P(A|B), the probability of A given that B occurred: P(A|B) = P(A ∩ B) / P(B).

In general, the probability of an event A conditioned on an event B is defined, whenever P(B) > 0, as P(A|B) = P(A ∩ B) / P(B).

Similarly, P(B|A) = P(A ∩ B) / P(A). Rearranging, we can write the joint probability of A and B as P(A ∩ B) = P(A) · P(B|A), which means: “The chance of both things happening is the chance that the first one happens, and then the second one given the first happened.”

Notation: Prob(X=x | Y=y), read “probability of X=x given Y=y”, is written p(x|y) = p(x, y) / p(y).
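Here is a minimal sketch of p(x|y) = p(x, y) / p(y) and of the chain rule on an assumed two-dice example. Event A is “the first die shows 6” and event B is “the sum is at least 10”; `prob` and `conditional` are hypothetical helper names chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely (die1, die2) outcomes.
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event), where event is a predicate over outcomes."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))

def conditional(event_a, event_b):
    """P(A|B) = P(A ∩ B) / P(B); only defined when P(B) > 0."""
    p_b = prob(event_b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(lambda o: event_a(o) and event_b(o)) / p_b

def first_is_six(o):     # event A
    return o[0] == 6

def sum_at_least_10(o):  # event B
    return o[0] + o[1] >= 10

p_a = prob(first_is_six)                                  # 1/6
p_a_given_b = conditional(first_is_six, sum_at_least_10)  # 1/2
p_b_given_a = conditional(sum_at_least_10, first_is_six)  # 1/2

# Learning that B occurred raises the chance of A from 1/6 to 1/2: A and B are dependent.
# Chain rule: P(A ∩ B) = P(A) * P(B|A)
joint = prob(lambda o: first_is_six(o) and sum_at_least_10(o))
print(joint, p_a * p_b_given_a)  # 1/12 1/12
```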

Bayes’ Theorem

It is one of the most important formulas in probability theory.

Further reading: the Wikipedia article “Bayes' theorem” (en.wikipedia.org), which notes that it is alternatively called Bayes' law or Bayes' rule.
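Combining the two expressions for the joint probability, P(A ∩ B) = P(A|B) · P(B) = P(B|A) · P(A), gives Bayes' theorem: P(A|B) = P(B|A) · P(A) / P(B), where P(B) can be expanded as P(B|A) · P(A) + P(B|not A) · P(not A). The sketch below applies it to a hypothetical diagnostic test; the prevalence, sensitivity, and false-positive rate are made-up numbers, and `posterior` is a hypothetical helper name.

```python
from fractions import Fraction

def posterior(prior, likelihood, false_positive_rate):
    """Bayes' theorem for a binary hypothesis A and evidence B.

    prior               = P(A)
    likelihood          = P(B | A)
    false_positive_rate = P(B | not A)
    Returns P(A | B).
    """
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% prevalence, 99% sensitivity, 5% false positives.
p = posterior(Fraction(1, 100), Fraction(99, 100), Fraction(5, 100))
print(p)  # 1/6
```

Even with a fairly accurate test, a positive result in this made-up scenario means only a 1-in-6 chance of having the condition, because the condition itself is rare.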

Exponential Family

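The exponential family is a family of probability distributions whose probability density or mass functions can all be written in a common form.

Many of the standard distributions belong to this family: Bernoulli, binomial/multinomial, Poisson, normal (Gaussian), etc.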
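One common way to write that form is p(x | η) = h(x) · exp(η · T(x) - A(η)), where η is the natural parameter, T(x) the sufficient statistic, h(x) the base measure, and A(η) the log-partition function. The sketch below checks numerically that the Bernoulli and Poisson distributions fit this form; the function names and parameter values are illustrative choices.

```python
import math

def exp_family(x, eta, T, h, A):
    """Evaluate h(x) * exp(eta * T(x) - A(eta))."""
    return h(x) * math.exp(eta * T(x) - A(eta))

# Bernoulli(mu): eta = log(mu / (1 - mu)), T(x) = x, h(x) = 1, A(eta) = log(1 + e^eta)
def bernoulli_pmf(x, mu):
    return mu ** x * (1 - mu) ** (1 - x)

def bernoulli_exp_family(x, mu):
    eta = math.log(mu / (1 - mu))
    return exp_family(x, eta, T=lambda t: t, h=lambda t: 1.0,
                      A=lambda e: math.log(1 + math.exp(e)))

# Poisson(lam): eta = log(lam), T(x) = x, h(x) = 1 / x!, A(eta) = e^eta
def poisson_pmf(x, lam):
    return lam ** x * math.exp(-lam) / math.factorial(x)

def poisson_exp_family(x, lam):
    eta = math.log(lam)
    return exp_family(x, eta, T=lambda t: t,
                      h=lambda t: 1 / math.factorial(t),
                      A=lambda e: math.exp(e))

# The standard formulas and the exponential-family forms agree.
for x in (0, 1):
    assert math.isclose(bernoulli_pmf(x, 0.3), bernoulli_exp_family(x, 0.3))
for x in range(10):
    assert math.isclose(poisson_pmf(x, 2.5), poisson_exp_family(x, 2.5))
```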

Populations and samples

Populations

A population is the entire aggregate of creatures, things, cases, and so on that we want to study, for instance the population of a town. To calculate the average height of a city's population, we would sum everyone's height and then divide by the number of people.

Samples

In practice, it is usually not possible to measure the height of every person, so we estimate the average height from a sample: the heights of a smaller group of people.

As we increase the size of our sample, the sample mean tends to get closer to the population mean (the law of large numbers).
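The sketch below simulates this with a synthetic population of 100,000 heights. The data is made up (drawn from a normal distribution with mean 170 cm and standard deviation 10 cm, an assumption for illustration), and `mean_abs_error` is a hypothetical helper that measures how far the sample mean typically lands from the population mean.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population: 100,000 synthetic heights in cm.
population = [rng.gauss(170, 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)

def mean_abs_error(sample_size, trials=200):
    """Average |sample mean - population mean| over repeated random samples."""
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)  # sampling without replacement
        errors.append(abs(statistics.fmean(sample) - population_mean))
    return statistics.fmean(errors)

for n in (10, 100, 1000):
    print(n, round(mean_abs_error(n), 2))
```

The typical error should shrink roughly in proportion to 1/√n, so a sample 100 times larger is about 10 times more accurate.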

Random Variables

A random variable is a measurable function that maps the outcomes of a random experiment to numbers. It can be of two types:

Discrete Random Variable: takes a countable set of distinct values, such as whole numbers rather than arbitrary decimals. For example, the number of students in a class.
Continuous Random Variable: can take any value within a range, for example a person's height.
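To make “a function from outcomes to numbers” concrete, here is a small sketch with two assumed examples: a discrete variable X (the number of heads in two fair coin tosses) and a continuous variable Y (uniform on [0, 1), simulated with a fixed seed). The variable names are illustrative.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product

# Discrete: X = number of heads in two fair coin tosses.
outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT, equally likely

def num_heads(outcome):
    return outcome.count("H")  # the random variable: outcome -> number

counts = Counter(num_heads(o) for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
print(pmf)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Continuous: Y uniform on [0, 1). Any single exact value has probability 0,
# so we ask about ranges instead: P(0.2 <= Y <= 0.5) should be about 0.3.
rng = random.Random(0)
draws = [rng.random() for _ in range(100_000)]
fraction_in_range = sum(0.2 <= y <= 0.5 for y in draws) / len(draws)
print(round(fraction_in_range, 2))
```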

Outlier

An outlier is a data point in a data set that is distant from all other observations.

Why might an outlier exist in a dataset?

Variability in the data
An experimental measurement error

What are the impacts of having outliers in a dataset?

They can cause various problems during statistical analysis
They can significantly affect the mean and the standard deviation
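To see the effect on the mean and standard deviation, here is a small sketch with made-up numbers that adds a single extreme value to a tight dataset; the median is printed alongside for comparison.

```python
import statistics

data = [10, 11, 12, 12, 13]
with_outlier = data + [100]

for name, values in (("without outlier", data), ("with outlier", with_outlier)):
    print(name,
          "mean:", round(statistics.mean(values), 2),
          "median:", statistics.median(values),
          "std:", round(statistics.pstdev(values), 2))  # population std
```

The single extra point drags the mean from 11.6 to about 26.3 and inflates the standard deviation roughly thirtyfold, while the median stays at 12.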

Various ways of finding outliers:

Scatter plots
Box plots
Z-score
Interquartile range (IQR)
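As an alternative to the z-score, here is a sketch of the IQR approach. It assumes the common 1.5 × IQR fence convention (Tukey's rule) and NumPy's default linear-interpolation percentiles; `detect_outliers_iqr` and the sample data are hypothetical.

```python
import numpy as np

def detect_outliers_iqr(data, k=1.5):
    """Return points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = np.percentile(data, [25, 75])  # first and third quartiles
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

data = [10, 12, 11, 13, 12, 11, 14, 100]
print(detect_outliers_iqr(data))  # [100]
```

Because quartiles barely move when one extreme value is added, this rule still works on small datasets where the mean and standard deviation are themselves distorted by the outlier.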

Let's see how to use the z-score.

Z-score = (Observation - Mean) / Standard Deviation

z = (X - μ) / σ

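import numpy as np

def detect_outliers(data, threshold=3):
    """Return the observations whose absolute z-score exceeds `threshold`."""
    outliers = []
    mean = np.mean(data)
    std = np.std(data)  # population standard deviation (ddof=0)

    for i in data:
        z_score = (i - mean) / std
        if np.abs(z_score) > threshold:
            outliers.append(i)  # store the observation itself
    return outliers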
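One caveat worth knowing: with the population standard deviation (np.std's default, ddof=0), no point can have |z| larger than √(n - 1) (Samuelson's inequality). So with a threshold of 3, a z-score rule like the one above cannot flag anything in a dataset of 10 or fewer points. The vectorized sketch below uses a hypothetical helper, `zscore_outliers`, that applies the same rule to a small made-up list.

```python
import numpy as np

def zscore_outliers(data, threshold=3.0):
    """Vectorized z-score rule: keep points with |z| > threshold."""
    values = np.asarray(data)
    z = (values - values.mean()) / values.std()  # ddof=0, population std
    return values[np.abs(z) > threshold].tolist()

small = [10, 12, 11, 13, 12, 11, 14, 100]  # n = 8, so max |z| <= sqrt(7) ~ 2.65
print(zscore_outliers(small))                 # [] (100 has z ~ 2.64)
print(zscore_outliers(small, threshold=2.5))  # [100]
```

For small samples, a lower threshold or the IQR rule is usually a better fit than a fixed cut-off of 3.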

Normal Distribution

Also known as the Gaussian distribution.

Many quantities in the real world are approximately normally distributed. A normal distribution, sometimes called the bell curve, is a distribution that occurs naturally in many situations. For example, scores on tests like the SAT and GRE roughly follow a bell curve: the bulk of students score near the average, smaller numbers score noticeably above or below it (think a B or a D), and an even smaller percentage score at the extremes (an A or an F).

Some real-life examples of quantities that are often approximately normally distributed:

Heights of people.
Measurement errors.
Blood pressure.
Points on a test.
IQ scores.

For normally distributed data, the empirical rule (also called the 68-95-99.7 rule) tells you approximately what percentage of your data falls within a certain number of standard deviations of the mean:
• About 68% of the data falls within one standard deviation of the mean.
• About 95% of the data falls within two standard deviations of the mean.
• About 99.7% of the data falls within three standard deviations of the mean.
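These percentages come from the standard normal distribution. The sketch below recomputes them with Python's `statistics.NormalDist` (available since Python 3.8).

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# P(-k < Z < k) for a standard normal Z, for k = 1, 2, 3 standard deviations
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} sd: {share:.1%}")
# within 1 sd: 68.3%
# within 2 sd: 95.4%
# within 3 sd: 99.7%
```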

Log Normal Distribution

A continuous distribution in which the logarithm of the variable is normally distributed.

In simple words, X is log-normally distributed if log X is normally distributed. For instance, people's incomes are often modeled with a log-normal distribution.
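A small simulation sketch using the standard library's `random.lognormvariate(mu, sigma)`, where mu and sigma are the mean and standard deviation of log X. The parameter values and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

rng = random.Random(1)
mu, sigma = 0.0, 0.5  # parameters of the normal distribution of log(X)

xs = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]
logs = [math.log(x) for x in xs]

# log(X) should look normal with mean mu and standard deviation sigma.
print(round(statistics.fmean(logs), 2), round(statistics.stdev(logs), 2))

# X itself is right-skewed: its mean exp(mu + sigma^2 / 2) exceeds its median exp(mu).
print(round(statistics.fmean(xs), 3), round(statistics.median(xs), 3))
```

The long right tail is why the mean of a log-normal quantity such as income sits above its median.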

Covariance

Covariance and correlation are very helpful for understanding the relationship between two continuous variables. Covariance tells whether the two variables vary in the same direction (positive covariance) or in opposite directions (negative covariance).
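A minimal sketch of the sample covariance, cov(x, y) = Σ (xᵢ - x̄)(yᵢ - ȳ) / (n - 1), on made-up data, cross-checked against NumPy's `np.cov` (which uses the same n - 1 denominator by default). The helper name `covariance` and the data are illustrative.

```python
import numpy as np

def covariance(x, y):
    """Sample covariance: sum((x_i - mean_x) * (y_i - mean_y)) / (n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]     # tends to rise with x
y_down = [10, 8, 6, 4, 2]  # falls as x rises

print(covariance(x, y_up))    # 1.5 (positive: same direction)
print(covariance(x, y_down))  # -5.0 (negative: opposite directions)

# np.cov returns the 2x2 covariance matrix; entry [0, 1] is cov(x, y).
print(np.cov(x, y_up)[0, 1])
```

Only the sign is directly interpretable here: the magnitude depends on the units of x and y, which is one reason correlation is also used.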