Probability

To be honest, there is no end to the uses of probability in machine learning: at every point of training or making a prediction, we are actually calculating the probability of some event occurring.
In my earlier post on Probability, Chapter Two of this publication, I explained the basic theorems and rules of Probability Theory, among which Bayes' Theorem was the most important.

1. Probability Mass Function (p.m.f)

  • The probability mass function (p.m.f) of a discrete random variable is a function that gives the probability of each discrete value that the random variable can take on.
  • The probabilities over all possible values must sum to 1.

For instance, consider the throw of a fair die and let the number on the upturned face be the random variable X. Then the p.m.f can be defined as:
P(X = i) = 1/6, i ∈ {1, 2, 3, 4, 5, 6}
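To make this concrete, here is a minimal Python sketch (the name die_pmf is just for illustration) that encodes this p.m.f and checks that the probabilities sum to 1:

# p.m.f of a fair six-sided die: each face has probability 1/6
die_pmf = {i: 1 / 6 for i in range(1, 7)}

# probability of rolling a 3
print(die_pmf[3])             # 0.1666...

# the probabilities over all values must sum to 1
print(sum(die_pmf.values()))  # 1.0 (up to floating-point rounding)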

2. Probability Density Function (p.d.f)

  • The probability density function (p.d.f) gives the probability density of a continuous random variable at each value in its domain.
  • Since the variable is continuous, the integral of the probability density function over its domain must be equal to 1.

∫ₐᵇ P(x) dx = 1, where [a, b] is the domain of P(x)

For example, the probability density function of a continuous random variable that can take values from 0 to 1 is given by P(x) = 2x, x ∈ [0, 1]:
∫₀¹ 2x dx = [x²]₀¹ = 1
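If you want to verify this numerically rather than by hand, a small sketch using scipy.integrate.quad does the job (assuming SciPy is installed):

from scipy.integrate import quad

# the example p.d.f: P(x) = 2x on [0, 1]
pdf = lambda x: 2 * x

# integrate the p.d.f over its whole domain
total, abs_error = quad(pdf, 0, 1)
print(total)  # 1.0, so the p.d.f is valid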

3. Expectation of a Random Variable

  • The expectation of a random variable is nothing but the mean of the random variable.
  • Let’s say the random variable X takes n discrete values x₁, x₂, x₃, …, xₙ with probabilities p₁, p₂, p₃, …, pₙ. In other words, X is a discrete random variable with p.m.f P(X = xᵢ) = pᵢ. Then the expectation of the random variable X is given by:
E[X] = ∑ᵢ₌₁ⁿ xᵢ pᵢ
  • If X is a continuous random variable with a probability density function P(x), the expectation of X is given by:
E[X] = ∫_D x P(x) dx, where D is the domain of P(x)
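Here is a short sketch computing both forms of the expectation, reusing the die and the P(x) = 2x examples from above:

from scipy.integrate import quad

# discrete case: E[X] = Σ xᵢ pᵢ for a fair die
e_discrete = sum(x * (1 / 6) for x in range(1, 7))
print(e_discrete)  # 3.5

# continuous case: E[X] = ∫ x P(x) dx for P(x) = 2x on [0, 1]
e_continuous, _ = quad(lambda x: x * (2 * x), 0, 1)
print(e_continuous)  # 2/3 ≈ 0.6667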

4. Variance of a Random Variable

  • Variance of a random variable measures the variability in the random variable. It is the mean (expectation) of the squared deviations of the random variable from its mean (or expectation).
  • If X is a discrete random variable that takes n discrete values with p.m.f given by P(X = xᵢ) = pᵢ, the variance of X can be expressed as:
Var[X] = E[(X − μ)²] = ∑ᵢ₌₁ⁿ pᵢ (xᵢ − μ)², where μ = E[X]
  • If X is a continuous random variable having a p.d.f P(x), then Var[X] can be expressed as:
Var[X] = ∫_D (x − μ)² P(x) dx

where D is the domain of P(x)
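The same two examples make the variance formulas concrete; a quick sketch:

from scipy.integrate import quad

# discrete case: Var[X] = Σ pᵢ (xᵢ − μ)² for a fair die
mu = sum(x * (1 / 6) for x in range(1, 7))                # μ = 3.5
var_discrete = sum((1 / 6) * (x - mu) ** 2 for x in range(1, 7))
print(var_discrete)  # 35/12 ≈ 2.9167

# continuous case: Var[X] = ∫ (x − μ)² P(x) dx for P(x) = 2x on [0, 1]
mu_c, _ = quad(lambda x: x * (2 * x), 0, 1)               # μ = 2/3
var_continuous, _ = quad(lambda x: (x - mu_c) ** 2 * (2 * x), 0, 1)
print(var_continuous)  # 1/18 ≈ 0.0556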

5. Skewness and Kurtosis

Skewness

  • Skewness measures the asymmetry of a probability distribution.
  • It is a third-order moment and can be expressed as Skew[X] = E[(X − μ)³] / σ³, where μ and σ are the mean and standard deviation of X.
  • A perfectly symmetrical probability distribution has a skewness of 0.
  • A positive value of skewness means that the bulk of the data is toward the left, with a longer tail to the right.
  • A negative value of skewness means that the bulk of the data is toward the right, with a longer tail to the left.

Kurtosis

  • Kurtosis measures whether the tails of a probability distribution are heavy or not.
  • It is a fourth-order statistic, and for a random variable X with a mean of μ and standard deviation σ, it can be expressed as Kurt[X] = E[(X − μ)⁴] / σ⁴.
  • Datasets with high kurtosis tend to have heavy tails or outliers, whereas a low value of kurtosis means light tails or a lack of outliers.

Kurtosis for a normal distribution is 3.

  • To measure the kurtosis of other distributions in terms of the normal distribution, one generally refers to excess kurtosis, which is the actual kurtosis minus the kurtosis of a normal distribution (i.e. 3).
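In practice you rarely compute these moments by hand; scipy.stats provides skew and kurtosis directly. A small sketch with simulated data (note that scipy's kurtosis returns the excess kurtosis by default):

import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)

# a normal sample: symmetric, so skewness ≈ 0 and excess kurtosis ≈ 0
normal_data = rng.normal(size=100_000)
print(skew(normal_data))                    # ≈ 0
print(kurtosis(normal_data))                # excess kurtosis, ≈ 0
print(kurtosis(normal_data, fisher=False))  # actual kurtosis, ≈ 3

# an exponential sample is right-skewed, so skewness is positive (≈ 2)
exp_data = rng.exponential(size=100_000)
print(skew(exp_data))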

6. Co-variance

  • The co-variance between two random variables X and Y is a measure of their joint variability.
  • The co-variance is positive if higher values of X correspond to higher values of Y and lower values of X correspond to lower values of Y.
    The co-variance is negative if lower values of X correspond to higher values of Y and higher values of X correspond to lower values of Y.

The formula for Co-variance of X and Y is as follows:
cov(X, Y) = E[(X − μ₁)(Y − μ₂)]
⇒ cov(X, Y) = E[XY] − μ₁μ₂
where μ₁ and μ₂ represent E[X] and E[Y] respectively

  • If two variables are independent then their co-variance is zero, since in that case E[XY] = E[X]E[Y].
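A quick numerical illustration with NumPy (the variables here are simulated just for demonstration):

import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=10_000)

# y rises with x, so cov(X, Y) should be positive
y = 2 * x + rng.normal(size=10_000)
print(np.cov(x, y)[0, 1])  # ≈ 2 (np.cov returns the co-variance matrix)

# z is independent of x, so cov(X, Z) should be close to zero
z = rng.normal(size=10_000)
print(np.cov(x, z)[0, 1])  # ≈ 0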

7. Correlation Coefficient

  • The co-variance in general does not provide much information about the degree of association between two variables, because the two variables may be on very different scales. A much more useful measure of the linear dependence between two variables is the correlation coefficient, which is a normalised version of the co-variance.
  • The correlation coefficient between two variables X and Y is expressed as
ρ(X, Y) = cov(X, Y) / (σ_X σ_Y), where σ_X and σ_Y are the standard deviations of X and Y
  • The value of ρ lies between -1 and +1
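The same simulated data shows how the normalisation works; computing ρ by hand and with np.corrcoef gives identical results:

import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=10_000)
y = 2 * x + rng.normal(size=10_000)

# ρ = cov(X, Y) / (σ_X σ_Y); ddof=1 matches np.cov's default normalisation
rho = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(rho)                      # ≈ 0.894 for this data

print(np.corrcoef(x, y)[0, 1])  # same value, always within [-1, +1]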

Well, this post was all about an extension of Chapter Two, and these very concepts will be required in distribution analysis and also in data analysis.
Thanks for reading the post.


Adityam Ghosh
Journey to Machine Learning/Deep Learning/Artificial Intelligence
