The Descriptive Statistics Behind Machine Learning

Yash Patil
8 min read · Jun 29, 2021



Introduction

Descriptive statistics play a crucial role in understanding the data behind your machine learning model. A machine learning model makes predictions, while descriptive statistics tell you about the data itself, which is the essential first step in model building. This blog will discuss the essential descriptive statistical concepts that will help you understand your data better, resulting in a better machine learning model.

Random Variables

A variable whose possible values are the numerical outcomes of a random phenomenon is called a random variable, usually denoted by X.

There are two types of random variables: discrete random variables and continuous random variables.

Discrete Random Variables

A random variable that can take a countable number of distinct values such as 1, 2, 3, 4, etc., is called a discrete random variable. Discrete random variables are usually (but not necessarily) counts; in principle the set of values can be countably infinite, but in practice it takes on distinct finite values. Examples of a discrete random variable include:

· The outcome of a coin flip (e.g., coded 0 for tails, 1 for heads).

· The number of people at a football match.

· The number of vaccinated students.

The possible values a discrete random variable can take, together with their probabilities, define the probability mass function of that variable. For example:

Suppose a variable X can take the values 1, 2, 3, or 4.

The following table describes the probabilities associated with each outcome:

Outcome      1    2    3    4
Probability  0.1  0.3  0.4  0.2

The above table denotes the probability mass function of a discrete random variable.

The cumulative distribution function of a random variable X gives the probability that X is less than or equal to x for every value x. For a discrete random variable, the cumulative distribution function is given by the sum of the probabilities of all values up to and including x.
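To make this concrete, here is a minimal Python sketch (using NumPy, which is my own choice; the post itself shows no code) that builds the PMF and CDF for the table above:

```python
import numpy as np

# PMF from the table above: outcomes 1-4 with their probabilities
outcomes = np.array([1, 2, 3, 4])
pmf = np.array([0.1, 0.3, 0.4, 0.2])

# CDF: running sum of the PMF, e.g. P(X <= 3) = 0.1 + 0.3 + 0.4
cdf = np.cumsum(pmf)

print("PMF:", pmf)  # [0.1 0.3 0.4 0.2]
print("CDF:", cdf)  # [0.1 0.4 0.8 1. ]
```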

Continuous Random Variables

A random variable that can take an uncountably infinite number of possible values, typically any value within an interval, is called a continuous random variable. Examples of a continuous random variable include:

· Height

· Weight

· Amount of Rainfall

· The time required to reach a speed of 60 mph

A continuous random variable X takes on values within an interval of the real line, and probabilities are assigned to sets of possible outcomes A. P(A) is defined as the area under a curve over A. The curve, which represents the density function p(x), must meet the following criteria:

· The curve has no negative values (p(x) ≥ 0 for all x)

· The total area under the curve is equal to 1.

A curve meeting these criteria is known as a density curve.

Discrete Distributions

Uniform Distribution

The uniform distribution is used to model an experiment where all outcomes are equally likely.

The probability mass function of the discrete uniform distribution is given by

P(X=k) = 1/N, where k = 1, 2, 3, …, N.

The cumulative distribution function of the discrete uniform distribution is given by

P(X<=k) = k/N, where k = 1, 2, 3, …, N.

The mean and variance for the discrete uniform distribution are given by:

E[X] = (N+1)/2, var[X] = (N²-1)/12
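As a quick check, here is a sketch with scipy.stats (the six-sided-die example is my own illustration, not from the post):

```python
from scipy import stats

N = 6                        # e.g. a fair six-sided die
X = stats.randint(1, N + 1)  # discrete uniform on {1, ..., N}

print(X.pmf(3))    # 1/6: every outcome is equally likely
print(X.cdf(4))    # 4/6: P(X <= k) = k/N
print(X.mean())    # (N + 1)/2 = 3.5
print(X.var())     # (N**2 - 1)/12 ≈ 2.9167
```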

Bernoulli Distribution

The Bernoulli distribution models a random experiment with only two possible outcomes: ‘Success’ and ‘Failure’. For example, if the probability of having a brain tumor is p = 0.5, the probability of not having one is 1-p. It is a binomial distribution with a single trial.

The mean and variance for the Bernoulli distribution are given by:

E[X] = p, var[X] = p(1-p).
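A brief sketch with scipy.stats, reusing p = 0.5 from the example above (the code itself is my own addition):

```python
from scipy import stats

p = 0.5                    # success probability from the example above
X = stats.bernoulli(p)

print(X.pmf(1), X.pmf(0))  # P(success) = p, P(failure) = 1 - p
print(X.mean(), X.var())   # E[X] = p = 0.5, var[X] = p*(1 - p) = 0.25
```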

Binomial Distribution

A binomial distribution consists of n independent and identical Bernoulli trials, such that each trial can have one of only two possible outcomes: ‘Success’ and ‘Failure.’

The probability mass function of the binomial distribution is given by:

P(X=k) = C(n,k) p^k (1-p)^(n-k), where k = 0, 1, …, n and C(n,k) is the number of ways to choose k successes out of n trials.

The cumulative distribution function of the binomial distribution is given by:

P(X<=k) = Σ C(n,i) p^i (1-p)^(n-i), summing over i = 0, 1, …, k.

The mean and variance for the binomial distribution are given by:

E[X] = np, var[X] = np(1-p)
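The following sketch verifies these formulas with scipy.stats (n = 10 and p = 0.3 are illustrative values I chose):

```python
from scipy import stats

n, p = 10, 0.3            # 10 Bernoulli trials with success probability 0.3
X = stats.binom(n, p)

print(X.pmf(3))           # C(10,3) * 0.3**3 * 0.7**7 ≈ 0.2668
print(X.cdf(3))           # P(X <= 3): sum of the PMF over k = 0..3
print(X.mean(), X.var())  # n*p = 3.0, n*p*(1-p) = 2.1
```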

Geometric Distribution

A geometric distribution counts the number of independent and identical Bernoulli trials performed until the first success occurs.

The probability mass function of the geometric distribution is given by:

P(X=k) = (1-p)^(k-1) p, where k = 1, 2, 3, …

The mean and variance for the geometric distribution are given by:

E[X] = 1/p, var[X] = (1-p)/p²
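A small illustrative sketch with scipy.stats (p = 0.2 is an assumed value):

```python
from scipy import stats

p = 0.2                   # success probability on each trial
X = stats.geom(p)         # number of trials until the first success

print(X.pmf(3))           # (1 - p)**2 * p = 0.128: fail twice, then succeed
print(X.mean(), X.var())  # 1/p = 5.0, (1 - p)/p**2 = 20.0
```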

Poisson Distribution

The Poisson distribution is used to model the number of events in a specified interval of time. The average occurrence of an event is given by λ. The probability mass function of the Poisson distribution is given by:

P(X=k) = λ^k e^(-λ) / k!, for k = 0, 1, 2, …

where e is Euler’s number.

The mean and variance for the Poisson distribution are given by:

E[X] = var[X] = λ.
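A short sketch with scipy.stats (λ = 4 is an assumed rate):

```python
from scipy import stats

lam = 4                   # average number of events per interval
X = stats.poisson(lam)

print(X.pmf(2))           # lam**2 * exp(-lam) / 2! ≈ 0.1465
print(X.mean(), X.var())  # both equal lambda = 4.0
```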

Continuous Distributions

Uniform Distribution

The random variable X of the continuous uniform distribution takes values in an interval (a, b), where any value in the interval is equally likely to occur.

The probability density function is given by:

f(x) = 1/(b-a) for a <= x <= b, and 0 otherwise.

The mean and variance for the continuous uniform distribution are given by:

E[X] = (a+b)/2, var[X] = (b-a)²/12
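A minimal sketch with scipy.stats (the interval (2, 8) is my own example):

```python
from scipy import stats

a, b = 2, 8
X = stats.uniform(loc=a, scale=b - a)  # uniform on (a, b)

print(X.pdf(5))           # 1/(b - a) ≈ 0.1667 anywhere inside (a, b)
print(X.mean(), X.var())  # (a + b)/2 = 5.0, (b - a)**2 / 12 = 3.0
```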

Normal Distribution

The normal distribution peaks when the random variable equals the mean and gradually decreases on both sides, approaching zero as x tends to ±infinity, forming a bell-shaped density curve with parameters mean and standard deviation.

The probability density function is given by:

f(x) = (1/(σ√(2π))) e^(-(x-μ)²/(2σ²))

When the mean of the normal distribution is zero, and the standard deviation is 1, it is called the standard normal distribution.
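The sketch below (parameter values assumed for illustration) shows that standardizing a normal variable reproduces the same probabilities on the standard normal:

```python
from scipy import stats

mu, sigma = 10, 2
X = stats.norm(loc=mu, scale=sigma)
Z = stats.norm(0, 1)              # standard normal

x = 13
print(X.cdf(x))                   # P(X <= 13)
print(Z.cdf((x - mu) / sigma))    # same value via Z = (X - mu)/sigma ≈ 0.9332
```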

Exponential Distribution

The exponential distribution peaks when the random variable is zero and gradually decreases as the value of the variable increases.

The probability density function is given by:

f(x) = λe^(-λx) for x >= 0, and 0 otherwise.

The mean and the variance are listed below:

E[X] = 1/λ, var[X] = 1/λ²

The exponential distribution also gives the waiting time in a Poisson process, that is, the time between one event and the next.
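A hedged sketch with scipy.stats (λ = 0.5 is an assumed event rate; note SciPy parameterizes the exponential by scale = 1/λ):

```python
from scipy import stats

lam = 0.5
X = stats.expon(scale=1 / lam)  # waiting time between Poisson events

print(X.pdf(0))                 # density peaks at zero: equals lam = 0.5
print(X.mean(), X.var())        # 1/lam = 2.0, 1/lam**2 = 4.0
print(1 - X.cdf(3))             # P(wait > 3) = exp(-lam * 3) ≈ 0.2231
```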

Gamma Function

The gamma function is a generalization of the factorial function to nonintegral values. To stretch the factorial function to any positive real number, the gamma function is given by

Γ(α) = ∫ x^(α-1) e^(-x) dx, integrating x from 0 to ∞.

For a positive integer n, Γ(n) = (n-1)!.
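Python's standard library exposes this function directly, which makes the factorial connection easy to see:

```python
import math

# Gamma extends the factorial: gamma(n) = (n - 1)! for positive integers
print(math.gamma(5))    # 24.0, i.e. 4!
print(math.gamma(0.5))  # sqrt(pi) ≈ 1.7725, a nonintegral input
```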

Chi-Square Distribution

The chi-square distribution is a special case of the gamma distribution. Let X be a gamma random variable with scale θ = 2 and shape α = r/2, where r is a positive integer. Then the probability density function of X is:

f(x) = x^(r/2 - 1) e^(-x/2) / (Γ(r/2) 2^(r/2))

For x > 0, X follows a chi-square distribution with r degrees of freedom, denoted by χ²(r) and read as “chi-square r.”

The mean and variance for the chi-square distribution are given by:

E[X] = r, var[X] = 2r
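The sketch below (r = 4 chosen for illustration) checks these moments and the gamma equivalence with scipy.stats:

```python
from scipy import stats

r = 4                      # degrees of freedom
X = stats.chi2(r)

print(X.mean(), X.var())   # r = 4.0, 2r = 8.0

# Same density via the gamma parameterization: shape r/2, scale 2
G = stats.gamma(a=r / 2, scale=2)
print(X.pdf(3), G.pdf(3))  # identical values ≈ 0.1673
```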

t-Distribution

The t-distribution is also a bell-shaped curve, similar to the normal distribution but with slightly heavier tails. The t-distribution is used in place of the normal distribution when the sample size is small and the variance is unknown.

The probability density function of the t-distribution with ν degrees of freedom is given by

f(t) = [Γ((ν+1)/2) / (√(νπ) Γ(ν/2))] (1 + t²/ν)^(-(ν+1)/2)
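The heavier tails are easy to see numerically; in this sketch (ν = 5 assumed), the t-distribution puts several times more probability far from zero than the normal does:

```python
from scipy import stats

nu = 5                 # degrees of freedom
T = stats.t(nu)
Z = stats.norm(0, 1)

print(1 - T.cdf(2.5))  # tail probability ≈ 0.027
print(1 - Z.cdf(2.5))  # ≈ 0.0062: much thinner normal tail
```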

F-Distribution

The F-distribution is used to compare and test whether the variances of two observed samples are equal.

Let χ²(m) and χ²(n) be independent variates distributed as chi-square with m and n degrees of freedom, respectively.

Define the statistic F(m, n) as the ratio of the dispersions of the two distributions:

F(m, n) = (χ²(m)/m) / (χ²(n)/n)
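A small sketch with scipy.stats (m = 5 and n = 10 are assumed degrees of freedom):

```python
from scipy import stats

m, n = 5, 10
F = stats.f(dfn=m, dfd=n)   # distribution of (chi2_m/m) / (chi2_n/n)

print(F.mean())             # n/(n - 2) = 1.25 for n > 2
print(1 - F.cdf(3.33))      # ≈ 0.05: 3.33 is near the 95th percentile
```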

Joint Distributions

Two Dimensional Random Variables

Let S be a sample space associated with a random experiment E. Let X and Y be two random variables defined on S. The pair (X, Y) is called a two-dimensional random variable.

If the possible values of (X, Y) are finite or countably infinite, then (X, Y) is called a two-dimensional discrete random variable.

If (X, Y) can assume all values in a specified region R of the XY-plane, then (X, Y) is called a two-dimensional continuous random variable.

Joint Probability Distributions

If X and Y are two random variables defined on the same sample space, then P({X = x}∩{Y = y}) is called their joint probability distribution. Note that the joint probabilities must sum to 1 over all pairs (x, y).

The marginal distributions of X and Y are given as

P(X = x) = Σ P(X = x, Y = y), summing over all y

P(Y = y) = Σ P(X = x, Y = y), summing over all x

The expected value of a function g of X and Y is as follows:

E[g(X, Y)] = Σ g(x, y) P(X = x, Y = y), summing over all pairs (x, y)

Independent Random Variables

X and Y are independent random variables if the events (X = x) and (Y = y) are independent for all x and y, i.e., two random variables X and Y are independent if

P(X = x, Y = y) = P(X = x) P(Y = y) for all x and y.
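The sketch below builds a small hypothetical joint PMF (the numbers are mine, chosen so that independence holds) and verifies the factorization:

```python
import numpy as np

# Hypothetical joint PMF of (X, Y): rows index x, columns index y
joint = np.array([[0.12, 0.18],
                  [0.28, 0.42]])

p_x = joint.sum(axis=1)  # marginal of X: sum over y -> [0.3, 0.7]
p_y = joint.sum(axis=0)  # marginal of Y: sum over x -> [0.4, 0.6]

# Independent iff the joint equals the outer product of the marginals
print(np.allclose(joint, np.outer(p_x, p_y)))  # True for this table
```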

Covariance and Correlation

The covariance gives the inter-relationship between two random variables. The covariance of two random variables X and Y is given by

Cov(X,Y) = E[XY] - E[X]E[Y]

It can also be written as

Cov(X,Y) = E[(X - E[X])(Y - E[Y])]

If we standardize the covariance between two random variables by their variability, we get the correlation between them as follows:

ρ(X, Y) = Cov(X, Y) / (σX σY)

where σX and σY are the standard deviations of X and Y.

Two random variables are said to be uncorrelated if

ρ(X, Y) = 0, i.e., Cov(X, Y) = 0
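A simulation sketch with NumPy (the linear relationship y = 2x + noise is my own example):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(size=1000)  # y depends on x, so they covary

print(np.cov(x, y)[0, 1])       # sample Cov(X, Y), close to 2
print(np.corrcoef(x, y)[0, 1])  # correlation, close to 2/sqrt(5) ≈ 0.894
```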

Conditional Expectations

The conditional expectation of X given Y = y is defined by

E[X | Y = y] = Σ x P(X = x | Y = y), summing over all x.
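Using the hypothetical joint table from the independence sketch above, the conditional expectation can be computed directly:

```python
import numpy as np

joint = np.array([[0.12, 0.18],   # rows: x = 0, 1; columns: y = 0, 1
                  [0.28, 0.42]])
x_vals = np.array([0, 1])

# E[X | Y = 0]: condition on the first column, renormalize, then average x
p_x_given_y0 = joint[:, 0] / joint[:, 0].sum()
print(np.dot(x_vals, p_x_given_y0))  # 0.28/0.40 = 0.7
```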

Regression to the Mean

The basic intuition behind regression to the mean is that if a variable is extreme the first time you measure it, it will tend to be closer to the mean the next time you measure it.
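A quick simulation makes this visible (the noisy-measurement model below is my own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
skill = rng.normal(0, 1, size=10_000)           # underlying true values
first = skill + rng.normal(0, 1, size=10_000)   # noisy measurement 1
second = skill + rng.normal(0, 1, size=10_000)  # noisy measurement 2

# Among extreme first measurements, the second tends back toward the mean
extreme = first > 2
print(first[extreme].mean())   # well above 2
print(second[extreme].mean())  # roughly half as far from 0
```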

This concludes our blog, where we covered the different descriptive statistics behind machine learning.

