image:Voice of Nigeria

Statistics : Gaussian Distribution, Z-Distribution and T-Distribution

Anjani Kumar
Analytics Vidhya


Overview :

Several types of distributions are used in machine learning to describe how data is distributed in a population or in a sample. They help to visualize the distribution of the data and to detect outliers in a data set.

Topics covered

  1. Gaussian Distribution
  2. Z-Distribution
  3. T-Distribution

Normal Distribution or Gaussian Distribution or Bell Curve:

Named after Carl Friedrich Gauss, the Gaussian Distribution, also known as the Normal Distribution, is a bell-shaped curve that shows how the data values of a population are distributed. It is used as a reference for checking the deviation and skewness of the data.

A Gaussian distribution has the following properties :

1. It is a symmetrical curve: 50 percent of the data lies to the left of the mean and 50 percent lies to the right of the mean.

2. The mean, median and mode are equal.

3. It follows the empirical rule:

About 68% of the data lies within one standard deviation of the mean

About 95% of the data lies within two standard deviations of the mean

About 99.7% of the data lies within three standard deviations of the mean
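The three percentages of the empirical rule can be checked numerically. The snippet below is a minimal sketch using only the Python standard library; the helper name `prob_within` is made up for this illustration. It relies on the fact that, for a normal distribution, the probability of falling within k standard deviations of the mean equals erf(k / √2).

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normally distributed value lies within
    k standard deviations of its mean (hypothetical helper name)."""
    return math.erf(k / math.sqrt(2))

# Print the share of data within 1, 2 and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.2%}")
```

The printed values come out at roughly 68%, 95% and 99.7%, which is where the rounded figures of the rule come from.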

Examples of data that approximately follow a Normal Distribution :

  1. Heights of people
  2. Blood pressure
  3. Marks in a test
image:kindpng

Standard Normal Distribution or Z-Distribution:

A Normal Distribution can be converted to the Standard Normal Distribution using the Z-Score (Standard Score).

The Standard Normal Distribution / Z-Distribution has a mean of 0 and a standard deviation of 1.

The Z-Score tells how many standard deviations a data point lies away from the mean. Because the value is divided by the standard deviation, the Z-Score is unitless.
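As a small illustration, the Z-Score of a value x is z = (x - mean) / standard deviation. The sketch below applies this to a made-up list of heights (the numbers and the variable names are assumptions for the example, not data from this article) and uses the population standard deviation from Python's `statistics` module.

```python
import statistics

# Hypothetical heights in centimetres, invented for this example.
heights_cm = [150, 160, 170, 180, 190]

mean = statistics.mean(heights_cm)
std = statistics.pstdev(heights_cm)  # population standard deviation

# Z-Score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / std for x in heights_cm]

for x, z in zip(heights_cm, z_scores):
    print(f"{x} cm -> z = {z:+.3f}")

# After standardizing, the values have mean 0 and standard deviation 1.
print("mean of z-scores:", round(statistics.mean(z_scores), 6))
print("std of z-scores: ", round(statistics.pstdev(z_scores), 6))
```

A common rule of thumb for outlier detection is to flag points whose absolute Z-Score is larger than about 3, although the right threshold depends on the data.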

image:ThoughCo
image:MathisFun

T-Distribution :

This is also called Student's T-Distribution. The T-Distribution is any member of a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small (commonly taken as n < 30) and the population standard deviation is not known.

A specific T-Distribution depends on a parameter known as the degrees of freedom (DOF).

DOF refers to the number of independent pieces of information that go into the computation of the sample standard deviation (s). For a single sample of size n, DOF = n - 1.

As the DOF increases, the difference between the T-Distribution and the Standard Normal Distribution becomes smaller and smaller.

For more than 100 DOF, the Standard Normal Distribution (Z) provides a good approximation to the T-Distribution.
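To make the role of the DOF concrete, here is a minimal sketch for a single small sample. The measurements and the reference mean `mu_0` are invented for the example. `statistics.stdev` divides by n - 1, which is exactly the number of degrees of freedom for one sample, and the t-statistic is computed as t = (sample mean - mu_0) / (s / √n).

```python
import math
import statistics

# Hypothetical small sample (n < 30) and a hypothetical reference mean.
sample = [4.8, 5.1, 5.3, 4.9, 5.4]
mu_0 = 5.0

n = len(sample)
dof = n - 1                           # degrees of freedom for one sample
sample_mean = statistics.mean(sample)
s = statistics.stdev(sample)          # sample standard deviation (divides by n - 1)

# t-statistic: distance of the sample mean from mu_0 in standard-error units.
t_stat = (sample_mean - mu_0) / (s / math.sqrt(n))

print(f"n = {n}, DOF = {dof}")
print(f"sample mean = {sample_mean:.3f}, s = {s:.4f}")
print(f"t-statistic = {t_stat:.4f}")
```

The resulting t-statistic would then be compared against a T-Distribution with `dof` degrees of freedom rather than against the Standard Normal Distribution.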
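The convergence towards the Standard Normal Distribution can be seen by comparing the two density curves directly. The following is a rough sketch using only the standard library; the function names `normal_pdf`, `t_pdf` and `max_gap`, and the grid from -5 to 5, are choices made for this illustration. It evaluates the textbook density formulas and reports the largest gap between the two curves for several DOF values.

```python
import math

def normal_pdf(x: float) -> float:
    """Density of the Standard Normal Distribution at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def t_pdf(x: float, dof: float) -> float:
    """Density of the T-Distribution with `dof` degrees of freedom at x.
    Uses log-gamma so that large DOF values do not overflow."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(x * x / dof))

def max_gap(dof: float) -> float:
    """Largest absolute difference between the two densities on [-5, 5]."""
    grid = [i / 10 for i in range(-50, 51)]
    return max(abs(t_pdf(x, dof) - normal_pdf(x)) for x in grid)

for dof in (1, 5, 30, 100):
    print(f"DOF = {dof:>3}: largest gap to the normal density = {max_gap(dof):.4f}")
```

The gap shrinks steadily as the DOF grows, which is why the Z-Distribution is often used as a stand-in for large samples.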

image:SlideServe
image:FAVPNG.com

Conclusion : Each of these distributions plays an important role in machine learning, so it is essential to understand these concepts.

Hope you like my article. Please hit Clap 👏 (50 times) to motivate me to write further.

Want to connect :

LinkedIn : https://www.linkedin.com/in/anjani-kumar-9b969a39/

If you like my posts here on Medium and would like me to continue doing this work, consider supporting me on Patreon.
