Statistics Before Starting the Machine Learning Journey - Normal Distribution & Parameters

Cerca_Trova
Published in Analytics Vidhya
Dec 18, 2019

The normal distribution is one of the most basic and important topics in statistics. Before getting to a formal definition, let's build some intuition about it. A normal distribution is completely described by two numbers: its mean and its standard deviation.

The figure above shows two different normal distributions. Both are bell shaped, and each shows clearly how its data are spread. In the first, most of the data fall in the 10–30 range, while in the second the data are spread widely, from 10 to 80.

The two main things to look at in a normal distribution graph are the mean and the standard deviation.

The center of the curve marks the mean (average) of the values, and the width of the curve reflects the standard deviation. The first curve is narrow, so that distribution has a small standard deviation; the second curve is wide, which indicates a large standard deviation.
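
To make the "center" and "width" idea concrete, here is a minimal Python sketch of the normal probability density function, f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)), where μ is the mean and σ the standard deviation. The helper name normal_pdf is hypothetical, and the two parameter pairs are simply the example values used for the two figures later in this post (mean 20, SD 0.6 and mean 70, SD 4).

```python
import math

def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Height of the normal curve with the given mean and SD at point x."""
    z = (x - mean) / sd  # how many SDs x is from the mean
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# Example parameters (mean, SD) for a narrow and a wide curve.
narrow = (20, 0.6)
wide = (70, 4)

for mean, sd in (narrow, wide):
    peak = normal_pdf(mean, mean, sd)              # the curve is highest at the mean
    one_sd_away = normal_pdf(mean + sd, mean, sd)  # height one SD from the center
    print(f"mean={mean}, sd={sd}: peak={peak:.4f}, 1 SD away={one_sd_away:.4f}")
```

The curve always peaks at the mean, and because the peak height is 1/(σ√(2π)), a smaller standard deviation gives a taller, narrower bell while a larger one gives a flatter, wider bell.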

For any normal distribution, about 95% of measurements fall within ±2 standard deviations of the mean. To see this, go back to the two figures: the first has mean = 20 and SD = 0.6, and the second has mean = 70 and SD = 4. So for the first, about 95% of measurements fall within 20 ± 1.2 inches (18.8 to 21.2), and for the second, about 95% fall within 70 ± 8 inches (62 to 78).
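
The ±2 SD rule can be checked numerically. This is a small sketch, assuming we compute the normal cumulative distribution function with the standard library's math.erf; the helper names normal_cdf and two_sd_interval are hypothetical, and the parameters are the two example curves above.

```python
import math

def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(X <= x) for a normal variable, computed via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def two_sd_interval(mean: float, sd: float) -> tuple[float, float]:
    """The interval from mean - 2*sd to mean + 2*sd."""
    return mean - 2 * sd, mean + 2 * sd

for mean, sd in ((20, 0.6), (70, 4)):
    low, high = two_sd_interval(mean, sd)
    # Probability mass between the two interval ends.
    coverage = normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd)
    print(f"mean={mean}, sd={sd}: [{low:.1f}, {high:.1f}] covers {coverage:.4f}")
```

Both curves give the same coverage, about 0.9545, because the share of data within ±2 SD does not depend on the mean or SD. That is why the rule is usually quoted as "about 95%"; an interval covering exactly 95% uses ±1.96 standard deviations.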

When the normal distribution describes the entire population, its mean and SD are called the population mean (μ) and the population standard deviation (σ). These are the population parameters of the normal distribution, and they let us calculate probabilities and statistics. In real-world scenarios, however, measuring the whole population for a machine learning experiment is rarely feasible, so we select a sample from the population and use that sample to train the model.
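
The idea of drawing a sample from a larger population can be simulated. This is a minimal sketch under stated assumptions: the "population" is a hypothetical list of 100,000 values generated with random.gauss (mean 70, SD 4, matching the second example curve), and the sample size of 50 is arbitrary.

```python
import random
import statistics

# Hypothetical population: 100,000 values from a normal distribution
# with mean 70 and SD 4. Seeded so the run is repeatable.
rng = random.Random(42)
population = [rng.gauss(70, 4) for _ in range(100_000)]

# In practice we rarely observe the whole population, so we draw a sample
# of 50 values without replacement and work with that instead.
sample = rng.sample(population, 50)

print(f"population mean = {statistics.fmean(population):.2f}")
print(f"sample mean     = {statistics.fmean(sample):.2f}")
```

The sample mean will be close to, but usually not exactly equal to, the population mean; that gap is why values computed from a sample are called estimates.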

From the sample data we then estimate the mean, variance and standard deviation. These estimates are known as the sample (or estimated) mean, variance and standard deviation.

Figures: sample mean calculation, sample variance, sample standard deviation (SD)
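
Assuming the usual textbook definitions, the sample mean is x̄ = (1/n) Σ xᵢ, the sample variance is s² = Σ (xᵢ - x̄)² / (n - 1) (dividing by n - 1, known as Bessel's correction), and the sample standard deviation is s = √s². The sketch below implements those formulas by hand and cross-checks them against Python's statistics module, whose variance and stdev also divide by n - 1. The function names and the eight data values are made up for illustration.

```python
import math
import statistics

def sample_mean(xs: list[float]) -> float:
    """x̄ = (1/n) * sum of x_i"""
    return sum(xs) / len(xs)

def sample_variance(xs: list[float]) -> float:
    """s² = sum of (x_i - x̄)² divided by n - 1 (Bessel's correction)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_sd(xs: list[float]) -> float:
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

# Made-up sample of eight measurements, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(sample_mean(data), sample_variance(data), sample_sd(data))
# Cross-check with the standard library (also divides by n - 1).
print(statistics.mean(data), statistics.variance(data), statistics.stdev(data))
```

For this data the mean is 5 and the sample variance is 32/7 (about 4.57). Dividing by n instead of n - 1 gives the population versions (statistics.pvariance and statistics.pstdev), which here are 4 and 2.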

Besides the normal distribution, statistics uses other distributions, such as the exponential distribution and the gamma distribution. The population parameter of the exponential distribution is the population rate, and the population parameters of the gamma distribution are the population rate and the population shape.

Please leave your comments and support my learning so that I can improve my understanding.

Reference: https://www.youtube.com/watch?v=rzFX5NWojp0&feature=youtu.be
