Statistics: Range, Variance and Standard Deviation

Measures of Variability/Dispersion: Range, Variance and Standard Deviation, and why Variance squares the numerator and the Sample Variance uses (n-1) in the denominator

Bala Murugan N G
Analytics Vidhya
5 min read · May 18, 2021


Introduction

In this blog, we will cover the following concepts:

  • Population & Sample
  • What is a Measure of Variability/Dispersion?
  • Range
  • Variance for Population & Sample
  • Why is the numerator squared in Variance?
  • Why is the denominator (n-1) in Sample Variance?
  • Standard Deviation for Population and Sample

Let’s get into action . . .

Population & Sample

Population : The population is the entire group that you want to analyze or make predictions about.

Sample : A sample is a subset of the population, typically drawn at random. The size of the sample is always less than the total size of the population.

https://www.omniconvert.com/what-is/sample-size/
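To make the distinction concrete, here is a minimal sketch in Python. The population of 1,000 ages is a made-up example (it does not come from any real dataset), and the sample size of 50 is an arbitrary choice for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: the ages of all 1,000 members of some group
population = [random.randint(18, 80) for _ in range(1000)]

# A sample is a random subset of the population (drawn without replacement)
sample = random.sample(population, k=50)

print("Population size :", len(population))
print("Sample size     :", len(sample))
print("Population mean (parameter) :", statistics.mean(population))
print("Sample mean (statistic)     :", statistics.mean(sample))
```

The sample mean will usually be close to the population mean, but not identical to it. That gap is why we distinguish a parameter (computed on the population) from a statistic (computed on a sample).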

It helps to understand the concepts of Population and Sample, Parameter and Statistic, and Biased and Unbiased clearly before reading the rest of this blog . . .

Population Sample notations
Notation of Population and Sample — Image Created by Author

Measure of Dispersion/Variability

A Measure of Variability is a descriptive statistic that represents the amount of dispersion in a dataset. While a Measure of Central Tendency describes the typical value, a Measure of Variability describes how far the data points tend to fall from the center.
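As a quick illustration, the sketch below compares two made-up datasets (the names `tight` and `wide` are just for this example) that share the same mean but are spread out very differently.

```python
import statistics

# Two hypothetical datasets with the same center but different spread
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, values in (("tight", tight), ("wide", wide)):
    spread = max(values) - min(values)  # largest value minus smallest value
    print(name, "-> mean:", statistics.mean(values), "| spread:", spread)
```

Both datasets have a mean of 50, so a Measure of Central Tendency alone cannot tell them apart. A Measure of Variability can.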

Three common ways to measure dispersion are:

  • Range
  • Variance
  • Standard Deviation
Measure of Dispersion — Statistics
Created by Author

Range

Range is the difference between the largest and smallest values in a dataset. It is one of the Measures of Dispersion/Variability.

Range in Statistics
Image Credit: https://cdn.corporatefinanceinstitute.com/assets/range1.png
Image Credit: https://www.educba.com/range-formula/

Python Script

# Sample data
data = [4, 6, 9, 3, 7]

# Use a name other than "range" so Python's built-in range() is not shadowed
data_range = max(data) - min(data)
print("Maximum Value :", max(data))
print("Minimum Value :", min(data))
print("Range :", data_range)

# Output
# Maximum Value : 9
# Minimum Value : 3
# Range : 6

The range can sometimes be misleading when there are extremely high or low values.

Example: In {8, 11, 5, 9, 7, 6, 2500}:

  • the lowest value is 5,
  • and the highest is 2500,

So the range is 2500 − 5 = 2495.

So we may be better off using the Interquartile Range or Standard Deviation.
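Here is a small sketch that compares the Range with the Interquartile Range (IQR) on the same data. It assumes the quartiles are computed with `statistics.quantiles` from the Python standard library using its default method. Other quartile conventions can give slightly different IQR values.

```python
import statistics

data = [8, 11, 5, 9, 7, 6, 2500]

# Range uses only the two extreme values, so the outlier dominates it
data_range = max(data) - min(data)

# statistics.quantiles with n=4 returns the three quartile cut points
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # spread of the middle half of the data

print("Range :", data_range)
print("IQR   :", iqr)
```

The IQR ignores the extreme value 2500, so it stays small while the Range jumps to 2495.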

Variance

Variance is one of the Measures of Dispersion/Variability. It tells us how much the data points vary around the mean: it is the average of the squared deviations from the mean.

Population Variance

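The variance computed over the entire population is known as the Population Variance. It divides the sum of squared deviations from the population mean μ by the population size N: σ² = Σ(xᵢ − μ)² / N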

Population Variance in Statistics
Image Created by Author

Sample Variance

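The variance computed from sample data is known as the Sample Variance. It divides the sum of squared deviations from the sample mean x̄ by (n − 1), where n is the sample size: s² = Σ(xᵢ − x̄)² / (n − 1)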

Sample Variance in Statistics
Image Created by Author

Variance : Python Implementation

Variance Example
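A minimal sketch of both formulas in Python is shown below. It reuses the five values from the Range script and, purely for illustration, treats them first as a whole population and then as a sample. The helper names `population_variance` and `sample_variance` are my own. The results are cross-checked against `statistics.pvariance` and `statistics.variance` from the standard library.

```python
import statistics

data = [4, 6, 9, 3, 7]

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

print("Population Variance :", population_variance(data))
print("Sample Variance     :", sample_variance(data))

# The standard library provides the same two calculations
print("statistics.pvariance :", statistics.pvariance(data))
print("statistics.variance  :", statistics.variance(data))
```

For this data the population variance is about 4.56 and the sample variance is about 5.7. The sample variance is larger because it divides the same sum by a smaller number.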

Why is the numerator squared in Variance?

Because if we did not square the terms, the positive and negative deviations from the mean would cancel each other out, and their sum would always be zero. Squaring each deviation makes every term non-negative, so the deviations add up instead of cancelling.

Example
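The cancellation can be checked with a short sketch, again using the five values from the Range script. The results are rounded only to hide tiny floating-point errors.

```python
data = [4, 6, 9, 3, 7]
mean = sum(data) / len(data)

# Deviation of each point from the mean (some positive, some negative)
deviations = [x - mean for x in data]

# Squaring removes the sign, so nothing cancels
squared_deviations = [d ** 2 for d in deviations]

print("Mean                      :", mean)
print("Deviations                :", [round(d, 2) for d in deviations])
print("Sum of deviations         :", round(sum(deviations), 10))
print("Sum of squared deviations :", round(sum(squared_deviations), 10))
```

The plain deviations sum to zero (up to floating-point rounding), while the squared deviations sum to about 22.8, which is the numerator of the variance for this data.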

Wait . . . Wait . . . Have you noticed the Sample Variance formula? Its denominator is slightly different from the one in the Population Variance formula . . .

Why the denominator (n-1) in Sample Variance?

There are two perspectives. . .

  1. Bessel's Correction

The Sample Variance measures deviations from the sample mean rather than from the true population mean. The sample mean is computed from the same data points, so it sits closer to them than the population mean does. As a result, dividing the sum of squared deviations by n underestimates the population variance on average, which makes it a biased estimate. Dividing by (n-1) instead of n corrects this bias.

  2. Degrees of Freedom

The deviations from the sample mean always sum to zero. So if we know the sample mean and (n-1) of the data points, the remaining data point is already determined. The population mean, in contrast, is a fixed value that does not depend on which sample we drew, so deviations from it carry no such constraint.

Degrees of Freedom is the number of values that are free to vary when calculating a statistic. Because one value is fixed once the sample mean is known, only (n-1) deviations are free to vary, and so we divide by (n-1).
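Both perspectives can be checked with a small simulation. This is only a sketch under assumptions of my own: the population is the integers 1 to 100, each sample has 5 values drawn with replacement, and the experiment is repeated 20,000 times with a fixed random seed.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: the integers 1..100
population = list(range(1, 101))
mu = sum(population) / len(population)
population_variance = sum((x - mu) ** 2 for x in population) / len(population)

n = 5           # sample size
trials = 20000  # number of repeated samples

sum_div_n = 0.0
sum_div_n_minus_1 = 0.0
for _ in range(trials):
    # Draw n values independently (with replacement) from the population
    sample = random.choices(population, k=n)
    x_bar = sum(sample) / n
    ss = sum((x - x_bar) ** 2 for x in sample)  # sum of squared deviations
    sum_div_n += ss / n
    sum_div_n_minus_1 += ss / (n - 1)

avg_div_n = sum_div_n / trials
avg_div_n_minus_1 = sum_div_n_minus_1 / trials

print("Population variance       :", population_variance)
print("Average estimate with n   :", avg_div_n)
print("Average estimate with n-1 :", avg_div_n_minus_1)

# Degrees of freedom: the mean plus (n-1) points fixes the last point
known_points = [4, 6, 9, 3]  # four of the five values from the Range script
known_mean = 5.8             # mean of all five values
last_point = known_mean * 5 - sum(known_points)
print("Recovered last point      :", last_point)
```

On average, dividing by n gives an estimate that is clearly below the population variance, while dividing by (n-1) lands close to it. The last lines show the Degrees of Freedom idea: with the mean and four points known, the fifth point (7) can no longer vary.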

Standard Deviation

Standard Deviation tells us how much the data points deviate from the mean. It is the square root of the Variance, so it is expressed in the same units as the data.

Population Standard Deviation

The Standard Deviation computed over the entire population is known as the Population Standard Deviation. It is the square root of the Population Variance: σ = √( Σ(xᵢ − μ)² / N )

Population Standard Deviation — Statistics
Image Created by Author

Sample Standard Deviation

The Standard Deviation computed from sample data is known as the Sample Standard Deviation. It is the square root of the Sample Variance: s = √( Σ(xᵢ − x̄)² / (n − 1) )

Sample Standard Deviation for Statistics
Image Created by Author

Standard Deviation: Python Implementation
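A minimal sketch using Python's standard library is shown below. `statistics.pstdev` computes the Population Standard Deviation and `statistics.stdev` computes the Sample Standard Deviation. The data are the same five values used in the Range script, treated both ways purely for illustration.

```python
import math
import statistics

data = [4, 6, 9, 3, 7]

population_std = statistics.pstdev(data)  # divides by N
sample_std = statistics.stdev(data)       # divides by (n - 1)

print("Population Standard Deviation :", round(population_std, 3))
print("Sample Standard Deviation     :", round(sample_std, 3))

# Each one is the square root of the matching variance
print(math.isclose(population_std, math.sqrt(statistics.pvariance(data))))
print(math.isclose(sample_std, math.sqrt(statistics.variance(data))))
```

Both checks at the end print True, which confirms that the Standard Deviation is the square root of the corresponding Variance.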

Conclusion

I hope this article helps you understand the Measures of Variability (Range, Variance and Standard Deviation) and Population & Sample, with example Python scripts.

Like . . If you like. . .

Thank you,

Bala Murugan N G

Currently Exploring my Life in Researching Data Science. Connect Me at LinkedIn : https://www.linkedin.com/in/ngbala6