Measures of Variance With Python

Madhav Mishra
Published in The Startup
2 min read · Jul 30, 2020

Hi folks, welcome back to a new edition of my blog. Thank you so much for your love and support; I hope you are all doing well. In today’s post, we will look at variance and the measures related to it. Although this post is short, I’m sure it will help you understand the topic better. So let’s get started.

In statistics, variance is a measure of how far the values in a data set lie from the mean value. In other words, it indicates how dispersed the values are. It is commonly reported through the standard deviation. Another measure often used alongside it is skewness, which describes how asymmetric the distribution is rather than how spread out it is. Both can be calculated using functions available in the pandas library.

Measuring Standard Deviation

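Standard deviation is the square root of the variance. Variance is the average of the squared differences between the values in a data set and the mean value. In Python, we calculate the standard deviation using the std() function from the pandas library. Note that std() divides by n - 1 rather than n by default (the sample standard deviation); pass ddof=0 to get the population version.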

# Measuring the standard deviation: example
import pandas as pd

# Create a dictionary of Series
a = {
    'Name': pd.Series(['Madhav', 'Ramesh', 'Divya', 'Ankita', 'Santosh',
                       'Ketan', 'Niloy', 'Preethi', 'Bhaskar', 'Deeksha']),
    'Age': pd.Series([25, 26, 25, 23, 30, 25, 23, 34, 40, 30]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80]),
}

# Create a DataFrame
df = pd.DataFrame(a)

# Calculate the standard deviation of the numeric columns
# (numeric_only=True skips the text column 'Name')
print(df.std(numeric_only=True))

Its output is as follows:

Age       5.466057
Rating    0.720525
dtype: float64
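To see where the Age figure comes from, here is a minimal sketch that recomputes it step by step from the definition. It assumes the pandas default of ddof=1 (dividing by n - 1); the variable names are my own and only for illustration.

```python
import math
import pandas as pd

# Same Age values as in the example above
age = pd.Series([25, 26, 25, 23, 30, 25, 23, 34, 40, 30])

# Step 1: the mean of the values
mean = age.sum() / len(age)

# Step 2: squared difference of every value from the mean
squared_diffs = (age - mean) ** 2

# Step 3: variance, dividing by n - 1 (sample) or by n (population)
sample_variance = squared_diffs.sum() / (len(age) - 1)
population_variance = squared_diffs.sum() / len(age)

# Step 4: standard deviation is the square root of the variance
sample_std = math.sqrt(sample_variance)
population_std = math.sqrt(population_variance)

print(sample_std)        # should agree with age.std()
print(population_std)    # should agree with age.std(ddof=0)
```

The sample version is the one that matches the std() output shown above; the population version comes out slightly smaller because it divides by n instead of n - 1.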

Measuring Skewness

Skewness is used to determine whether the data is symmetric or skewed. As a rule of thumb, if the skewness is between -1 and 1, the distribution is treated as approximately symmetric. If it is -1 or less, the distribution is skewed to the left, and if it is 1 or more, it is skewed to the right. In pandas we calculate this value using the skew() function.

# Measuring the skewness: example
import pandas as pd

# Create a dictionary of Series
a = {
    'Name': pd.Series(['Madhav', 'Ramesh', 'Divya', 'Ankita', 'Santosh',
                       'Ketan', 'Niloy', 'Preethi', 'Bhaskar', 'Deeksha']),
    'Age': pd.Series([25, 26, 25, 23, 30, 25, 23, 34, 40, 30]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80]),
}

# Create a DataFrame
df = pd.DataFrame(a)

# Calculate the skewness of the numeric columns
# (numeric_only=True skips the text column 'Name')
print(df.skew(numeric_only=True))

Its output is as follows:

Age       1.309954
Rating   -0.030865
dtype: float64

So the distribution of rating is approximately symmetric, while the distribution of age is skewed to the right.
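The reading of the skewness values can also be written as a small helper. This is only a sketch of the -1 / 1 rule of thumb described in this post; describe_skew and its threshold parameter are hypothetical names of my own, not part of pandas.

```python
import pandas as pd


def describe_skew(value, threshold=1.0):
    """Label a skewness value using the -1 / 1 rule of thumb (hypothetical helper)."""
    if value <= -threshold:
        return "skewed to the left"
    if value >= threshold:
        return "skewed to the right"
    return "approximately symmetric"


# Same numeric columns as in the example above
df = pd.DataFrame({
    'Age': [25, 26, 25, 23, 30, 25, 23, 34, 40, 30],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80],
})

# Print each column's skewness together with its label
for column, value in df.skew().items():
    print(column, round(value, 6), describe_skew(value))
```

With the data above, Age falls in the "skewed to the right" group and Rating in the "approximately symmetric" group, which is the same conclusion we reached by reading the numbers directly. The cut-off of 1 is a convention, so feel free to pass a different threshold if your use case needs a stricter one.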

Also attaching the ipynb file for reference.

I hope the material above was informative and gave you a quick overview of the topic. On this note, I would like to sign off for today. If you would like me to cover any topic related to data science, machine learning, and so on, please leave a comment in the comment section of my blog so that I can make a note of it and write about it for everyone’s learning.

Do follow me on Medium & LinkedIn to get updates on all my blogs. If you really liked this post, do comment below, because learning has no limits.

Stay Happy, Stay Fit, Stay Humble…!

Thank you for reading …!
