Understanding the square function for Variance & Deviation
Why do we use the square function instead of the absolute value (MOD) in the variance formula?
We all know Data Science is about data, and data comes with variety. So how can we standardise a format or pattern to understand that variety?
The simple answer is to set a benchmark value from which we can measure how far the other values in the data deviate.
Normally we take the mean as this benchmark, and a simple computation is to sum (or average) the deviations of all the points from the mean.
For a measure of centrality this is fine, but the deviations always cancel out across the two sides of the mean. When we consider variance as a measure of spread, however, we need to account for how far the whole spread reaches, including the points below the mean.
The opposite signs of those deviations (negative versus positive) are exactly what makes them cancel. If we take the absolute value of each deviation instead, nothing cancels and we can compute the full length of the spread.
Alternatively, to turn the negative deviations positive, we can square them.
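A quick numeric check of the three ideas above (the data values here are made up purely for illustration):

```python
# Deviations from the mean cancel out; absolute values and squares do not.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)                               # 5.0

deviations = [x - mean for x in data]
print(sum(deviations))                                     # 0.0 — positives and negatives cancel

abs_spread = sum(abs(d) for d in deviations) / len(data)   # mean absolute deviation
sq_spread = sum(d * d for d in deviations) / len(data)     # mean squared deviation (variance)
print(abs_spread, sq_spread)                               # 1.5  4.0
```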
So here we arrive at the formulas for the variance, obtained by taking the squares:

Population variance: σ² = Σ(xᵢ − μ)² / n
Sample variance: s² = Σ(xᵢ − x̄)² / (n − 1)
Now you must be wondering why there are two different formulas.
The answer lies in probability theory, and I will cover it in my next article: why n and (n − 1) appear in the formulas…
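Both versions are one parameter apart in NumPy (the `ddof` argument controls the divisor), which makes the distinction easy to see in code:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
pop_var = data.var(ddof=0)    # divides by n      — population variance
samp_var = data.var(ddof=1)   # divides by n - 1  — sample variance (Bessel's correction)
print(pop_var, samp_var)      # 4.0  ≈4.571
```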
Before that, here is a small but very important question: why did we use the square function instead of MOD?
The square function is indeed better than the absolute function, and there is more to know:
- The square function is smoother than the absolute function
- The square function is differentiable everywhere, while the absolute function is not differentiable at zero
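The kink in the absolute function can be seen numerically with a simple central-difference slope estimate (an illustrative sketch, not a rigorous proof):

```python
# Approximate the slope of a function with a central difference.
def num_deriv(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

square = lambda x: x * x

print(num_deriv(square, 0.0))   # ≈ 0  — a smooth, well-defined slope at the minimum
print(num_deriv(abs, -0.001))   # ≈ -1 — slope just left of zero
print(num_deriv(abs, 0.001))    # ≈ +1 — slope just right of zero: |x| has a kink at 0
```

Because the left and right slopes of |x| disagree at zero, no single derivative exists there, whereas x² has the clean derivative 2x everywhere.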
Why do we care about differentiability?
The reason is…
Calculus and differentiation play a very significant role in Data Science and in applications like Machine Learning and Deep Learning, where values are optimised by following gradients. That works for the square function, but not for the absolute function at its minimum.
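As a minimal sketch of that optimisation idea (the data, learning rate, and step count are hypothetical choices, not from the article): gradient descent on the mean squared deviation f(c) = (1/n) Σ(xᵢ − c)² converges to the mean, because the squared loss has the well-defined gradient −(2/n) Σ(xᵢ − c) everywhere.

```python
# Gradient descent on the mean squared deviation finds the mean.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

c = 0.0        # starting guess
lr = 0.1       # learning rate
for _ in range(200):
    grad = -2.0 / n * sum(x - c for x in data)   # gradient of the squared loss
    c -= lr * grad

print(round(c, 6))   # 5.0 — the mean of the data
```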
The second most important reason is that in measuring spread we want to suppress small deviations and magnify large ones. The square function does exactly that, which also makes the variance highlight outliers.
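A tiny comparison makes this concrete (the deviation values are made up for illustration). Relative to the absolute deviation, squaring shrinks deviations smaller than 1 and inflates larger ones:

```python
# Compare |d| with d squared for small and large deviations.
for d in [0.5, 1.0, 2.0, 10.0]:
    print(d, abs(d), d * d)
# 0.5  -> 0.25  (squared is SMALLER than the absolute deviation)
# 10.0 -> 100.0 (squared is far LARGER — outliers dominate the variance)
```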