Published in Analytics Vidhya

Explaining n-1 — The Intuition Behind Bessel’s Correction for Sample Variance

$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$

This is the formula for sample variance that is often presented in the standard “Introduction to Statistics” course at most colleges, where x_1, …, x_n are the n observations in the sample and x̄ is their mean. The use of n-1 in the denominator instead of n is called Bessel’s correction. Rarely is the n-1 portion explained beyond some handwaving and mumbling about unbiased estimators. When looking around online, I’ve struggled to find any articles explaining the concept succinctly (most “clear” explanations take 15 paragraphs and 5 figures). This page is an attempt to distill and cleanly present the material in the Wikipedia article on Bessel’s correction in as few words as possible.

This article is for people who have some familiarity with statistics; I expect that you have taken a course in statistics at some point in high school or college. The only vocabulary I will clarify is the term unbiased estimator:

In statistics, the bias (or bias function) of an estimator is the difference between this estimator’s expected value and the true value of the parameter being estimated.

When taking a sample from the population, we’d typically like to be able to accurately estimate population metrics from the sample data we pulled. For example, the sample mean is an unbiased estimator of the population mean: its expected value is equal to the population mean, so its bias is zero. It’s generally preferable to use estimators that are unbiased.

Why is the naive formula biased?

The naive formula for sample variance is as follows:

$$ \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$

Naive Variance Formula (Equation 1)

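Why would this formula be biased? Let’s compare it to an estimator that is unbiased, the population variance formula applied to the sample, which measures each observation’s deviation from the true population mean μ rather than from the sample mean x̄:

$$ \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 $$

Population Variance Formula (Equation 2)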

(Already some of you will notice that the bias is introduced by replacing the population mean with the sample mean.)

If our estimator (equation 1) is always less than or equal to another estimator that we know is unbiased (equation 2), then it would have a downwards bias. To state this concept in another way, once we show that this estimator can undershoot an unbiased estimator but it will never overshoot, we know that it must be a biased estimator.

Let’s work through a small example. Suppose we’re taking a sample from a population consisting of {0, 2, 4, 5, 10, 15}, whose mean is μ = 6. I reach into the bag and pull out {2, 10, 15}, whose mean is x̄ = 9. How would our two estimators behave?

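The unbiased estimator (equation 2), which uses the population mean μ = 6, produces the following:

$$ \frac{(2-6)^2 + (10-6)^2 + (15-6)^2}{3} = \frac{16 + 16 + 81}{3} = \frac{113}{3} \approx 37.67 $$

The naive (biased) formula (equation 1), which uses the sample mean x̄ = 9, produces the following result:

$$ \frac{(2-9)^2 + (10-9)^2 + (15-9)^2}{3} = \frac{49 + 1 + 36}{3} = \frac{86}{3} \approx 28.67 $$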

Notice that the naive formula undershoots the unbiased estimator for the same sample! Why is this? Well, we can think of variance as measuring how tightly a set of points cluster around another point (let’s call it μ). For a given set of points, the closer μ is to the center of the points, the lower the variance will be!

Intuitively, you can think of how the sample mean is always going to be in the middle of our sample’s data points, whereas the population mean may truly be somewhere else. This means that the naive variance formula always chooses a μ that minimizes variance for the sample, while the population variance formula will often choose a μ that does not minimize variance for the sample. This results in our naive formula falling short of the unbiased estimator.
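The example can be checked with a short Python sketch. The helper name mean_squared_deviation and the grid of candidate centers are my own choices for illustration, not something from the original derivation. The sketch computes the average squared distance of the sample {2, 10, 15} from the population mean and from the sample mean, then scans a grid of possible centers to see which one gives the smallest value.

```python
from fractions import Fraction

def mean_squared_deviation(sample, center):
    """Average squared distance of the sample points from `center`."""
    return sum((Fraction(x) - center) ** 2 for x in sample) / len(sample)

population = [0, 2, 4, 5, 10, 15]
sample = [2, 10, 15]

population_mean = Fraction(sum(population), len(population))  # 6
sample_mean = Fraction(sum(sample), len(sample))              # 9

# Equation 2: deviations measured from the population mean
around_population_mean = mean_squared_deviation(sample, population_mean)
# Equation 1: deviations measured from the sample mean
around_sample_mean = mean_squared_deviation(sample, sample_mean)

print(around_population_mean)  # 113/3
print(around_sample_mean)      # 86/3

# Scan candidate centers 0, 0.5, 1, ..., 15 (an arbitrary illustrative grid)
# and keep the one that gives the smallest mean squared deviation.
candidates = [Fraction(k, 2) for k in range(31)]
best_center = min(candidates, key=lambda c: mean_squared_deviation(sample, c))
print(best_center)  # 9, the sample mean
```

On this grid, no center produces a smaller value than the sample mean does, which is why the naive formula cannot come out above the known-mean formula for the same sample.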

In fact, the only situation where the naive formula is equal to the unbiased estimator is when the sample mean happens to be equal to the population mean. There is no situation where the naive formula produces a larger variance than the unbiased estimator. Try it yourself! This concept lies at the heart of why the naive formula is a biased estimator.
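One way to try it yourself is to check every possible sample by brute force. The sketch below assumes samples of size 3 drawn with replacement from the example population {0, 2, 4, 5, 10, 15}. Both the sample size and the with-replacement scheme are assumptions made here to keep the enumeration small, and the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

population = [0, 2, 4, 5, 10, 15]
n = 3  # sample size, assumed for illustration
mu = Fraction(sum(population), len(population))  # population mean

def mean_sq_dev(sample, center):
    """Average squared distance of the sample points from `center`."""
    return sum((Fraction(x) - center) ** 2 for x in sample) / len(sample)

overshoots = 0               # samples where naive > known-mean estimator
ties = 0                     # samples where the two estimators agree
ties_with_matching_mean = 0  # ties where sample mean == population mean

# Every ordered sample of size n, drawn with replacement
for sample in product(population, repeat=n):
    x_bar = Fraction(sum(sample), n)
    naive = mean_sq_dev(sample, x_bar)    # equation 1
    known_mean = mean_sq_dev(sample, mu)  # equation 2
    if naive > known_mean:
        overshoots += 1
    if naive == known_mean:
        ties += 1
        if x_bar == mu:
            ties_with_matching_mean += 1

print("overshoots:", overshoots)
print("ties:", ties, "of which sample mean == population mean:", ties_with_matching_mean)
```

If the claim holds, the overshoot count is zero and every tie comes from a sample whose mean matches the population mean.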

Now that we understand the naive formula is a biased estimator, we must think about how to find a useful unbiased estimator for population variance. We can’t just use the unbiased estimator presented in this section, because we rarely know the population mean when we’re taking samples. We must find a different sample variance formula if we want to create an unbiased estimator.

Why does n-1 work?

Clearly the above section shows that the naive estimator has a tendency to undershoot the parameter we’re estimating, so creating an unbiased estimator would involve stretching our naive formula. But by how much? Why n-1? Why wouldn’t n-2 or n/3 work?

Well, to really understand it I’d recommend working through the proofs yourself. I’ll attempt to provide some intuition, but if it leaves you feeling unsatisfied, consider that motivation to work through the proof!

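The naive formula can be rewritten in terms of pairs of observations: it is equal to half the average squared difference between two observations pulled (with replacement) from our sample of size n. When we pull two observations this way, there’s a 1/n probability that the same observation is picked twice. This means that, regardless of the population’s distribution, there is a 1/n chance of observing a squared difference of exactly 0.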
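For readers who want to see the step behind this, here is a short sketch of the pairwise argument. It assumes the n observations x_1, …, x_n in the sample are independent draws from a population with variance σ²; the indices i and j and the symbol σ² are notation introduced here.

$$
\begin{aligned}
\sum_{i=1}^{n}\sum_{j=1}^{n} (x_i - x_j)^2
  &= 2n\sum_{i=1}^{n} x_i^2 - 2\Big(\sum_{i=1}^{n} x_i\Big)^2
   = 2n\sum_{i=1}^{n} (x_i - \bar{x})^2 \\
\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \frac{1}{2n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} (x_i - x_j)^2 \\
E\Big[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\Big]
  &= \frac{1}{2n^2}\cdot n(n-1)\cdot 2\sigma^2
   = \Big(1 - \frac{1}{n}\Big)\sigma^2
\end{aligned}
$$

Of the n² ordered pairs (i, j), the n pairs with i = j contribute exactly zero, while each of the remaining n(n-1) pairs has an expected squared difference of 2σ². That missing 1/n share is the entire bias.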

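Intuitively, this 1/n chance of observing 0 for the squared difference means that the naive formula captures, on average, only a (1–1/n) share of the true variance. To correct the formula we need to divide by (1–1/n), or equivalently, multiply by n/(n-1). Now what happens when we multiply our naive formula by this value? We see the infamous n-1 expression on the bottom!

$$ \frac{n}{n-1} \cdot \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$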

Tada!
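As a sanity check on the choice of n-1, the following Python sketch computes the exact expected value of the estimator under a few different denominators. It assumes samples of size 4 drawn with replacement from the example population {0, 2, 4, 5, 10, 15}. The function name expected_estimate and the sample size are hypothetical choices made for this illustration.

```python
from fractions import Fraction
from itertools import product

population = [0, 2, 4, 5, 10, 15]
mu = Fraction(sum(population), len(population))
# True population variance (the parameter we are trying to estimate)
sigma_sq = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)

def expected_estimate(n, denominator):
    """Exact average of sum((x - x_bar)^2) / denominator over every
    ordered sample of size n drawn with replacement."""
    samples = list(product(population, repeat=n))
    total = Fraction(0)
    for sample in samples:
        x_bar = Fraction(sum(sample), n)
        total += sum((Fraction(x) - x_bar) ** 2 for x in sample) / denominator
    return total / len(samples)

n = 4  # sample size, assumed for illustration
print("population variance:", sigma_sq)
for label, denominator in [("n", n), ("n-1", n - 1), ("n-2", n - 2)]:
    print(label, "->", expected_estimate(n, denominator))
```

Under these assumptions, only the n-1 denominator reproduces the population variance exactly. Dividing by n lands below it by a factor of (n-1)/n, and dividing by n-2 overshoots.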

If you found anything in this article to be unclear I highly recommend reading through the Wikipedia article on Bessel’s correction. It’s not the easiest read, but it’s where I pulled pretty much all of the content in this article. If you enjoyed the article, please consider leaving a clap! If this type of content is useful to others I’d like to keep making it.

Follow me on Medium if you want to see more articles (no guarantees it will be about statistics though). Thanks to Brian Chu and Adi Mannari for giving feedback for the first draft of this article.
