Confidence Intervals for the Variance

Irene Markelic
10 min read · Mar 1, 2024


Figure 1: The goal is to compute a and b, the lower and upper bound of the confidence interval we seek.

As always, I try to present the topic as accessibly as possible. However, the following text assumes that you have basic statistical knowledge (population, sample, parameter, statistic, random variable, pdf, …) or that you have already read this post. There will be a mathematical proof at the end which makes extensive use of moment generating functions (MGFs). Therefore, if you want to follow the proof, make sure you have some knowledge about this topic as well. That said, the proof can be skipped, and it will still be possible to understand how to compute confidence intervals for the variance.

So let’s get started. The basic procedure when constructing confidence intervals (CIs) is as follows:

1. Determine the parameter to be estimated.
2. Determine how to compute the statistic (the estimator for the parameter), keeping in mind that the statistic is a r.v. itself and therefore has a distribution.
3. Determine the distribution of the statistic.
4. Determine the significance-level α.
5. Compute the CI based on the distribution and the significance-level.

Before we start with the variance, let’s quickly remind ourselves how we applied the above procedure to the mean. (This is what we did in the above reference.) We went through exactly the steps just described: the first step was determining the parameter to estimate, the population mean µ, to which we only had access through a limited sample. The statistic we computed was the sample mean X̄, which constitutes the second step. I then stated that (under a few assumptions) the standardized sample mean (X̄ − µ)/SEM follows a t-distribution with n − 1 degrees of freedom, where SEM is the standard error of the mean; this is the third step. Subsequently, we chose a confidence level (1 − α) of .95 in the fourth step and then computed the actual interval, which was the final step. The Python code snippet below shows how to compute the confidence interval for the mean of a sample (drawn from a standard normal distribution) at a confidence level of .95.

import scipy.stats as stats
import numpy as np

# generate a random sample
sample = stats.norm.rvs(size=30)

confidence_level = .95
mean = np.mean(sample)
sem = stats.sem(sample)
ci_lower_bound, ci_upper_bound = stats.t.interval(confidence_level,
df=len(sample)-1,
loc=mean,
scale=sem)
print(f"{ci_lower_bound=}, {ci_upper_bound}")
#ci_lower_bound=-0.4654019041427352, 0.43109995387769284

Figure 2 shows the resulting distribution and confidence interval for the code example above.

Figure 2: Plot generated from the above code example. The goal is to compute values for the lower and upper bound of the confidence interval, here denoted “ci lower bound” at x = −0.47 and “ci upper bound” at x = 0.43. We expect the true population mean to lie within this interval with .95 confidence.
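For intuition, the same interval can also be computed by hand from the t quantile instead of stats.t.interval. The following is a minimal sketch of mine (not from the original snippet):

import scipy.stats as stats
import numpy as np

sample = stats.norm.rvs(size=30)
confidence_level = .95
alpha = 1 - confidence_level

mean = np.mean(sample)
sem = stats.sem(sample)  # standard error of the mean

# two-sided critical value of the t-distribution with n-1 degrees of freedom
t_crit = stats.t.ppf(1 - alpha / 2, df=len(sample) - 1)

ci_lower_bound = mean - t_crit * sem
ci_upper_bound = mean + t_crit * sem
print(f"{ci_lower_bound=}, {ci_upper_bound=}")
# gives the same bounds as stats.t.interval when run on the same sample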

Thus far, this summarized how to find confidence intervals, using the mean as an example. Of course, this text is about estimating the population variance, σ². So, following our procedure: the parameter we want to estimate is the population variance. The statistic we compute for that is the sample variance Sₙ². The formula for the sample variance of a sample of size n is:

S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2

The sample variance.
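To make the formula concrete, here is a small sketch of mine (not from the original text) that computes the sample variance directly and compares it with NumPy’s result using ddof=1:

import numpy as np

sample = np.array([2.1, 3.4, 1.8, 2.9, 3.7])
n = len(sample)
sample_mean = np.mean(sample)

# sum of squared deviations from the sample mean, divided by n-1
manual_variance = np.sum((sample - sample_mean) ** 2) / (n - 1)

# np.var with ddof=1 uses the same n-1 denominator
print(manual_variance, np.var(sample, ddof=1))  # both print the same value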

Now we reach step 3, where we need to determine the distribution of the statistic. Once we have this, the rest is straightforward. Since step 3 is a bit more involved, I will proceed as follows: I will skip it for now, but state its result. With this result we can quickly complete steps 4 and 5. After that, I will go back to the sampling distribution and present the necessary proof.

Compute the CI based on the distribution and the significance level (steps 4 and 5)

And here it is — the important insight from step 3, derived in the next subsection:

\frac{(n-1)S_n^2}{\sigma^2} \sim \chi^2_{n-1}

Equation 2: Relation between the sample variance and the Chi-squared distribution

The above expression tells us that a function of the parameter we seek (σ²) has a Chi-squared distribution with n − 1 degrees of freedom. This is genuine progress, for a lot is known about the Chi-squared distribution. What we can do now is similar to what we did above when we computed the CI for the mean: we can take the Chi-squared distribution and compute the boundaries for the desired confidence level, as indicated in figure 1, where a plot of a Chi-squared distribution with 9 degrees of freedom (arbitrarily chosen) is shown. The figure merely serves an illustrative purpose. Note: the distribution is not symmetric. If a and b in the figure denote the lower and upper confidence interval boundaries, we can express this as the following probability statement:

P\left(a \le \frac{(n-1)S_n^2}{\sigma^2} \le b\right) = 1 - \alpha

Let’s assume a concrete value for the confidence level, say 1 − α=.95 (step 4). Then we can compute a and b in Python as follows:

import scipy.stats as stats

degrees_of_freedom = 9
confidence_level = 0.95
ci_lower_bound, ci_upper_bound = stats.chi2(df=degrees_of_freedom).interval(confidence_level)
print(f"{ci_lower_bound=}, {ci_upper_bound=}")
#ci_lower_bound=2.7003894999803584, ci_upper_bound=19.02276779864163

And now that we know a and b, we can rearrange the expression inside the probability statement above to solve for the parameter we want to estimate, σ², and, by taking square roots, for σ:

\frac{(n-1)S_n^2}{b} \le \sigma^2 \le \frac{(n-1)S_n^2}{a}
\quad\Longleftrightarrow\quad
\sqrt{\frac{(n-1)S_n^2}{b}} \le \sigma \le \sqrt{\frac{(n-1)S_n^2}{a}}

And here is a Python example of how to use this:

import scipy.stats as stats
import numpy as np

n = 30
sample = stats.norm.rvs(size=n, random_state=12)
confidence_level = .95

# unbiased sample variance (ddof=1 divides by n-1)
sample_variance = np.var(sample, ddof=1)

# chi-squared quantiles a and b with n-1 degrees of freedom
a, b = stats.chi2(df=n-1).interval(confidence_level)

print(f'{a=}, {b=}')

# confidence interval for the population standard deviation sigma
ci_lower_bound = np.sqrt(((n-1)*sample_variance)/b)
ci_upper_bound = np.sqrt(((n-1)*sample_variance)/a)

print(f'{ci_lower_bound=}, {ci_upper_bound=}')
print(f'{sample_variance=}')
#a≈16.05, b≈45.72 (the chi-squared 0.025 and 0.975 quantiles for 29 degrees of freedom)

So this is it. Now we can compute confidence intervals for the population variance (and, by taking the square root of the bounds, for the standard deviation)! Try to summarize what we did in your own words.
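As a sanity check, here is a small simulation of mine (assuming normally distributed data, as the derivation below requires) that verifies an interval for σ² constructed this way covers the true variance in roughly 95% of repeated samples:

import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)
n, true_variance = 30, 4.0
confidence_level = .95
a, b = stats.chi2(df=n-1).interval(confidence_level)

trials, covered = 10_000, 0
for _ in range(trials):
    sample = rng.normal(loc=0, scale=np.sqrt(true_variance), size=n)
    s2 = np.var(sample, ddof=1)
    # confidence interval for the variance (no square root taken here)
    lower, upper = (n - 1) * s2 / b, (n - 1) * s2 / a
    covered += lower <= true_variance <= upper

print(covered / trials)  # should be close to 0.95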

Determine the distribution of the statistic: the sampling distribution of the sample variance (step 3)

Now back to step 3, which we skipped. In this subsection I’ll show how to derive equation 2. The proof I follow is taken from here, which is a very accessible and clear reference. I’ll try not to merely repeat what is already written there; rather, I’ll present it in a slightly different way. I won’t add anything new, so feel free to skip this part of my article and simply consult the given reference.
Before we start, I’ll list a few things (theorems) — mostly without proof — that you will need to know to follow along:

  1. Functions of independent r.v.s are independent r.v.s too.
  2. E[XY] = E[X]E[Y] if X, Y are independent r.v.s.
  3. The sum of n squared standard normal r.v.s (Z) follows a Χ²ₙ distribution (Chi-squared distribution) with n degrees of freedom, as indicated by the subscript n. A proof is given here; a small simulation after this list illustrates it. In math:

\sum_{i=1}^{n} Z_i^2 \sim \chi^2_n

4. The moment generating function, MGF, of a r.v. X is:

M_X(t) = E\left[e^{tX}\right]

More on MGFs can be found, for example, in Blitzstein and Hwang [1].

5. The moment generating function of a Χ²ₙ distribution is:

(1 - 2t)^{-n/2}, \quad t < \tfrac{1}{2}

6. The sample mean X̄ and the sample variance Sₙ² of a sample are independent. See here for a proof.

7. The sample mean is

\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i

Multiplying both sides with n gives:

n\bar{X} = \sum_{i=1}^{n} X_i
8. Now we show that the sum of the differences Xᵢ − X̄ is zero:

\sum_{i=1}^{n}\left(X_i - \bar{X}\right) = \sum_{i=1}^{n} X_i - n\bar{X}

If we substitute what we know from 7, we get:

\sum_{i=1}^{n} X_i - n\bar{X} = n\bar{X} - n\bar{X} = 0
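As promised under bullet 3, here is a quick simulation of mine (not part of the cited proof) illustrating that a sum of n squared standard normal r.v.s behaves like a Χ²ₙ variable:

import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)
n, draws = 9, 100_000

# many realizations of the sum of n squared standard normals
z = rng.standard_normal(size=(draws, n))
sums_of_squares = np.sum(z ** 2, axis=1)

# mean and variance of a chi-squared(n) variable are n and 2n
print(sums_of_squares.mean(), sums_of_squares.var())  # close to 9 and 18

# a Kolmogorov-Smirnov test against chi2(n) should not reject
print(stats.kstest(sums_of_squares, stats.chi2(df=n).cdf))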

Given n observations X₁, X₂, …, Xₙ that are independent and identically distributed according to Ν(µ, σ²), each Xᵢ can be standardized, i.e. converted to a standard normal r.v., by the usual procedure of subtracting the mean and dividing by the standard deviation. We have just stated under bullet 3 that the sum of n squared standard normal r.v.s follows a Χ²ₙ distribution. Thus:

\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_n

Equation 9

Now, let’s look again at the statement we want to prove:

\frac{(n-1)S_n^2}{\sigma^2} \sim \chi^2_{n-1}

For convenience, I’ll restate the formula for the sample variance:

S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2

Note that the subscript n simply denotes the number of observations in the given sample for which the sample variance is computed. Multiplying the sample variance by (n − 1) gives:

(n-1)S_n^2 = \sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2

Multiplying both sides by 1/σ² gives us almost equation 9, but not quite. Instead of subtracting the mean µ we subtract the sample mean X̄.

\frac{(n-1)S_n^2}{\sigma^2} = \sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{\sigma}\right)^2

Equation 13

How can we make equation 9 look a bit more like equation 13? We use a “trick”: we add X̄ inside equation 9 and immediately subtract it again, which amounts to adding 0:

\sum_{i=1}^{n}\left(\frac{X_i - \bar{X} + \bar{X} - \mu}{\sigma}\right)^2

Now we expand the square using the binomial formula:

\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{\sigma}\right)^2 + \sum_{i=1}^{n}\frac{2\left(X_i - \bar{X}\right)\left(\bar{X} - \mu\right)}{\sigma^2} + \sum_{i=1}^{n}\left(\frac{\bar{X} - \mu}{\sigma}\right)^2

Pulling the constants out of the middle sum gives:

\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{\sigma}\right)^2 + \frac{2\left(\bar{X} - \mu\right)}{\sigma^2}\sum_{i=1}^{n}\left(X_i - \bar{X}\right) + n\left(\frac{\bar{X} - \mu}{\sigma}\right)^2

Now, realize that the middle term reduces to 0 (compare point (8) from the enumeration above). Thus, we are left with:

\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{\sigma}\right)^2 + n\left(\frac{\bar{X} - \mu}{\sigma}\right)^2

So, we have shown that:

\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 = \sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{\sigma}\right)^2 + n\left(\frac{\bar{X} - \mu}{\sigma}\right)^2

And we know from equation 13 that the first term on the right-hand side equals (n − 1)Sₙ²/σ², so:

\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{(n-1)S_n^2}{\sigma^2} + n\left(\frac{\bar{X} - \mu}{\sigma}\right)^2

For convenience, let’s call the left-hand side W:

W = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2

Thus,

W = \frac{(n-1)S_n^2}{\sigma^2} + n\left(\frac{\bar{X} - \mu}{\sigma}\right)^2

For the rest of the proof we make use of what we know about moment generating functions, MGFs. From 5 we know that the MGF of a Χ²ₙ distribution is (1 − 2t)^(−n/2). Thus, the MGF of W, which has a Χ²ₙ distribution, is:

M_W(t) = (1 - 2t)^{-n/2}

(The subscript W reminds us which random variable the MGF belongs to. The t may seem to appear from nowhere; to understand this better, please read about MGFs in general.)
Indeed, we know even more: from 4 we know that the MGF of a r.v. X is the expected value of e to the power of t times X, thus for W we have:

M_W(t) = E\left[e^{tW}\right] = (1 - 2t)^{-n/2}

Equation 20

Since

W = \frac{(n-1)S_n^2}{\sigma^2} + n\left(\frac{\bar{X} - \mu}{\sigma}\right)^2,

we can plug this into the right-hand side of equation 20 and we get:

(1 - 2t)^{-n/2} = E\left[e^{\,t\frac{(n-1)S_n^2}{\sigma^2} + t\,n\left(\frac{\bar{X} - \mu}{\sigma}\right)^2}\right] = E\left[e^{\,t\frac{(n-1)S_n^2}{\sigma^2}} \cdot e^{\,t\,n\left(\frac{\bar{X} - \mu}{\sigma}\right)^2}\right]

Equation 21

Now, realize that

e^{\,t\frac{(n-1)S_n^2}{\sigma^2}}

is a function of S²ₙ. We have stated under bullet (6) that the sample mean X̄ and the sample variance S²ₙ of a sample are independent. And since functions of independent r.v.s are independent r.v.s themselves (1), and furthermore E[XY] = E[X]E[Y] if X, Y are independent r.v.s (2), we can split the expectation of the product into a product of expectations:

(1 - 2t)^{-n/2} = E\left[e^{\,t\frac{(n-1)S_n^2}{\sigma^2}}\right] \cdot E\left[e^{\,t\,n\left(\frac{\bar{X} - \mu}{\sigma}\right)^2}\right]

Equation 22

One last thing to notice is that

n\left(\frac{\bar{X} - \mu}{\sigma}\right)^2 = \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 \sim \chi^2_1

This is because X̄ is a r.v. itself (remember, the sampling distribution of the sample mean) and its distribution is N(µ, σ²/n). Therefore, we can standardize X̄ by subtracting µ and dividing by the standard deviation σ/√n in the following manner:

Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)

Squaring both sides gives:

Z^2 = \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2

And since we know that a squared standard normal r.v. follows a Χ²₁ distribution, we can substitute this into equation 22:

(1 - 2t)^{-n/2} = E\left[e^{\,t\frac{(n-1)S_n^2}{\sigma^2}}\right] \cdot E\left[e^{\,tZ^2}\right], \qquad Z^2 \sim \chi^2_1

Equation 25

Again recall from 4 that (by definition) the moment generating function, MGF, of a r.v. X is:

M_X(t) = E\left[e^{tX}\right]

and that (see 5) the MGF of a Χ²ₙ distribution is

(1 - 2t)^{-n/2}.

Therefore,

E\left[e^{\,tZ^2}\right] = (1 - 2t)^{-1/2}

for t < 1/2, and

E\left[e^{\,tW}\right] = (1 - 2t)^{-n/2}

for t < 1/2. Plugging this into equation 25 gives:

(1 - 2t)^{-n/2} = E\left[e^{\,t\frac{(n-1)S_n^2}{\sigma^2}}\right] \cdot (1 - 2t)^{-1/2}

Equation 26

Remember that we are interested in

E\left[e^{\,t\frac{(n-1)S_n^2}{\sigma^2}}\right],

the MGF of (n − 1)Sₙ²/σ², and solving for it gives (notice that on the right-hand side of equation 26 the exponent is negative, thus dividing the left-hand side by (1 − 2t)^(−1/2) is the same as multiplying by the expression with a positive exponent):

E\left[e^{\,t\frac{(n-1)S_n^2}{\sigma^2}}\right] = (1 - 2t)^{-n/2} \cdot (1 - 2t)^{1/2} = (1 - 2t)^{-\frac{n-1}{2}}

But

(1 - 2t)^{-\frac{n-1}{2}}

is just the MGF of a Χ²ₙ₋₁ distributed random variable. Theorem 6.4.5 from Blitzstein and Hwang [1] states: “The MGF of a random variable determines its distribution: if two r.v.s have the same MGF, they must have the same distribution.” This is also known as the uniqueness property. From this it follows that

\frac{(n-1)S_n^2}{\sigma^2}

must have the same distribution as a Χ²ₙ₋₁ variable:

\frac{(n-1)S_n^2}{\sigma^2} \sim \chi^2_{n-1}

Which was to be shown.
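To complement the proof, here is a short simulation of mine (not from the referenced derivation) that checks empirically that (n − 1)Sₙ²/σ² behaves like a Χ²ₙ₋₁ variable:

import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(2)
n, sigma, draws = 10, 2.0, 100_000

# many samples of size n from a normal distribution with variance sigma^2
samples = rng.normal(loc=0, scale=sigma, size=(draws, n))
s2 = np.var(samples, axis=1, ddof=1)        # sample variance of each sample
statistic = (n - 1) * s2 / sigma ** 2

# mean and variance of a chi-squared(n-1) variable are n-1 and 2(n-1)
print(statistic.mean(), statistic.var())    # close to 9 and 18

# a Kolmogorov-Smirnov test against chi2(n-1) should not reject
print(stats.kstest(statistic, stats.chi2(df=n-1).cdf))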

References
[1] Joseph K. Blitzstein and Jessica Hwang. Introduction to Probability, Second Edition. 2019. URL: https://drive.google.com/file/d/1VmkAAGOYCTORq1wxSQqy255qLJjTNvBI/view
