Confidence, Tolerance, and Prediction Intervals for Statistical Forecasts.

Freedom Preetham
Mathematical Musings
10 min read · Oct 24, 2022

When dealing with statistical forecasting, you hear three specific types of intervals. The confidence interval, tolerance interval, and prediction interval. They all serve different purposes but can have confusing definitions for beginners. Let’s demystify all three by thinking through what purpose each one serves.

AI Art (Prompt: Statistica Intervals)

Preamble

If you are an absolute beginner, you need this preamble (else skip to the next section). Do you hear product managers say that surveying 30 people from the user panel is good enough to collect data for a feature, because those 30 represent the user population?

Ask them about the intuition behind it, and the majority will say they read a blog that said it is enough. A few, though, will be able to provide the statistical reasoning behind that number (provided they know statistics).

A sample size of 30, regardless of the population, is a severely misleading myth. It is not a one-size-fits-all number.

Statistically, that number is good enough for a 95% confidence interval with a ±18% margin of error, which means your statistical parameter can swing 18% up or down (a 36% swing) relative to the true population parameter. This also assumes that the population proportion is evenly distributed with homogeneous attitudes or “look-alikes.”

A statistical parameter is a point estimate such as the mean, median, mode, or standard deviation. You would not want a 36% error margin when you decide on feature sets or other critical decisions, would you? (Maybe you can swing that for a color or font change, but not for vital decisions.)
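
To see where that ±18% comes from, here is a minimal sketch using the textbook margin-of-error formula for a proportion, assuming the most conservative proportion of 0.5:

import math

z = 1.96  # critical value for a 95% confidence level
p = 0.5   # most conservative (worst-case) population proportion
n = 30    # sample size

margin_of_error = z * math.sqrt(p * (1 - p) / n)
print(round(margin_of_error * 100, 1))  # ≈ 17.9, i.e., roughly ±18%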

To get the basics right: when trying to capture the responses of a vast population (1 million, 10 million, a billion), you will not be able to survey all of them.

So you should ask the following question:

What is the minimum number of people that I should survey to capture the sentiments of the overall population?

But this question is statistically incomplete. The critical pieces of data that are missing are:

  • What is the size of the population?
  • To what degree of confidence would you like to capture the sentiments?
  • What is the margin of error you are willing to tolerate?
  • What proportion of your population reflects the dominant attitudes you expect as “near-ideal”?

So let’s rehash the question:

What is the minimum number of people that I should survey to capture the sentiments of the overall population of size ≈100k with 95% confidence and a 5% margin of error, considering that 30% of the population has a college degree or higher?

And the statistical answer to that question is 322! Yup, not 30! (here is a sample size calculator link).

Here 95% is called the Confidence Level.

5% is called the Margin of Error (not to be confused with the standard error).

And collectively, the margin around the statistical parameter for the given confidence level and error is called the confidence interval.
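
As a sanity check on that 322, here is a minimal sketch using Cochran's sample-size formula for a proportion, followed by a finite population correction (the exact answer can shift by one or two depending on rounding conventions):

import math

z = 1.96      # critical value for a 95% confidence level
e = 0.05      # margin of error
p = 0.30      # expected population proportion (30% with a college degree or higher)
N = 100_000   # population size

n0 = (z ** 2) * p * (1 - p) / (e ** 2)  # required sample size for an infinite population
n = n0 / (1 + (n0 - 1) / N)             # finite population correction
print(math.ceil(n))                      # 322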

Why do we need these intervals? Great question.

We need these intervals to establish the integrity and quality of a statistical parameter by estimating the probability of how close it is to the true population parameter. So the intervals provide a sort of “truth seeking.”

Confidence Intervals of a Prediction

Let’s look at the technical definition and code for the confidence interval and see how it applies to a prediction (we shall use a simple linear regression).

A Confidence Interval (CI) is a range of estimates for an unknown statistical parameter (mean, std). CI estimates only the population parameter, and the sampling error determines the width of a confidence interval.

A Confidence Level is the degree of certainty (a probability) with which you want to estimate the statistical parameter of the population (say, the mean of the population).

Assuming you have a sample and do not have the population from which it came, you can ask the following question:

Given the sample mean, what is the 95% confidence interval within which the true population mean falls around the estimated sample mean?

Assuming you had access to the population, you can see that as the sample size approaches the whole population, the sampling error decreases. The width of the CI approaches zero as it converges on the single value of the population mean.
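
As a quick illustration of this shrinking width, take the textbook half-width of a 95% CI for a mean, z·s/√n, with an assumed sample standard deviation of 1:

import math

s = 1.0  # assumed sample standard deviation
for n in [30, 100, 1_000, 100_000]:
    half_width = 1.96 * s / math.sqrt(n)
    print(n, round(half_width, 4))  # the half-width shrinks toward zero as n grows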

Let’s take a look at how the CI estimate plays out visually. Let me generate some dummy data.

Python code: getting the imports out of the way

import numpy as np
from matplotlib import pyplot as plt
from scipy import stats
from scipy.stats import norm

Dummy data generation for the population:

x ~ U(0.1, 1); the independent variable x is drawn from a uniform distribution between 0.1 and 1.

y ~ N(x, 1); the dependent variable y is drawn from a normal distribution centered on x with a standard deviation of 1.

population_size = 1000
x = np.random.uniform(0.1,1, size=population_size)
y = np.random.normal(loc=x, scale=1) #Homoskedastic STD of 1

Plotting this

plt.figure(figsize=(8, 8), dpi=80)
plt.scatter(x,y)
plt.xlabel("X")
plt.ylabel("Y")
Fig 1: Population

Given the above population, let’s dip into it and obtain a sample of 100.

sample_size = 100
A = np.column_stack((x, y))
indices = np.random.choice(A.shape[0], sample_size, replace=False)
sample = A[indices]
sample_x = sample[:,0]
sample_y = sample[:,1]
plt.figure(figsize=(8, 8), dpi=80)
plt.scatter(sample_x, sample_y)
Fig 2: Sample

Let’s run a linear regression to find the best-fit line that explains the conditional mean of this sample and can also be used to establish a “cause and effect” relationship for predictions.

slope, intercept, r, p, std_err = stats.linregress(sample_x, sample_y)
print("slope =", slope, " intercept =", intercept)
print("r =", r, " p =", p)
print("std_err =", std_err)

def line_func(x0):
    return slope * x0 + intercept

drawline = np.vectorize(line_func)
regression_line = drawline(sample_x)

plt.figure(figsize=(8, 8), dpi=80)
plt.scatter(sample_x, sample_y)
plt.plot(sample_x, regression_line, color="r")
Fig 3: Regression to the Mean on Sample

The red line that runs across the sample data is the linear regression line that best estimates the conditional mean of the data points.

Now we have to ask, what is the confidence that ŷ (pronounced as y hat) is close to the true conditional mean of the population?

We can construct a confidence interval with the following equation.

Fig 4 — Confidence Interval for Predictions
  • ŷ is the conditional mean predicted by the regression line.
  • σy is the standard deviation of the sample.
  • Z is the critical value of the z-score for a 95% confidence level. You can get this from the standard tables. We use a z-score (instead of a Student’s t) because our sample size exceeds 30.
Figure 5: Z score
  • The z-score for a 95% confidence level is 1.96.
  • SSxx is the sum of squared differences between each xᵢ and the sample mean of x.
Fig 6 — SSxx
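
In symbols, the interval and SSxx used in the code below work out to:

\hat{y} \pm Z \cdot \sigma_y \cdot \sqrt{\frac{1}{n} + \frac{(x_i - \bar{x})^2}{SS_{xx}}}, \qquad SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2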

The Python code for this:

x_sample_mean = np.mean(sample_x)
SSxx = np.sum([np.square(x_i - x_sample_mean) for x_i in sample_x])
y_sample_mean = np.mean(sample_y)

# 1.96 is the z critical value for a 95% confidence level
def CI_estimator_upperbound(x_i, y_h):
    return y_h + (1.96 * np.std(sample_y) * np.sqrt((1 / sample_x.shape[0]) + np.square(x_i - x_sample_mean) / SSxx))

def CI_estimator_lowerbound(x_i, y_h):
    return y_h - (1.96 * np.std(sample_y) * np.sqrt((1 / sample_x.shape[0]) + np.square(x_i - x_sample_mean) / SSxx))

CI_upper_interval = np.vectorize(CI_estimator_upperbound)
CI_lower_interval = np.vectorize(CI_estimator_lowerbound)
CI_upper_line = CI_upper_interval(sample_x, regression_line)
CI_lower_line = CI_lower_interval(sample_x, regression_line)

plt.figure(figsize=(8, 8), dpi=80)
plt.scatter(sample_x, sample_y)
plt.plot(sample_x, regression_line, color="r")
plt.plot(sample_x, CI_upper_line, color="grey")
plt.plot(sample_x, CI_lower_line, color="grey")
plt.show()
Fig 7 — CI for the Regression

Our question was:

Given the sample mean, what is the 95% confidence interval within which the true population mean falls around the estimated sample mean?

Figure 7 helps answer this question with the grey lines as the upper and lower bounds of the confidence interval within which the “Mean Regression” of the population may fall with 95% probability.

Tolerance Intervals of the Sample

Let’s look at a different question that is unrelated to any prediction. Let’s start with the question again (because all intuitions are hidden in asking the right question):

What proportion of sample data falls within 95% of variability?

Here we want to know the range of values that covers 95% of the sample's spread. This notion is called the Tolerance Interval.

A tolerance interval reflects the spread of values around the average. The sampling error and the dispersion of values in the entire population determine the widths of these ranges.

Unlike a confidence interval, which asks a question about a statistical parameter (like a mean), the tolerance interval just wants to know the proportion of data that spreads around that mean. This is an important distinction to understand.

Another key insight is that as the sample size approaches the whole population, tolerance intervals don’t converge on a zero-width (again, unlike CI). Instead, they converge on the actual width of the population associated with the percentage you specify.
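
A small sketch of this contrast, reusing the population array y generated earlier (exact numbers vary by run): the CI half-width keeps shrinking while the tolerance half-width settles near the population's spread.

# Contrast how the two half-widths behave as the subsample grows
for n in [30, 100, 500, 1000]:
    idx = np.random.choice(y.shape[0], n, replace=False)
    sub_y = y[idx]
    ci_half = 1.96 * np.std(sub_y) / np.sqrt(n)  # CI for the mean: shrinks toward zero
    ti_half = 1.96 * np.std(sub_y)               # 95% tolerance half-width: stabilizes
    print(n, round(ci_half, 3), round(ti_half, 3))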

We can calculate the tolerance interval with a simple equation as follows:

Fig 8 — Tolerance Interval Equation
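
In symbols, the simplified interval used in the code below is:

\bar{y} \pm Z \cdot \sigma_y

where ȳ is the sample mean of y and Z is 1.645 for 90% or 1.96 for 95%.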

It is important to note that there is no ŷ here. The equation works on the observed values y in the sample (through their mean ȳ) and NOT on the predicted value ŷ.

def tolerance_interval_estimator_upperbound(y_i, z):
    # y_i is unused; it is only there so np.vectorize produces one value per observation
    return y_sample_mean + (z * np.std(sample_y))

def tolerance_interval_estimator_lowerbound(y_i, z):
    return y_sample_mean - (z * np.std(sample_y))

TI_upper_interval = np.vectorize(tolerance_interval_estimator_upperbound)
TI_lower_interval = np.vectorize(tolerance_interval_estimator_lowerbound)
TI_upper_line_90 = TI_upper_interval(sample_y, 1.645)
TI_lower_line_90 = TI_lower_interval(sample_y, 1.645)
TI_upper_line_95 = TI_upper_interval(sample_y, 1.96)
TI_lower_line_95 = TI_lower_interval(sample_y, 1.96)

plt.figure(figsize=(8, 8), dpi=80)
plt.scatter(sample_x, sample_y)
plt.plot(sample_x, TI_upper_line_90, color="magenta")
plt.plot(sample_x, TI_lower_line_90, color="magenta")
plt.plot(sample_x, TI_upper_line_95, color="green")
plt.plot(sample_x, TI_lower_line_95, color="green")
plt.show()
Fig 9 — 90% and 95% Tolerance Intervals on Sample

Here I draw the 90% tolerance interval (magenta) and the 95% tolerance interval (green) to show the distinction. As your tolerance drops, the band gets narrower.

Prediction Intervals — Tolerance for the Prediction?

Now we are in the end game. What if we want to know:

What proportion of my predictions will fall within a 95% band around the regression line?

There has been quite a debate on how to interpret the prediction interval. Many people try to ask this question for a point prediction. They look at the prediction interval as an interval within which a specific prediction can fall for a given independent variable.

I'm personally not fond of that line of inference, because it gets confusing to think that, given an independent variable x, your 95% prediction interval for a single point prediction is that wide. Huh? My accuracy scores are better than this, no?

Hence, I urge you not to think of prediction intervals as confidence intervals (they are not). Instead, think of them as tolerance intervals (which is precisely what they are). Many definitions and YouTube explanations have messed this up badly.

Even though the prediction interval is for a point prediction, it is best to build an intuition about this interval as the “tolerance interval” for all future predictions based on current observations.

Again, remember that the confidence interval is only for a statistical parameter such as a mean or a standard deviation. To reiterate, CI wants to know if the regression line we drew is closest to the population mean and, if not, how close it is to the true conditional mean, with 95% probability (95% confidence interval).

A prediction interval estimates a “tolerance interval” in which future observations will fall, with a certain probability, given what has already been observed. So given our inference, this should look similar to a tolerance interval of the sample. Let’s check.

The equation to compute Prediction Interval:

Fig 10 — Prediction Interval for Regression
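
In symbols, this works out to the confidence-interval expression with an extra 1 under the square root, which accounts for the spread of an individual observation around the conditional mean:

\hat{y} \pm Z \cdot \sigma_y \cdot \sqrt{1 + \frac{1}{n} + \frac{(x_i - \bar{x})^2}{SS_{xx}}}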

Also, remember that accuracy of the prediction is a completely different concept that is established with your loss function, such as MSE or Standard Error. Do not confuse the Prediction Interval with the Prediction Accuracy.

def prediction_interval_estimator_upperbound(x_i, y_h):
    return y_h + (1.96 * np.std(sample_y) * np.sqrt(1 + (1 / sample_x.shape[0]) + np.square(x_i - x_sample_mean) / SSxx))

def prediction_interval_estimator_lowerbound(x_i, y_h):
    return y_h - (1.96 * np.std(sample_y) * np.sqrt(1 + (1 / sample_x.shape[0]) + np.square(x_i - x_sample_mean) / SSxx))

PI_upper_interval = np.vectorize(prediction_interval_estimator_upperbound)
PI_lower_interval = np.vectorize(prediction_interval_estimator_lowerbound)
PI_upper_line = PI_upper_interval(sample_x, regression_line)
PI_lower_line = PI_lower_interval(sample_x, regression_line)

plt.figure(figsize=(8, 8), dpi=80)
plt.scatter(sample_x, sample_y)
plt.plot(sample_x, regression_line, color="r")
plt.plot(sample_x, PI_upper_line, color="blue")
plt.plot(sample_x, PI_lower_line, color="blue")
plt.show()
Fig 11 — Prediction Interval for Regression

Boom! Our intuition to treat the prediction interval like a tolerance interval rather than a confidence interval checks out. The blue lines of the prediction interval look similar to the green lines of the tolerance interval of the overall sample, except that the blue lines are aligned (parallel) with the regression line.

Let’s now combine and look at all the Intervals on a single graph.
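
A minimal sketch for this combined view, assuming the regression line and all the interval lines computed in the earlier snippets are still in scope:

plt.figure(figsize=(8, 8), dpi=80)
plt.scatter(sample_x, sample_y)
plt.plot(sample_x, regression_line, color="r")       # regression (conditional mean)
plt.plot(sample_x, CI_upper_line, color="grey")      # 95% confidence interval
plt.plot(sample_x, CI_lower_line, color="grey")
plt.plot(sample_x, TI_upper_line_95, color="green")  # 95% tolerance interval
plt.plot(sample_x, TI_lower_line_95, color="green")
plt.plot(sample_x, PI_upper_line, color="blue")      # 95% prediction interval
plt.plot(sample_x, PI_lower_line, color="blue")
plt.show()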

Fig 12 — CI, TI, and PI

Note that the blue line is similar to the green line. Both are 95% tolerance intervals, technically. The green line is a tolerance interval for the overall sample, while the blue is the prediction interval for the regression.

The red line is the regression that helps build an inference for the prediction interval for future forecasts, while the grey lines form the confidence interval describing how close the conditional mean of the regression on the sample is to the regression we could have drawn on the population (had we had the population).

Note that even if you have the full population, it is time- and compute-intensive to crunch the data points of the overall population. Instead, you take a sample sized for a 99% confidence level with a 5% margin of error (or 1% if you want to be strict) and compute your statistics from there, which is close enough.

In conclusion, the questions we asked are as follows:

  1. Given the sample mean, what is the 95% confidence interval within which the true population mean falls around the estimated sample mean?
  2. What proportion of sample data falls within 95% of variability?
  3. What proportion of my predictions will fall within a 95% band around the regression line?

By now, you should be able to develop an inference to answer each of these. Let me know if you want further clarity in the comments.
