# Confidence intervals: Correct and incorrect interpretations

To better understand your data, and effectively communicate your results to stakeholders, it’s important to know how to correctly interpret a confidence interval.

In this reading, we’ll review the correct way to interpret a confidence interval. We’ll also discuss some common forms of misinterpretation and how to avoid them.

# Example: mean weight

Let’s explore an example to get a better understanding of how to interpret a confidence interval. Imagine you want to estimate the mean weight of a population of 10,000 penguins. Instead of weighing every single penguin, you select a sample of 100 penguins. The mean weight of your sample is 30 pounds. Based on your sample data, you construct a 95% confidence interval between 28 pounds and 32 pounds.

95% CI [28, 32]
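As a rough illustration of where an interval like this can come from, here is a minimal sketch that uses the normal approximation for a sample mean. The sample mean (30 pounds) and sample size (100) come from the example. The sample standard deviation of 10.2 pounds is an assumed value chosen for illustration, and the helper name `mean_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Normal-approximation (z) interval for a population mean."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Mean and sample size are from the example; the standard deviation
# of 10.2 pounds is an assumption, not a value given in the reading.
low, high = mean_confidence_interval(sample_mean=30, sample_sd=10.2, n=100)
print(f"95% CI [{low:.1f}, {high:.1f}]")
```

With a different standard deviation or sample size, the same sample mean would produce a wider or narrower interval.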

# Interpret the confidence interval

Technically, 95% confidence means that if you take repeated random samples from a population, and construct a confidence interval for each sample using the same method, you can expect that 95% of these intervals will capture the population mean. You can also expect that 5% of the total will not capture the population mean.

The confidence level refers to the long-term success rate of the method, or the estimation process based on random sampling.
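One way to see this long-term success rate is to simulate it. The sketch below builds a hypothetical population of 10,000 weights, draws many random samples of 100, and counts how often the resulting interval captures the population mean. The normal shape, the standard deviation of 10 pounds, and the helper name `interval_from_sample` are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(0)

# Hypothetical population: 10,000 weights centered near 31 pounds.
# The normal shape and the spread of 10 pounds are assumptions.
population = [random.gauss(31, 10) for _ in range(10_000)]
true_mean = mean(population)

# Two-sided critical value for 95% confidence (about 1.96).
z = NormalDist().inv_cdf(0.975)


def interval_from_sample(sample):
    """Normal-approximation 95% interval for the mean of one sample."""
    m = mean(sample)
    margin = z * stdev(sample) / math.sqrt(len(sample))
    return m - margin, m + margin


n_repeats = 2_000
captured = 0
for _ in range(n_repeats):
    sample = random.sample(population, 100)  # one random sample of 100
    low, high = interval_from_sample(sample)
    if low <= true_mean <= high:
        captured += 1

coverage = captured / n_repeats
print(f"{coverage:.1%} of intervals captured the population mean")
```

The printed share should land close to 95%, but not exactly on it, because the simulation itself is subject to random variation.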

For the purpose of our example, let’s imagine that the mean weight of all 10,000 penguins is 31 pounds, although you wouldn’t know this unless you actually weighed every penguin. So, you take a sample of the population.

Imagine you take 20 random samples of 100 penguins each from the penguin population, and calculate a 95% confidence interval for each sample. You can expect that approximately 19 of the 20 intervals, or 95% of the total, will contain the actual population mean weight of 31 pounds. One such interval will be the range of values between 28 pounds and 32 pounds.
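The word "approximately" matters here. If each interval independently has a 95% chance of capturing the mean before its sample is drawn, the number of successes out of 20 follows a binomial distribution. The sketch below computes a few of those probabilities. The function name `prob_captures` is hypothetical, and the calculation assumes the method's coverage is exactly 95%.

```python
from math import comb


def prob_captures(k, n=20, p=0.95):
    """Probability that exactly k of n independent intervals capture the mean."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


for k in (20, 19, 18):
    print(f"{k} of 20 intervals capture the mean: {prob_captures(k):.3f}")
```

Under these assumptions, exactly 19 captures happens a little under 40% of the time, and all 20 capturing the mean is almost as likely. The figure of 19 out of 20 is an average over many repetitions, not a guarantee for any one set of 20 samples.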

In practice, data professionals usually select one random sample and generate one confidence interval, which may or may not contain the actual population mean. This is because repeated random sampling is often difficult, expensive, and time-consuming. Confidence intervals give data professionals a way to quantify the uncertainty due to random sampling.

# Incorrect interpretations

Now that you have a better understanding of how to properly interpret a confidence interval, let’s review some common misinterpretations and how to avoid them.

# Misinterpretation 1: 95% refers to the probability that the population mean falls within the constructed interval

One incorrect statement that is often made about a confidence interval at a 95% level of confidence is that there is a 95% probability that the population mean falls within the constructed interval.

In our example, this would mean that there’s a 95% chance that the mean weight of the penguin population falls in the interval between 28 pounds and 32 pounds.

This is incorrect.

Like any population parameter, the population mean is a constant, not a random variable. While the value of the sample mean varies from sample to sample, the value of the population mean does not change. The probability that a constant falls within any given range of values is always 0% or 100%. It either falls within the range of values, or it doesn’t.

For example, any given random sample of 100 penguins may have a different mean weight: 32.8 pounds, 27.3 pounds, 29.6 pounds, and so on. Because the sample mean is a random variable, you can use its sampling distribution to describe how likely it is to fall within a given range of values. However, the population mean weight is a constant. In our example, if you weigh all 10,000 penguins, you’ll find that the population mean is 31 pounds. This value is fixed, and does not vary from sample to sample.

| Sample mean (100 penguins) | Population mean (10,000 penguins) |
| --- | --- |
| 32.8 lbs | 31 lbs |
| 27.3 lbs | 31 lbs |
| 29.6 lbs | 31 lbs |

So, once you have constructed a specific interval, such as 28 pounds to 32 pounds, it’s not strictly correct to say there is a 95% chance that it captures the population mean. Both the interval and the population mean are now fixed values, so the interval either contains the mean or it doesn’t. Intervals change from sample to sample, but the value of the population mean you’re trying to capture does not.

What you can say is that if you take repeated random samples from the population, and construct a confidence interval for each sample using the same method, you can expect 95% of your intervals to capture the population mean.

Pro tip: Remember that a 95% confidence level refers to the success rate of the estimation process.

# Misinterpretation 2: 95% refers to the percentage of data values that fall within the interval

Another common mistake is to interpret a 95% confidence interval as saying that 95% of all of the data values in the population fall within the interval. This is not necessarily true. A 95% confidence interval shows a range of values that likely includes the actual population mean. This is not the same as a range that contains 95% of the data values in the population.

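For example, your 95% confidence interval for the mean penguin weight is between 28 pounds and 32 pounds. It may not be accurate to say that 95% of all weight values fall within this interval. The interval describes the uncertainty in your estimate of the mean, not the spread of individual weights, so it is usually much narrower than the range that covers 95% of the population. It’s possible that well over 5% of the penguin weights in the population are outside this interval: either less than 28 pounds or greater than 32 pounds.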
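To make the difference concrete, the sketch below counts how many individual weights in a hypothetical population fall inside the example interval. The population is simulated with an assumed normal shape and an assumed standard deviation of 10 pounds. Neither value comes from the reading.

```python
import random

random.seed(1)

# Hypothetical population: 10,000 individual weights, mean near 31 pounds.
# The normal shape and the spread of 10 pounds are assumptions.
population = [random.gauss(31, 10) for _ in range(10_000)]

low, high = 28, 32  # the confidence interval from the example
inside = sum(low <= weight <= high for weight in population)
share_inside = inside / len(population)
print(f"{share_inside:.1%} of individual weights fall inside [{low}, {high}]")
```

Under these assumptions, only a small minority of individual weights fall inside the interval, nowhere near 95%. A wider or narrower population spread would change the exact share, but not the underlying point.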

# Misinterpretation 3: 95% refers to the percentage of sample means that fall within the interval

A third common misinterpretation is that a 95% confidence interval implies that 95% of all possible sample means fall within the range of the interval. This is not necessarily true. For example, your 95% confidence interval for mean penguin weight is between 28 pounds and 32 pounds. Imagine you take repeated samples of 100 penguins and calculate the mean weight for each sample. It’s possible that over 5% of your sample means will be less than 28 pounds or greater than 32 pounds. The interval is centered on the mean of one particular sample, not on the population mean, so the share of sample means it contains depends on how close that sample’s mean happens to be to the population mean.
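A short simulation can illustrate this. The sketch below draws repeated samples of 100 from a hypothetical population and checks how many of the sample means land inside the example interval of 28 to 32 pounds. As before, the normal shape and the standard deviation of 10 pounds are assumptions made for illustration.

```python
import random
from statistics import mean

random.seed(2)

# Hypothetical population: 10,000 weights, mean near 31 pounds.
# The normal shape and the spread of 10 pounds are assumptions.
population = [random.gauss(31, 10) for _ in range(10_000)]

low, high = 28, 32  # the single interval from the example

n_repeats = 2_000
sample_means = [mean(random.sample(population, 100)) for _ in range(n_repeats)]

share_inside = sum(low <= m <= high for m in sample_means) / n_repeats
print(f"{share_inside:.1%} of sample means fall inside [{low}, {high}]")
```

Under these assumptions the share should come out noticeably below 95%, because the interval is centered on 30 pounds while the sample means cluster around the population mean of about 31 pounds.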

# Key takeaways

Knowing how to correctly interpret confidence intervals will give you a better understanding of your estimate, and help you share useful and accurate information with stakeholders. You may need to explain the common misinterpretations too, and why they’re incorrect. You don’t want your stakeholders to base their decisions on a misinterpretation. Understanding how to effectively communicate your results to stakeholders is a key part of your job as a data professional.
