What are confidence intervals?

Understanding the maths behind the statistical concept with an example

Mehul Gupta
Data Science in your pocket
3 min read · May 25, 2023


Photo by Justin Morgan on Unsplash

If you have been in the Data Science space, you must have heard the term ‘confidence intervals’, especially when you are estimating some numbers. Before we jump into a practical example, it’s better that we know a couple of statistical concepts:

  • Population: All the members of the group you want to study. For example:

If you wish to calculate the average height of Asian males, the population here is all Asian males.

If you wish to calculate the average maths score of class 8th B students, the population here is all the 8th B students.

Now, as you may have noticed, gathering data for the whole population is easy in some cases (class 8th B might have, say, 500 students at most) but not in all (Asian males number in the millions). Still, we have to run studies on these populations too, even when we can’t gather data for the entire population.

What should we do?

  • Samples: In cases where gathering data for the whole population is difficult, we take a sample from this population (some Asian males, not all) and run our study on that sample.

Once we get our required stat (average height in this case) for the sample, we assume that the average height of the population is roughly the same as that of the sample.

Something fishy? How can we estimate the average height of, say, 100 million folks using a sample of just, say, 10k? Won’t it be an uncertain value? The value might be roughly the same but not exactly equal. Right?
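To see that “roughly the same but not exactly equal” idea in numbers, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the population is synthetic (heights drawn from a normal distribution with a made-up mean of 170 cm and spread of 7 cm), and the population and sample sizes are scaled down so it runs quickly.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Synthetic "population" of 200,000 heights in cm (made-up numbers, not real data).
population = [random.gauss(170, 7) for _ in range(200_000)]

# A much smaller random sample drawn from it, without replacement.
sample = random.sample(population, 2_000)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)
gap = abs(population_mean - sample_mean)

print(f"population mean: {population_mean:.2f} cm")
print(f"sample mean:     {sample_mean:.2f} cm")
print(f"gap:             {gap:.2f} cm")
```

The two means should come out close but not identical, and that small gap is exactly the uncertainty we want to report alongside the estimate.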

Whenever we are doing such an estimation, where the data for the whole population is not available and we are trying to estimate some stat for the population, we are making some sort of assumption.

Alongside the estimated value, we need to attach an uncertainty factor as well: how sure are we about the estimate? Confidence intervals help in adding that uncertainty factor to the estimate we make for the population using a sample. You must have heard statements like:

“According to our study, the average height of adult males in the population is 175 cm, with a 95% confidence interval of 170 cm to 180 cm.”

“The marketing team conducted a survey to estimate customer satisfaction, and the results showed a 90% confidence interval of 75% to 80%.”

“The researchers analyzed the data and found that the effect size of the treatment was 0.25, with a 99% confidence interval ranging from 0.15 to 0.35.”

Defining more formally

Confidence intervals help us get to a range rather than an exact value for some stat X, alongside the confidence we have in that range. Mathematically, we are trying to convey that stat X will lie in the range x to y about z out of 100 times, where z% is the confidence level.

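So if z = 95 and the range is 170–180, then if you take 100 different samples and calculate the stat ‘X’ (say mean height) for each, it will lie in the range 170–180 about 95 times out of 100 (you are accepting that about 5 times, the height estimated using a sample will go outside the given range). The stricter textbook reading is that if you repeated the whole study many times, about 95% of the intervals built this way would contain the true population value.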
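The “95 times out of 100” reading can be checked with a small simulation. This is only a sketch under assumptions: the population is synthetic (normal heights with a made-up mean of 175 cm), the sample size of 100 is arbitrary, and the range is taken from the 2.5th and 97.5th percentiles of 1,000 sample means using the standard library’s `statistics.quantiles`.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

# Synthetic population of heights in cm (made-up numbers, not real data).
population = [random.gauss(175, 8) for _ in range(100_000)]

def sample_mean(size=100):
    """Mean height of one random sample drawn from the population."""
    return statistics.fmean(random.sample(population, size))

# Build the range from 1,000 sample means: with n=40, the first and last
# cut points are the 2.5th and 97.5th percentiles.
cuts = statistics.quantiles([sample_mean() for _ in range(1_000)], n=40)
low, high = cuts[0], cuts[-1]

# Now take 100 fresh samples and count how many of their means land inside.
fresh_means = [sample_mean() for _ in range(100)]
inside = sum(low <= m <= high for m in fresh_means)

print(f"range: {low:.2f} to {high:.2f} cm")
print(f"{inside} of 100 fresh sample means fall inside the range")
```

The count will not be exactly 95 on every run, since it is itself a random quantity, but it should hover around that number.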

Do remember that

When the population data is available, any stat you calculate is exact, hence no confidence intervals are required. So when calculating the mean mathematics score for students of class 8th B, you don’t need confidence intervals: the total population isn’t big, so the mean can be calculated over the population itself rather than over a sample.

The higher the confidence, the broader the range. This is intuitive, right? The broader the range, the more confident you are that the value will lie in the predicted interval. No?
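A quick way to convince yourself of this is to compute the range at several confidence levels from the same set of sample means. The sketch below assumes a synthetic population and uses a small hand-written `percentile` helper (linear interpolation between sorted values); both are illustrative choices, not part of any particular library.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Synthetic population of heights in cm (made-up numbers, not real data).
population = [random.gauss(175, 8) for _ in range(100_000)]

def percentile(sorted_values, p):
    """p-th percentile (0 to 100) of an already sorted list, linear interpolation."""
    position = (len(sorted_values) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

# Mean height of 2,000 random samples of 100 people each, sorted once.
means = sorted(
    statistics.fmean(random.sample(population, 100)) for _ in range(2_000)
)

widths = []
for z in (80, 90, 95, 99):
    x = (100 - z) / 2
    low, high = percentile(means, x), percentile(means, z + x)
    widths.append(high - low)
    print(f"{z}% range: {low:.2f} to {high:.2f} cm (width {high - low:.2f})")
```

Each step up in confidence should print a wider range, because a higher confidence level keeps more of the sample means inside the interval.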

Next, we will quickly go through the maths behind calculating a confidence interval, which is quite easy.

Generate ’N’ random samples, each of size say ‘R’.

Calculate the required statistic (assume it to be the average height of folks of some place X) for all ’N’ random samples. Hence, we would now have ’N’ values for the required statistic, one value per sample.

Calculate x = (100 - z)/2, where z is the confidence level in percent. If it’s 95%, x = (100 - 95)/2 = 2.5.

Calculate the x-th and (z+x)-th percentile values of the ’N’ average heights we have calculated. These would be the 2.5th and 97.5th percentiles for the above problem. These two values are the lower and upper ends of our confidence interval!
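The steps above can be sketched in a few lines of Python. Treat this as one possible reading rather than a definitive implementation: the function name `percentile_interval`, the `percentile` helper, and the synthetic population are all assumptions made for illustration. It also draws every sample straight from the population, which only works in a simulation; in practice you usually have a single sample and resample from it with replacement instead (bootstrapping).

```python
import random
import statistics

def percentile(sorted_values, p):
    """p-th percentile (0 to 100) of an already sorted list, linear interpolation."""
    position = (len(sorted_values) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

def percentile_interval(population, n_samples, sample_size, confidence):
    """Return (lower, upper) for the mean, following the four steps above."""
    # Steps 1 and 2: draw N random samples of size R and compute the mean of each.
    means = sorted(
        statistics.fmean(random.sample(population, sample_size))
        for _ in range(n_samples)
    )
    # Step 3: x = (100 - z) / 2, e.g. 2.5 when z = 95.
    x = (100 - confidence) / 2
    # Step 4: the x-th and (z + x)-th percentiles are the two ends of the interval.
    return percentile(means, x), percentile(means, confidence + x)

random.seed(3)  # fixed seed so the run is repeatable

# Synthetic heights in cm for "place X" (made-up numbers, not real data).
population = [random.gauss(172, 9) for _ in range(100_000)]

lower, upper = percentile_interval(population, n_samples=1_000, sample_size=200, confidence=95)
print(f"95% interval for the mean height: {lower:.2f} to {upper:.2f} cm")
```

Changing `confidence` to 90 or 99 reuses the same function and only moves the two percentiles that get picked.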

That’s it for today, see you soon
