Introduction to Confidence Intervals

Irfan Rahman
Beginner’s Guide for Data Science
5 min read · Jan 26, 2019

In statistics, a population parameter is usually unknowable. In this post we will learn how to estimate the true population mean from a sample. This is part of inferential statistics, where we try to draw conclusions about a population based on a sample.

Before getting to the technique, let’s first understand why a confidence interval is needed. Let’s say the population is all the cats in the world, and we want to know their average weight. As I said, the true population mean is unknowable: there is no way to weigh every cat in the world. In such a situation, a confidence interval lets us estimate the population mean as a range (an interval) of plausible values, at a chosen confidence level.

So, what I am going to do is take some samples from the population to get a better estimate of the average weight of all the cats.

Let’s say for the first sample I took 30 cats, weighed them, and found that the sample average was 3.2 kg. Then I took a second sample of, say, 250 cats, weighed them, and found an average of 3.5 kg. Then I took one more sample of, say, 120 cats, and this time the sample mean was 4.2 kg.

Notice that each sample mean is an estimate of the population mean, and that the estimate changes from sample to sample.

So instead of reporting a single number, we will learn how to use a sample to find a range of values that is likely to capture the true population mean.

Confidence Interval

In a picture of a confidence interval, the middle line marks the true population mean, and as I said, we can’t get its exact value. However, we can estimate a lower limit and an upper limit, and the range between them is called the confidence interval. We compute it with a statistical formula, and those two values, the lower limit and the upper limit, are what we are going to find in this post.

Calculate Confidence Interval for Population mean(μ)

Let’s see the example below to calculate a confidence interval:

From a normally distributed population, we took a simple random sample (SRS) of 500 students, with a mean score of 461 on the math section of the SAT. Suppose the standard deviation of the population is 100. What is the 95% confidence interval for the true population mean?

So, the first step in computing the confidence interval is to organize the given values. In the problem above we are given n = 500, and we know it is the sample size because the problem says these students are a simple random sample.

Now let’s go through all the steps to compute the confidence interval.

Step 1- Organize the data

  • Sample size n = 500
  • Sample mean X bar = 461
  • Confidence level C = 95%
  • Standard deviation σ = 100

Step 2 - Check that the conditions below are satisfied

  • Population should be normally distributed
  • Sample should be randomly selected

Step 3 - Calculate z value based on confidence level

  • Find the z-value: the area in each tail is (1 - C)/2 = (1 - 0.95)/2 = 0.025. Look this up in a z-table and you get the z-value for the 95% confidence level, z = 1.960
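
If you don’t have a z-table at hand, the lookup in Step 3 can be sketched in Python. This is only a minimal illustration, assuming Python 3.8 or newer (for `statistics.NormalDist`); the function name `z_critical` is my own and not part of any library.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided critical z-value for a confidence level given as a fraction (e.g. 0.95)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    tail_area = (1.0 - confidence) / 2.0  # area left in each tail, e.g. 0.025 for 95%
    # inv_cdf returns the z below which (1 - tail_area) of the standard normal lies
    return NormalDist().inv_cdf(1.0 - tail_area)


print(round(z_critical(0.95), 3))  # 1.96
```

The same function works for other confidence levels, for example `z_critical(0.99)`.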

Step 4 - Calculate the confidence interval

Confidence interval for the above problem

The graph is a normal distribution, and we are working at a 95% confidence level. So the central region of the distribution covers 95% of the area, leaving 2.5% in each tail.

Given that C = 95%, n = 500, X bar = 461, σ = 100, z = 1.96

The formula is:

X bar +/- z(σ/sqrt(n))

461 +/- 1.96(100/sqrt(500))

461 +/- 8.765, where 8.765 is called the margin of error

To find the range of values, subtract 8.765 from 461 for the lower limit and add 8.765 to 461 for the upper limit, and you will get (452.23, 469.77).
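
The arithmetic above can be double-checked with a short Python sketch. It assumes the same setup as the example (known population standard deviation, normal population, random sample); the function name `mean_confidence_interval` is hypothetical, chosen just for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """z-interval for a population mean when the population sigma is known.

    Returns (lower limit, upper limit, margin of error).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical z-value
    margin = z * sigma / sqrt(n)                         # margin of error
    return sample_mean - margin, sample_mean + margin, margin


# Values from the SAT example: X bar = 461, sigma = 100, n = 500, C = 95%
lower, upper, margin = mean_confidence_interval(461, 100, 500, 0.95)
print(f"margin of error: {margin:.3f}")
print(f"interval: ({lower:.2f}, {upper:.2f})")
```

The code uses the unrounded z-value rather than 1.96, so tiny differences from a hand calculation in the last decimal place are possible in general.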

So our interval estimate for the true population mean runs from 452.23 to 469.77.

So, to summarize what we actually did: “we are 95% confident that the true population mean score on the math section of the SAT lies between 452.23 points and 469.77 points”.
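
What “95% confident” means is easier to see with a small simulation: if we repeated the sampling many times and built an interval each time, roughly 95% of those intervals should contain the true mean. The sketch below is only an illustration. The true mean of 460 is a made-up value (in real life we never know it), and the function name `coverage` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(true_mean, sigma, n, confidence, trials, seed=0):
    """Fraction of simulated z-intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one random sample from a normal population and average it
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1
    return hits / trials


# Hypothetical population: true mean 460 (assumed), sigma 100, samples of 500
rate = coverage(true_mean=460.0, sigma=100.0, n=500, confidence=0.95, trials=2000)
print(f"share of intervals containing the true mean: {rate:.3f}")
```

The printed share should land close to 0.95, although it will not be exactly 0.95 for a finite number of trials.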

Calculate Confidence Interval for Population Proportion

Calculating a confidence interval for a population proportion is pretty much the same as for a population mean. The major difference is the set of conditions we have to satisfy.

Let’s see an example:

The Harvard School of Public Health survey found that 2,486 of a sample of 10,904 college undergraduates said they had engaged in binge drinking. From this randomly selected sample, estimate the true proportion of college students who engage in binge drinking at the 99% confidence level.

Now we have to follow some steps-

Step 1- Organize the data

  • Sample size n = 10904
  • p-hat (sample proportion) = 2486/10904 ≈ 0.228
  • q-hat (complement of p-hat) = 1 - 0.228 = 0.772
  • Confidence level C = 99%

Step 2- Check that the conditions below are satisfied

  • Sample is randomly selected
  • n(p-hat) > 10: 10904*0.228 ≈ 2,486 > 10, satisfied
  • n(q-hat) > 10: 10904*0.772 ≈ 8,418 > 10, satisfied
  • N > 10n: 10(10904) = 109,040, and the population of all undergraduate students is larger than that, satisfied
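
The numeric checks in Step 2 can be sketched in Python as below. The function name `proportion_conditions` and the dictionary keys are hypothetical, picked for this illustration. The code cannot check random selection or the actual population size, so it only reports the minimum population size the last condition requires.

```python
def proportion_conditions(successes, n, threshold=10):
    """Numeric checks before building a z-interval for a proportion."""
    p_hat = successes / n   # sample proportion
    q_hat = 1 - p_hat       # complement of the sample proportion
    return {
        "p_hat": p_hat,
        "n*p_hat": n * p_hat,
        "n*q_hat": n * q_hat,
        # both expected counts must exceed the threshold (10 in this post)
        "large_sample_ok": n * p_hat > threshold and n * q_hat > threshold,
        # the population should be larger than 10 times the sample size
        "min_population_size": 10 * n,
    }


checks = proportion_conditions(2486, 10904)
for name, value in checks.items():
    print(name, value)
```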

Step 3- Calculate z value based on confidence level

  • Find the z-value: the area in each tail is (1 - C)/2 = (1 - 0.99)/2 = 0.005. Look this up in a z-table and you get the z-value for the 99% confidence level, z = 2.576

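Step 4- Calculate the confidence interval

p-hat +/- z(sqrt((p-hat)*(q-hat)/n))

0.228 +/- 2.576(sqrt((0.228)*(0.772)/10904))

0.228 +/- 2.576(0.0040)

0.228 +/- 0.0103, where 0.0103 is called the margin of error

(0.2176, 0.2383) or (21.76%, 23.83%), carrying the unrounded values through the calculation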
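
Here is the same calculation as a short Python sketch, working from the raw counts so that no intermediate rounding creeps in. The function name `proportion_confidence_interval` is hypothetical, and the code assumes the conditions from Step 2 have already been checked.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.99):
    """z-interval for a population proportion.

    Returns (lower limit, upper limit, margin of error).
    """
    p_hat = successes / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical z-value
    margin = z * sqrt(p_hat * q_hat / n)                 # margin of error
    return p_hat - margin, p_hat + margin, margin


# Values from the survey example: 2486 of 10904 students, C = 99%
lower, upper, margin = proportion_confidence_interval(2486, 10904, 0.99)
print(f"margin of error: {margin:.4f}")
print(f"interval: ({lower:.4f}, {upper:.4f})")
```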

So, to summarize what we actually did: “based on a random sample of about 11k undergraduate students, we are 99% confident that the true proportion of college students who engage in binge drinking is between about 21.8% and 23.8%”.

Summary

In this post we learned what a confidence interval is and why we use it, and we calculated confidence intervals for a true population mean as well as for a population proportion.

If you think my post helped you in any way, do clap, and don’t forget to follow me for future posts. Thanks…
