Chandler: Hi Joey, you look really good today. Any occasion?
Joey: Yes buddy, it’s my birthday today.
Chandler: Great, happy birthday! So, where are we going for the treat?
Joey: Anywhere you wish. I want to do something new this year, buddy. I don’t want to regret next year that I didn’t do something I wished for. I am going to make myself happy.
Chandler: Do you want me to leave the room?
Joey: ☺☺ No no… I want to strengthen my knowledge in statistics and machine learning this year.
Chandler: I get it buddy.
Joey: I was doing some research on hypothesis testing, and in some places people use confidence intervals to do hypothesis testing. Is it another approach to hypothesis testing?
Chandler: Yes, you can use a confidence interval to do hypothesis testing. But the main objective of confidence interval estimation is to find an interval that will contain the population parameter with a certain amount of confidence.
Joey: Can you elaborate?
Chandler: Oh yes. Let me break down the above sentence and make it simpler. We know that μ is the population mean (a population parameter). Usually, we don’t know this value in real life. So we approximate the population mean by collecting a sample and computing the sample mean (x̄). It is also called a point estimate, because a single value is given as the estimate of the population parameter. We also know that if we take another sample, we will get a different value for the sample mean. This raises a question: can we find a better estimate than a point estimate?
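To see why a single point estimate can be unsatisfying, here is a minimal sketch that draws two samples from the same population and prints both sample means. The population (normal, mean 50, standard deviation 10) and the sample size of 30 are hypothetical values chosen only for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: mean 50, standard deviation 10 (assumed values).
MU, SIGMA, N = 50.0, 10.0, 30

def draw_sample_mean():
    """Draw one sample of size N and return its mean (the point estimate)."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    return mean(sample)

first = draw_sample_mean()
second = draw_sample_mean()
print(f"first sample mean:  {first:.2f}")
print(f"second sample mean: {second:.2f}")
```

Both numbers estimate the same μ, yet they differ, and neither one says how far off it might be.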
Joey: I am not sure.
Chandler: Yes, it is possible. It is called the interval estimate. It gives a range (a minimum and a maximum value). The interval can be built at various confidence levels (commonly 90%, 95% and 99%). The confidence level decides the width of the interval: the higher the confidence level, the wider the confidence interval. We can conclude that our confidence interval will contain the population parameter (μ) a specified proportion of the time, and that proportion is quantified by the confidence level.
Joey: Are you saying that 95% of the time (assuming the confidence level is 95%) our confidence interval will contain the population parameter?
Chandler: Not quite, and most people have the same doubt. After the p-value, the interpretation of confidence intervals is the topic people most often get confused about. Let me explain the correct interpretation.
If you sample 100 times (meaning we collect 100 samples, each of size n) and find a confidence interval for each sample, you will have 100 confidence intervals. Assume that the confidence level for all of them is 95%. The correct interpretation of a 95% confidence interval is that, out of the 100 confidence intervals we built, about 95 are expected to contain the population parameter.
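The repeated-sampling interpretation can be checked with a small simulation. This is only a sketch under stated assumptions: the population is normal with a hypothetical mean of 50 and a known standard deviation of 10, each sample has size 30, and each interval is the usual Z interval for a known standard deviation.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 50.0, 10.0  # hypothetical population, known only to the simulation
N = 30                  # size of each sample
CONFIDENCE = 0.95

def coverage(num_samples):
    """Fraction of Z intervals (sigma known) that contain the true mean MU."""
    z = NormalDist().inv_cdf(1 - (1 - CONFIDENCE) / 2)
    margin = z * SIGMA / sqrt(N)
    hits = 0
    for _ in range(num_samples):
        x_bar = mean([random.gauss(MU, SIGMA) for _ in range(N)])
        if x_bar - margin <= MU <= x_bar + margin:
            hits += 1
    return hits / num_samples

print(coverage(100))     # roughly 0.95, but it varies from run to run
print(coverage(10_000))  # settles closer to 0.95 as the number of samples grows
```

With only 100 intervals the count will not be exactly 95 every time. The 95% describes the long-run behaviour of the procedure.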
Joey: Are you trying to say that any one confidence interval may or may not contain the population mean, but about 95% of the intervals built this way will contain the population parameter?
Chandler: Exactly. Now can you relate confidence intervals with hypothesis testing?
Joey: Okay. Let me give a try.
- We build a confidence interval for the population mean from the sample we collected, and compare it with the mean hypothesized under H0.
- If the hypothesized mean is in the interval, we will fail to reject the null hypothesis.
- If the hypothesized mean is not in the interval, we will reject the null hypothesis.
Am I right?
Chandler: Yes Joey. You are absolutely correct.
Joey: You told me that we require a confidence level to calculate a confidence interval. What is the relation between the confidence level and hypothesis testing?
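Joey's three steps can be sketched as a small function. The name `ci_decision` and the example intervals are hypothetical, and the rule as written corresponds to a two-sided test (H0: μ equals the hypothesized mean).

```python
def ci_decision(hypothesized_mean, interval):
    """Two-sided decision rule based on a confidence interval (low, high)."""
    low, high = interval
    if low <= hypothesized_mean <= high:
        return "fail to reject H0"
    return "reject H0"

# Hypothetical intervals, for illustration only.
print(ci_decision(30.0, (27.5, 31.2)))  # fail to reject H0
print(ci_decision(30.0, (31.0, 34.6)))  # reject H0
```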
Chandler: The confidence level is a function of the Type I error rate. It is given by
Confidence level = 1 − α
where α is the significance level.
Let me make it clear by giving the formula for finding the confidence interval. The confidence interval also depends on the type of test (Z-test, t-test). The confidence interval in a Z-test is given by
x̄ ± Z(α/2) · σ/√n
where x̄ is the sample mean, σ is the population standard deviation and n is the sample size. Everything in the above equation is easy to calculate except Z(α/2).
Z(α/2) is the Z value that leaves an area of α/2 in the upper tail of the standard normal distribution. We can look this value up in the standard normal table.
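Instead of a printed table, Z(α/2) can be computed directly. The sketch below assumes the usual Z interval with a known population standard deviation, x̄ ± Z(α/2) · σ/√n, and uses `statistics.NormalDist` from the Python standard library. The helper names and the input numbers (x̄ = 50, σ = 10, n = 25) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Z(alpha/2): the Z value with area alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(x_bar, sigma, n, confidence):
    """Two-sided Z interval for the mean, assuming sigma is known."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample summary, for illustration only.
for confidence in (0.90, 0.95, 0.99):
    low, high = z_interval(x_bar=50.0, sigma=10.0, n=25, confidence=confidence)
    print(f"{confidence:.0%}: Z = {z_critical(confidence):.3f}, "
          f"interval = ({low:.2f}, {high:.2f})")
```

The Z values come out as 1.645, 1.960 and 2.576, so the interval widens as the confidence level rises.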
Joey: Are there any other types of confidence interval estimation?
Chandler: Yes. In fact, the classification below applies to hypothesis testing too.
Let me explain the differences through some examples:
One sample test
- Average engine heat level should be less than 210F according to the manufacturing specification. H0: μ ≤ 210F
- Fluoride content in the paste should be equal to 5 per cent. H0: μ = 5
Two sample test
- Trying two different temperatures in a foundry process to see if the mean number of defects decreases. H0: μ1 = μ2
- You need to decide between two different methods of resolving a complaint. Is method A better than method B? H0: μ1 ≥ μ2
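Joey: So we need to use a different formula to estimate the confidence interval for a two sample test.
Chandler: Exactly. For a two sample Z-test, the confidence interval for the difference between the two means is given by
(x̄1 − x̄2) ± Z(α/2) · √(σ1²/n1 + σ2²/n2)
where x̄1 and x̄2 are the sample means, σ1 and σ2 are the population standard deviations and n1 and n2 are the sample sizes.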
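As a rough sketch, here is one common form of the two sample interval: the Z interval for the difference of two means with known population standard deviations, (x̄1 − x̄2) ± Z(α/2) · √(σ1²/n1 + σ2²/n2). Treat this choice as an assumption, since other forms exist (for example, a t interval when the standard deviations are estimated). The function name and all input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_interval(x_bar1, x_bar2, sigma1, sigma2, n1, n2, confidence):
    """Z interval for mu1 - mu2, assuming both sigmas are known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = x_bar1 - x_bar2
    return diff - margin, diff + margin

# Hypothetical summaries of two samples, for illustration only.
low, high = two_sample_z_interval(
    x_bar1=12.0, x_bar2=10.5, sigma1=3.0, sigma2=2.5, n1=40, n2=50,
    confidence=0.95,
)
print(f"95% interval for mu1 - mu2: ({low:.2f}, {high:.2f})")

# For H0: mu1 = mu2, the hypothesized difference is 0.
print("reject H0" if not (low <= 0 <= high) else "fail to reject H0")
```

The decision rule is the same as in the one sample case, except that the value being checked against the interval is the hypothesized difference, 0.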
Joey: Okay Chandler, can you explain confidence interval estimation for our case?
Chandler: Sure Joey, in our case we deal with just one sample and our hypothesised mean is 30 clicks/day when the colour of the website is changed.
The author of this blog is Balaji P who is pursuing PhD in reinforcement learning at IIT Madras