Confidence Intervals Interview Questions

Tanmay Thaker
Published in Nerd For Tech
Aug 11, 2021

Confidence Interval: In statistics, a confidence interval is a range of values, computed from sample data, that is likely to contain an unknown population parameter. The confidence level attached to it is the long-run proportion of such intervals that would capture the parameter if the sampling were repeated. Confidence intervals therefore quantify the uncertainty in an estimate obtained from a sample. Any confidence level can be chosen, with 95% and 99% being the most common.

Calculating a Confidence Interval (Theory)

Suppose a group of researchers is studying the heights of high school basketball players. The researchers take a random sample from the population and establish a mean height of 74 inches. The mean of 74 inches is a point estimate of the population mean. A point estimate by itself is of limited usefulness because it does not reveal the uncertainty associated with the estimate; you do not have a good sense of how far away this 74-inch sample mean might be from the population mean. What’s missing is the degree of uncertainty in this single sample.

Confidence intervals provide more information than point estimates. By establishing a 95% confidence interval using the sample’s mean and standard deviation, and assuming a normal distribution as represented by the bell curve, the researchers arrive at a lower and upper bound from a procedure that captures the true mean 95% of the time. Assume the interval is between 72 inches and 76 inches. If the researchers took 100 random samples from the population of high school basketball players and built an interval from each one in the same way, about 95 of those intervals would be expected to contain the true population mean.

If the researchers want even greater confidence, they can use a 99% confidence level. For the same sample, this always produces a wider interval, because a wider range is needed to capture the true mean more often. If the 99% confidence interval runs from 70 inches to 78 inches, then about 99 of 100 intervals built this way would be expected to contain the population mean. A 90% confidence level means that we would expect 90% of the interval estimates to include the population parameter. Likewise, a 99% confidence level means that 99% of the intervals would include the parameter.
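The "share of intervals that contain the parameter" reading can be checked with a small simulation. The sketch below is only an illustration: the population mean of 74 inches comes from the example, while the population standard deviation of 3, the sample size of 50, and the function name `coverage_rate` are assumptions made up for the demo.

```python
import random
import statistics

def coverage_rate(true_mean=74.0, true_sd=3.0, n=50, trials=1000, z=1.96, seed=0):
    """Return the fraction of 95% z-intervals that contain the true mean.

    true_sd, n, trials and seed are illustrative assumptions, not values
    taken from the article.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one random sample from the assumed population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.mean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        # Count the interval as a hit if it covers the true mean.
        if mean - z * se <= true_mean <= mean + z * se:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The printed share should land close to 0.95. Each individual interval either contains the true mean or it does not; the 95% describes the procedure over many repetitions.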

The Confidence Interval is based on Mean and Standard Deviation and is given as:

For n>30

Confidence interval = X ± (z * s/√n)

where z is the critical value taken from the z-score table for the chosen confidence level,

X is the sample mean,

s is the sample standard deviation, and

n is the sample size.

The z critical values come from the z-score table. Because the confidence level is almost always one of a few standard choices, the corresponding two-sided values can be used directly: 1.645 for 90%, 1.96 for 95%, and 2.576 for 99%.
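As a rough sketch of the large-sample formula, the snippet below computes X ± (z * s/√n) with Python's standard library, using `statistics.NormalDist` in place of a printed z-table. The function name `z_confidence_interval` and the inputs (s = 6, n = 36) are assumptions chosen so the result lands near the 72 to 76 inch interval from the example; the article does not state s or n.

```python
from statistics import NormalDist

def z_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Two-sided z-based confidence interval for a mean (large samples)."""
    # Two-sided critical value: leave (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sample_sd / n ** 0.5
    return sample_mean - margin, sample_mean + margin

# Assumed inputs: mean 74 in (from the example), s = 6 and n = 36 (hypothetical).
low, high = z_confidence_interval(74.0, 6.0, 36)
print(f"95% CI: ({low:.2f}, {high:.2f})")

low99, high99 = z_confidence_interval(74.0, 6.0, 36, confidence=0.99)
print(f"99% CI: ({low99:.2f}, {high99:.2f})")
```

With the same sample, the 99% interval comes out wider than the 95% one, which matches the trade-off described earlier.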

For n<30

Confidence interval = X ± (t * s/√n)

where t is the critical value taken from the t-score table for the chosen confidence level and n - 1 degrees of freedom,

X is the sample mean,

s is the sample standard deviation, and

n is the sample size.
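The small-sample formula can be sketched the same way. Python's standard library has no t-distribution, so this illustration hard-codes a short excerpt of standard two-sided 95% t critical values, indexed by degrees of freedom (n - 1). The names `T_CRITICAL_95` and `t_confidence_interval_95`, and the inputs s = 5 and n = 25, are assumptions for the demo.

```python
# Excerpt of two-sided 95% t critical values, keyed by degrees of freedom.
T_CRITICAL_95 = {
    4: 2.776,
    9: 2.262,
    14: 2.145,
    19: 2.093,
    24: 2.064,
    29: 2.045,
}

def t_confidence_interval_95(sample_mean, sample_sd, n):
    """Two-sided 95% t-based confidence interval for a mean (small samples)."""
    df = n - 1
    if df not in T_CRITICAL_95:
        # A full t-table or a statistics library would be needed here.
        raise ValueError(f"no tabulated t value for {df} degrees of freedom")
    t = T_CRITICAL_95[df]
    margin = t * sample_sd / n ** 0.5
    return sample_mean - margin, sample_mean + margin

# Assumed inputs: mean 74 in (from the example), s = 5 and n = 25 (hypothetical).
low, high = t_confidence_interval_95(74.0, 5.0, 25)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The t critical value (2.064 at 24 degrees of freedom) is larger than the z value of 1.96, so the interval is slightly wider than a z-based interval on the same numbers, reflecting the extra uncertainty of a small sample.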

We will see how to create confidence intervals in the examples to follow.

Now that we have covered the theory behind hypothesis testing, let’s look at the different types of tests in use. We have already seen examples of finding the z-score and t-score; next we will see how they are used in a testing scenario.

Hypothesis Testing for Large Size Samples

Rule of thumb: A sample of size greater than 30 is considered a large sample. By the central limit theorem, we can then treat the sampling distribution of the sample mean as approximately normal.
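A quick simulation can make the rule of thumb more concrete. The sketch below draws samples from a deliberately skewed population (an exponential distribution with mean 1, chosen here purely as an assumption for the demo) and checks that the sample means still behave roughly like a normal distribution with spread σ/√n. The function name `simulate_sample_means` is hypothetical.

```python
import random
import statistics

def simulate_sample_means(n=40, trials=2000, seed=1):
    """Means of `trials` samples of size n from an exponential(1) population."""
    rng = random.Random(seed)
    return [
        statistics.mean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

means = simulate_sample_means()
center = statistics.mean(means)
spread = statistics.stdev(means)

# For an exponential(1) population, mean = 1 and sd = 1, so SE = 1 / sqrt(n).
expected_spread = 1.0 / 40 ** 0.5
within = sum(abs(m - 1.0) <= 1.96 * expected_spread for m in means) / len(means)

print(f"Mean of sample means: {center:.3f}")
print(f"Spread of sample means: {spread:.3f} (theory: {expected_spread:.3f})")
print(f"Share within 1.96 standard errors: {within:.3f}")
```

Even though individual observations are strongly skewed, roughly 95% of the sample means should fall within 1.96 standard errors of the population mean, which is what a normal approximation predicts.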

We are familiar with the steps of hypothesis testing as shown earlier. We also know, from the above table, when to use which type of test.

Let’s start with a few practical examples to build our understanding. We will use the standardized critical value table below for the calculations.

Q) A manufacturer of printer cartridges claims that a certain cartridge it makes has a mean printing capacity of at least 500 pages. A wholesale purchaser selects a sample of 100 cartridges and tests them. The mean printing capacity of the sample comes out to 490 pages, with a standard deviation of 30 pages.

Should the purchaser reject the claim of the manufacturer at a significance level of 5%?

Ans. population mean = 500

Sample mean = 490

Sample standard deviation = 30

Significance level(alpha) = 5% = 0.05

Sample size = 100

H0: Mean printing capacity >=500

H1: Mean printing capacity < 500

We can clearly see it is a one-tailed test (left tail).

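Here, the sample is large (n = 100) and the population variance is unknown. We do not know whether the data are normally distributed, but with a sample this large the central limit theorem lets us use the Z-test (from the table above).

We will use the sample standard deviation to calculate the standard error and, from it, the test statistic.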

Standard error (SE) = Sample standard deviation / (sample size)^0.5

= 30 / (100)^0.5 = 30 / 10 = 3

Z(test) = (Sample mean - population mean) / SE

= (490 - 500) / 3 = -3.33

Let’s find out the critical value at the 5% significance level using the above Critical value table.

Z(0.05) = -1.645 (since it is a left-tailed test).

We can clearly see that Z(test) < Z(0.05), that is, -3.33 < -1.645, which means our test value lies in the rejection region.

Thus, we can reject the null hypothesis i.e. the manufacturer’s claim at a 5% significance level.
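The calculation above can be reproduced in a few lines. This is a minimal sketch using Python's `statistics.NormalDist` instead of a printed critical value table; the variable names are illustrative, and the numbers are the ones given in the question.

```python
from statistics import NormalDist

claimed_mean = 500   # manufacturer's claim (H0: mean >= 500)
sample_mean = 490
sample_sd = 30
n = 100
alpha = 0.05

# Standard error of the mean: s / sqrt(n).
se = sample_sd / n ** 0.5

# Test statistic: how many standard errors the sample mean is below the claim.
z_test = (sample_mean - claimed_mean) / se

# Left-tailed critical value: the point with alpha probability below it.
z_crit = NormalDist().inv_cdf(alpha)

reject_null = z_test < z_crit

print(f"SE = {se:.2f}, Z(test) = {z_test:.2f}, Z critical = {z_crit:.3f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

Because the test is left-tailed, the null hypothesis is rejected only when the test statistic falls below the critical value, which is the case here.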

Using p-value to test the above hypothesis:

p-value = P[Z <= -3.33] (we know P(Z <= -x) = 1 - P(Z <= x); remember that P(Z <= x) is the cumulative probability from negative infinity up to x)

Let’s use the z-table to find the p-value:

p-value = 1 - 0.9996 = 0.0004

Here, the p-value is less than the significance level of 5%. So, we are right to reject the null hypothesis.
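The same p-value can be obtained without a z-table. The sketch below uses the cumulative distribution function from Python's `statistics.NormalDist`; the variable names are illustrative. It works with the unrounded test statistic, so the result differs slightly from the table lookup at -3.33 but rounds to the same 0.0004.

```python
from statistics import NormalDist

# Test statistic from the cartridge example: (490 - 500) / (30 / sqrt(100)).
z_test = (490 - 500) / (30 / 100 ** 0.5)

# Left-tailed p-value: probability of a Z value at or below the test statistic.
p_value = NormalDist().cdf(z_test)

alpha = 0.05
print(f"p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Comparing the p-value with alpha leads to the same decision as comparing the test statistic with the critical value; the two approaches are equivalent.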
