Hypothesis Testing | All a beginner needs to know
In statistics, a hypothesis is an assumption about an event or a population parameter, made on the basis of reasoning. Hypothesis testing is a statistical method used to make decisions using experimental data: basically, we assume a result about some parameter of a problem statement and then test that assumption against the data.
Didn’t get it? OK…
Suppose you run a pharmaceutical company, and you have launched a drug that has been in the market for quite a long time. Now you want to know what percentage of the Indian population uses this drug when they have a related disease, so that you can forecast the drug's future production. The first thing that comes to mind is to survey the whole population of India and get the details from them.
This is quite an impractical thing to do. The job will be tedious and difficult. So what to do?
So instead, you take small groups of people from across India, chosen while accounting for many factors. These small groups of people are called samples. Based on existing data and sales, you assume that 20% of the Indian population uses the drug, and now you need to test that assumption using those samples (a sample generally covers around 10% of the population of interest, and the sample size should be more than 30). This assumption about the drug, based on reasoning, is called a hypothesis, and the method used to test it is called hypothesis testing.
Below I have drawn a flow chart to demonstrate how hypothesis testing is done.
Types of Hypothesis:
There are various types of hypotheses, such as simple, complex, logical, statistical, etc. In general, there are two types.
- Null Hypothesis(Ho): It states that there is no relationship between two variables/samples, or that there is a lack of information to state a scientific hypothesis. It is assumed to be true unless proven otherwise.
- Alternative Hypothesis(H1): Contrary to the null hypothesis, the alternative hypothesis shows that observations result from a real effect. It is an attempt to disprove a null hypothesis when we get enough evidence to reject it. It is also referred to as the research hypothesis.
Now come back to the above example. Suppose you launch a new drug similar to the existing one, and you assume, based on the earlier observations, that it will also be used by 20% of Indians. This means there won't be any change in the sales of the new and old drugs. This is the null hypothesis(Ho). If the result comes out above or below 20%, the null hypothesis will be rejected and the alternative hypothesis(H1) accepted, since the new drug is either more or less effective than the old one. These two hypotheses are mutually exclusive: only one can be true at a time.
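The drug example above can be sketched as a one-sample z-test for a proportion. The survey numbers below (1,000 respondents, 230 users) are purely hypothetical, chosen to illustrate the mechanics:

```python
import math

# Hypothetical survey: 230 out of 1,000 respondents use the new drug.
# Ho claims the true usage proportion is 20%.
p0 = 0.20          # proportion claimed under the null hypothesis
n = 1000           # survey sample size (hypothetical)
p_hat = 230 / n    # observed sample proportion

# One-sample z-test for a proportion:
# z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se
print(f"z = {z:.2f}")  # |z| > 1.96 would reject Ho at the 5% level
```

With these made-up numbers, z comes out around 2.37, which would lead us to reject Ho at the 5% level; with a smaller discrepancy or a smaller sample, we would fail to reject it.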
But how do we decide between the null and alternative hypothesis? For that, we need to perform a hypothesis test using a test statistic.
Test Statistics:
A test statistic is a statistic used in hypothesis testing. It helps in deciding whether to support or reject the null hypothesis. There are various types of test statistics, such as the z-test, the t-test, and others.
Here we are going to cover only the z-test and t-test.
z-test:
The z-test is a type of inferential statistic used to determine whether a population mean and a sample mean are different or similar when the population's variance is known. The z-test should be used when:
- The sample size is greater than 30, because according to the Central Limit Theorem, as the sample size increases, the distribution of sample means approaches a normal distribution.
- The data is normally distributed.
- Data points are independent of each other.
- Samples are independent of each other.
- Each data point has an equally likely chance of being selected into a sample.
Note: If the variance of the population is unknown, the sample variance is assumed to equal the population variance.
To calculate the z-score, we need the sample mean (x̄), the population mean (µ), the population standard deviation (σ), and the sample size n (generally > 30): z = (x̄ − µ) / (σ / √n).
Note: The z-score tells you how far, in standard deviations, a data point is from the mean of a data set.
Types of z-test:
There are two types of z-tests: a) one-sample z-test b) two-sample z-test
a) one-sample z test- This is used to test whether the mean of a population is greater than, less than, or not equal to a specific value. For example, testing the hypothesis that the new drug will be used by less or more than 20% of Indians would be decided by a one-sample z-test.
b) two-sample z test- This is used to check whether the means of two samples drawn from two independent populations are equal or different. For example, say the mean drug usage for sample one is µ1 and for the other sample is µ2. Then we can formulate hypotheses such as Ho: µ1 = µ2 versus H1: µ1 ≠ µ2.
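Both variants reduce to plugging means, standard deviations, and sample sizes into a formula. Here is a minimal sketch of each (the example numbers are made up for illustration), using Python's standard library for the normal distribution:

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(x_bar, mu, sigma, n):
    """z-statistic for a sample mean against a population mean,
    with known population standard deviation sigma."""
    return (x_bar - mu) / (sigma / sqrt(n))

def two_sample_z(x1, x2, s1, s2, n1, n2):
    """z-statistic comparing the means of two independent samples."""
    return (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)

# Hypothetical one-sample example: sample mean 5.1 vs population
# mean 5.0, with sigma = 0.5 and n = 100.
z = one_sample_z(5.1, 5.0, 0.5, 100)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p_two_sided, 4))  # → 2.0 0.0455
```

Since the two-sided p-value (about 0.046) falls just under 0.05, this hypothetical sample would lead us to reject Ho at the 5% level.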
t-test:
Like the z-test, the t-test is used to determine the difference between a sample and a population, but it is most useful for determining the statistical difference between two independent sample groups when the variance is unknown. The t-test should be used when:
- The variance of the population is unknown.
- The sample size is less than 30.
- The distribution of sample means follows a normal distribution.
- The variance of each sample is homogeneous, or the standard deviations of samples are approximately equal.
The formula to calculate the t-value is t = (x̄ − µ) / (s / √n), where s is the sample standard deviation.
Note: The t-test works well when the sample size is less than 30.
Types of t-test:
There are three types of t-tests: a) one-sample t-test b) paired sample t-test c) independent samples t-test
Note: The intuition behind the one-sample t-test and the paired sample t-test is the same as for the corresponding z-tests. The independent samples t-test compares the means of two completely different groups.
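A one-sample t-test can be computed by hand from a small sample. The data below is invented for illustration; the critical value 2.262 is the standard two-tailed t-table value for α = 0.05 with 9 degrees of freedom:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical small sample (n < 30), population sigma unknown.
sample = [4.9, 5.1, 5.6, 4.8, 5.3, 5.0, 5.2, 4.7, 5.4, 5.5]
mu0 = 5.0                      # hypothesized population mean
n = len(sample)

# t = (sample mean - mu0) / (sample std dev / sqrt(n))
t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

# Critical value for alpha = 0.05 (two-tailed), df = n - 1 = 9,
# is about 2.262 (from a t-table).
print(f"t = {t:.3f}; reject Ho: {abs(t) > 2.262}")
```

Here t comes out to about 1.57, which is below 2.262, so this hypothetical sample would not let us reject Ho at the 5% level.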
Significance level(α):
We formulated the null and alternative hypotheses, and we obtained the z-score/t-score using the respective test statistic. But how do we determine whether our scores are statistically significant enough to reject or retain a hypothesis? We need some evidence to accept or reject the hypothesis. Here the significance level(α) comes into the picture.
The significance level(α) is a measure of how strong the sample evidence must be before we declare the results statistically significant. Its complement is called the confidence level(C), where C = 1 − α.
Generally, a significance level is given as 5%(0.05) or 1%(0.01), making the confidence level 95% or 99%. A significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.
The above picture clearly shows that for a 1% level of significance, if the test statistic lies within the 99% confidence interval, we retain the null hypothesis(Ho); otherwise we select the alternative hypothesis(H1).
Let’s look into another example where the level of significance(α) is 5%. The region inside which the null hypothesis is retained is called the Acceptance Region. The region where it is rejected is called the Critical Region / Rejection Region. The value of the test statistic that separates the two regions is called the Critical value.
Note: For a normally distributed sample, the critical values (z-scores) for the 5% and 1% levels of significance (2.5% and 0.5% on each side of the curve) are ±1.96 and ±2.58, respectively.
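These critical values are just quantiles of the standard normal distribution, and can be recovered directly with Python's standard library:

```python
from statistics import NormalDist

# Two-tailed test: alpha/2 of the probability mass sits in each tail,
# so the critical value is the (1 - alpha/2) quantile.
for alpha in (0.05, 0.01):
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    print(f"alpha = {alpha}: critical value = ±{z_crit:.2f}")
# → alpha = 0.05: critical value = ±1.96
# → alpha = 0.01: critical value = ±2.58
```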
Problem1:
A random sample of 50 items gives a mean of 6.2 units with a variance (σ²) of 10.24 units, i.e. σ = 3.2. Can it be regarded as drawn from a normally distributed population with a mean of 5.4 units, at a significance level(α) of 5%?
Step 1: Form a hypothesis. Ho: µ = 5.4 units; H1: µ ≠ 5.4 units.
Step 2: Select an appropriate test statistic and calculate the score.
As n = 50 (> 30), we select the z-test to calculate the z-score: z = (x̄ − µ) / (σ/√n) = (6.2 − 5.4) / (3.2/√50) ≈ 1.77, taking σ = √10.24 = 3.2.
Step 3: Obtain the critical value.
For a 5% level of significance, the critical value/z-score will be ±1.96.
Step 4: Compare values and conclude.
As we can see, 1.77 < 1.96, which means our z-score lies within the acceptance region. We therefore fail to reject Ho.
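The steps of Problem 1 can be verified in a few lines of Python (treating the 10.24 figure as the population variance, so σ = 3.2, which is what makes the reported z-score of 1.77 work out):

```python
from math import sqrt

x_bar, mu, n = 6.2, 5.4, 50
sigma = sqrt(10.24)            # population variance 10.24 -> sigma = 3.2
z = (x_bar - mu) / (sigma / sqrt(n))
print(f"z = {z:.2f}")          # → z = 1.77; 1.77 < 1.96, fail to reject Ho
```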
Above, we considered both sides of the distribution. What if we consider only one side? What would the test look like then? Let’s understand...
One-tailed test and two-tailed test:
These are the two types of hypothesis tests based on the alternative hypothesis(H1).
One-tailed test:
This test is also called directional because we can test the effect in only one direction. When we perform a one-tailed test, the entire significance level goes into one tail of the distribution.
Here the entire 5% region is either on the left side or the right side.
Two-tailed test:
This test is also called non-directional because we can test the effect in both directions. When we perform a two-tailed test, half of the significance level goes to each side of the distribution.
Here 2.5% of the region is on the left side, and 2.5% on the right side.
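The difference shows up directly when converting a z-score into a p-value. Taking the z ≈ 1.77 from Problem 1 as an example, the one-tailed p-value is a single tail area, while the two-tailed p-value doubles it:

```python
from statistics import NormalDist

z = 1.77                               # z-score from Problem 1
right_tail = 1 - NormalDist().cdf(z)   # one-tailed (right) p-value
two_tailed = 2 * right_tail            # two-tailed p-value
print(round(right_tail, 4), round(two_tailed, 4))
```

Note how the same z-score can be significant one-tailed but not two-tailed: here the one-tailed p-value (about 0.038) is below 0.05, while the two-tailed p-value (about 0.077) is not.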
p-value:
Let’s consider Problem 1. There we calculated the z-score and compared it with the critical value. We found that the z-score is less than the critical value, so we retained the null hypothesis. But how likely is a result like ours if the null hypothesis is true? We need a probability to quantify this, and that is the p-value.
The p-value is the probability that, if the null hypothesis were true, sampling variation would produce an estimate at least as far from the hypothesized value as the one we observed.
or…
The p-value tells us how likely it is to get a result like this if the null hypothesis is true.
Q. But how do we know if a p-value is statistically significant?
Ans: The level of statistical significance is often expressed as a p-value between 0 and 1. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis. A p-value less than 0.05 is typically considered statistically significant. It indicates strong evidence against the null hypothesis, as there is less than a 5% probability of observing such a result if the null hypothesis were correct. However, this does not mean there is a 95% probability that the alternative hypothesis is true.
A p-value higher than 0.05 (> 0.05) is not statistically significant, and means we fail to reject the null hypothesis. Note that we never "accept" the null hypothesis; we can only reject it or fail to reject it.
Problem2:
Suppose a Pharmaceutical company manufactures an anti-allergic antibiotic. They need to perform some quality assurance to ensure that they have the correct dosage, which is supposed to be 500mg. In a random sample of 125 antibiotics, there is an average dose of 499.3mg with a standard deviation of 6mg. What is the likelihood that the antibiotics will contain 500mg of dosage?
Ans: From the above problem, we can state that this is a two-tailed test: a dosage below 500mg will be less effective, while a dosage above 500mg may create side effects. So we need to make sure that the dosage is equal to 500mg.
Ho: µ = 500mg, H1: µ ≠ 500mg. The sample size is large (n = 125 > 30), so we select the z-test: z = (499.3 − 500) / (6/√125) ≈ −1.304.
As this is a two-tailed test, we consider z = ±1.304 on both sides. Using the z-table, the one-sided probability for the corresponding z-value is 0.0968 = 9.68%. Counting both tails, the total p-value is 0.1936, or 19.36%.
The p-value of 0.1936 is well above 0.05, so we fail to reject the null hypothesis: the data are consistent with a true dosage of 500mg.
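Problem 2 can be reproduced in code. Note that computing the tail area exactly gives a p-value of about 0.192 rather than 0.1936; the small difference comes from rounding z to 1.30 when reading the z-table:

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu, s, n = 499.3, 500, 6, 125
z = (x_bar - mu) / (s / sqrt(n))       # ≈ -1.304
p = 2 * NormalDist().cdf(-abs(z))      # two-tailed p-value, ≈ 0.192
print(f"z = {z:.3f}, p = {p:.4f}")
```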
Types of Error:
Statistical hypothesis testing implies that no test is ever 100% certain, because we rely on probabilities when experimenting. Since the observations are chosen randomly, there is a chance we make a mistake when deciding whether to reject the null hypothesis(Ho). Even though hypothesis tests are meant to be reliable, two types of errors can occur.
These errors are known as type 1 and type 2 errors.
Type I errors:
Type I errors happen when the null hypothesis is true but is rejected anyway. This is also called a false positive. It happens when we overestimate the effect by chance; it is simply bad luck. This type of error doesn’t indicate that the researchers did anything wrong. The experimental design, data collection, data validity, and statistical analysis can all be correct, yet this error still occurs.
Even though we don’t know which studies have false-positive results, we do know their rate of occurrence: the Type I error rate equals the hypothesis test's significance level, also known as alpha (α). That means a test at a 95% confidence level carries a 5% chance of a Type I error.
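The claim that the Type I error rate equals α can be checked with a small simulation: repeatedly draw samples from a population where Ho is actually true, run a z-test each time, and count how often we (wrongly) reject. The simulation parameters below are arbitrary choices for illustration:

```python
import random
from math import sqrt

random.seed(42)
mu, sigma, n = 0.0, 1.0, 50         # population where Ho is TRUE
alpha, trials = 0.05, 4000
z_crit = 1.96                        # two-tailed critical value at alpha = 0.05

rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    z = (sum(sample) / n - mu) / (sigma / sqrt(n))
    if abs(z) > z_crit:
        rejections += 1              # a false positive (Type I error)

rate = rejections / trials
print(f"Type I error rate ≈ {rate:.3f}")  # should be close to alpha = 0.05
```

Every rejection here is a false positive by construction, since the samples really do come from a population with mean mu; the observed rate hovers around 0.05, matching α.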
Type II errors:
Type II errors happen when the null hypothesis is false, but you fail to reject it. When you perform a hypothesis test and your p-value is greater than your significance level, your results are not statistically significant. That’s disappointing, because your sample provides insufficient evidence to conclude that the effect you’re studying exists in the population.
However, there is a chance that the effect is present in the population even though the test results don’t support it. If that’s the case, you’ve just experienced a Type II error. The probability of making a Type II error is known as beta (β).
How to prioritize the two errors depends entirely on the problem we are solving. Suppose a pool is heavily contaminated with chlorine, and people are complaining of burning skin and eyes. You close the pool and treat the water to remove the excess chlorine. After that, you take a water sample and run a test to check whether the chlorine level is safe for the public. Four outcomes are possible: the pool is still contaminated and you keep it closed; the pool is now fine for public use and you open it; or the test goes wrong in one of two ways. Suppose the contamination test comes back negative (Condition 1) even though contamination is still present, and based on the test you open the pool. Or the contamination test comes back positive (Condition 2) even though the contamination is gone, and based on the test the pool stays closed.
Our null hypothesis was that there is no contamination. Condition 1 (failing to detect contamination that is present) is therefore a Type II error, and Condition 2 (flagging contamination that is gone) is a Type I error. In this case, a Type I error merely keeps a clean pool closed, but a Type II error can lead to serious consequences.
That’s all, folks. Please do visit my other blogs. I have written many blogs on statistical analysis in machine learning.