Inferential Statistics — Hypothesis Testing
You might have come across innumerable claims and statements involving numbers, especially in marketing campaigns and ads. “9 out of 10 doctors recommend Colgate toothpaste”, or “Dettol kills 99.9% of bacteria” are classic examples of numerical claims. The statistical validity of such statements regarding a parameter can be tested if you collect some sample data and perform certain calculations on it. This is the foundation of inferential statistics using hypothesis testing.
Hypothesis testing is a statistical procedure that uses sample data as evidence to evaluate a hypothesis or assumption about a population parameter. You start with two hypotheses, the null and the alternate, which take opposing positions on the parameter.
For example, if your car manufacturer claims that the car gives a mileage of 25 km per liter, first collect the mileage of multiple cars of the same model. This will form your sample data. Then, set the null and alternate hypotheses. Perform the test calculations and based on the results, evaluate the validity of the claim.
Setting Up the Hypotheses
Null Hypothesis: the default position or as-is state, in which the claim about a parameter is accepted as the truth
Alternate Hypothesis: the opposite of the null hypothesis, in which the claim is contradicted in the direction suggested by the observed data
Hypothesis testing is performed in order to either reject or not reject the null hypothesis based on calculations done on the observed (collected) data. If the test rejects the null, we accept the alternate position. On the other hand, if the test fails to reject the null, we continue with the default position as the status quo until we collect further evidence.
Let’s consider the following problem statement.
An incandescent light bulb manufacturer claims that the mean lifetime of a bulb is 2,000 hours or more. In a sample of 30 bulbs, the bulbs lasted only 1,900 hours on average. The sample standard deviation is 150 hours. At the 0.05 significance level, can we reject the manufacturer's claim?
Now, let us formulate the null and alternate hypotheses for this scenario.
H0: Mean lifetime of the bulb ≥ 2,000 hours (the claim)
Ha: Mean lifetime of the bulb < 2,000 hours (as suggested by the sample data)
The next step is to perform the statistical test to assess whether to reject or not reject the null (the as-is state) based on the data. Using the test calculations, we evaluate whether the evidence from the data is strong enough to reject the null state in favor of the alternate position. If the evidence is not strong enough, we do not reject the claim. Note that this does not prove the claim is true.
The t-score is a test statistic for t-tests that measures the difference between an observed sample statistic and its hypothesized population parameter in units of standard error. A t-test compares the observed t-value to a critical value on the t-distribution with (n-1) degrees of freedom to determine whether the difference between the estimated and hypothesized values of the population parameter is statistically significant.
In this case, we will perform a t-test and calculate the associated p-value to make a decision. The t-test statistic is calculated as (x-μ)/(s/√n), where x is the sample mean, μ is the hypothesized or the claimed mean, s is the sample standard deviation and n is the sample size.
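As a quick check of the arithmetic, the formula above can be evaluated with the numbers from the light bulb problem. This is a minimal sketch using only the Python standard library. The function name t_statistic and the variable names are chosen here for illustration and are not from any particular package.

```python
import math

# Values taken from the light bulb problem statement.
sample_mean = 1900.0    # x: average lifetime observed in the sample (hours)
claimed_mean = 2000.0   # μ: mean lifetime claimed by the manufacturer (hours)
sample_sd = 150.0       # s: sample standard deviation (hours)
n = 30                  # sample size


def t_statistic(x_bar, mu, s, size):
    """Return (x_bar - mu) / (s / sqrt(size))."""
    standard_error = s / math.sqrt(size)
    return (x_bar - mu) / standard_error


t_stat = t_statistic(sample_mean, claimed_mean, sample_sd, n)
print(f"standard error = {sample_sd / math.sqrt(n):.2f} hours")  # 27.39
print(f"t statistic    = {t_stat:.2f}")                          # -3.65
```

The sign is negative because the sample mean falls below the claimed mean. The magnitude says the gap is about 3.65 standard errors wide.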
In the above example, the t-test statistic comes out to -3.65. Because the alternate hypothesis is one-sided (mean lifetime below 2,000 hours), we use a left-tailed test. At the 0.05 significance level (corresponding to a 95% confidence level) and 29 degrees of freedom, the threshold value from the t-distribution is -1.699. This can be obtained from a t-distribution table or with a quick formula in Excel: T.INV(significance value [usually 0.05 or 0.1], sample size - 1).
Making the decision
The final step is to compare the test statistic against the cut-off value. Because the alternate hypothesis here is one-sided (mean lifetime below 2,000 hours), this is a left-tailed test: we reject the null hypothesis if the test statistic is less than the cut-off value, and we do not reject it otherwise. For a two-tailed test, the comparison would instead use absolute values, with the cut-off taken at half the significance level in each tail. The purpose of the cut-off value is to determine how large a deviation from the claimed value (null hypothesis) is sufficient to reasonably reject the null state. Typical industry standards are significance levels of 0.05 and 0.10 (confidence levels of 95% and 90%), which correspond to a 5% and 10% probability of rejecting a null hypothesis that is actually true. In a distribution centered on the hypothesized value, if the observed value from the data is sufficiently far from the claimed value, then such data would be unlikely if the null state were true. This forms the basis for rejecting the null. If the observed value is not too far from the claimed value, then we do not reject the claim.
In our example, the test statistic of -3.65 is less than the cut-off value of -1.699 (in absolute terms, 3.65 is greater than 1.699). Hence, we reject the manufacturer's claim that the average lifetime of a bulb is 2,000 hours or more.
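The same decision can be reproduced in code. The sketch below assumes SciPy is installed and uses scipy.stats.t.ppf for the left-tail cut-off and scipy.stats.t.cdf for the one-sided p-value. The variable names are illustrative, and the p-value is an addition that the worked example above does not compute by hand.

```python
import math

from scipy import stats  # assumption: SciPy is available

# Numbers from the light bulb problem statement.
sample_mean, claimed_mean = 1900.0, 2000.0
sample_sd, n = 150.0, 30
alpha = 0.05          # significance level
df = n - 1            # degrees of freedom

t_stat = (sample_mean - claimed_mean) / (sample_sd / math.sqrt(n))

# Left-tail cut-off: the t value with probability alpha to its left.
t_crit = stats.t.ppf(alpha, df)

# One-sided p-value: probability of a t statistic at least this low
# when the true mean sits exactly at the claimed 2,000 hours.
p_value = stats.t.cdf(t_stat, df)

# Left-tailed decision rule: reject H0 when t falls below the cut-off.
reject_null = t_stat < t_crit

print(f"t = {t_stat:.2f}, cut-off = {t_crit:.3f}")
print(f"p-value below alpha: {p_value < alpha}")
print(f"reject H0: {reject_null}")
```

Comparing the statistic with the cut-off and comparing the p-value with alpha are two views of the same rule, so they always lead to the same decision.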
[Figure: a quick guide to choosing the type of test to perform based on the information available at your disposal.]
Errors in Hypothesis Testing
There are two types of errors associated with hypothesis testing — Type 1 and Type 2 errors.
A type 1 error occurs if we reject the null hypothesis when in reality it is true. In the previous example, if the average lifetime of the bulb was indeed 2,000 hours or more as claimed by the manufacturer, but the sample we collected happened to contain more defective pieces than usual, then by rejecting the claim we would have committed a type 1 error. α is the probability of committing a type 1 error, and it equals the significance level we choose for the test. If we choose a 0.05 significance level, there is a 5% probability of rejecting the null hypothesis when it is actually true.
A type 2 error occurs when we fail to reject the null hypothesis when in reality it is false. β is the probability of committing a type 2 error. Its complement, 1 - β, is called the power of the hypothesis test: the probability that the test correctly rejects the null state when it is false.
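Both error rates can be estimated by simulation. The sketch below uses only the standard library and relies on assumptions that are not part of the original problem: bulb lifetimes are drawn from a normal distribution, the population standard deviation is 150 hours, and the "true" mean under the alternative is a hypothetical 1,950 hours. The cut-off of -1.699 is the left-tail value for 29 degrees of freedom at the 0.05 level.

```python
import random
import statistics

CLAIMED_MEAN = 2000.0   # boundary of the null hypothesis (hours)
POP_SD = 150.0          # assumed population standard deviation (hours)
N = 30                  # bulbs per simulated sample
T_CRIT = -1.699         # left-tail cut-off at 0.05 with 29 degrees of freedom


def rejection_rate(true_mean, trials=5000, seed=0):
    """Fraction of simulated samples in which the t-test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, POP_SD) for _ in range(N)]
        standard_error = statistics.stdev(sample) / N ** 0.5
        t_stat = (statistics.mean(sample) - CLAIMED_MEAN) / standard_error
        if t_stat < T_CRIT:
            rejections += 1
    return rejections / trials


# Null is true (mean exactly 2,000): every rejection is a type 1 error.
type1_rate = rejection_rate(true_mean=2000.0)

# Null is false (hypothetical mean of 1,950): every rejection is correct,
# and every non-rejection is a type 2 error.
power = rejection_rate(true_mean=1950.0)
type2_rate = 1.0 - power

print(f"estimated alpha (type 1 rate): {type1_rate:.3f}")
print(f"estimated power (1 - beta):    {power:.3f}")
print(f"estimated beta (type 2 rate):  {type2_rate:.3f}")
```

The estimated type 1 rate should land close to the chosen 0.05. The power depends on how far the true mean is from the claim, so the value printed here is specific to the assumed 1,950 hours.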
The acceptable values and relative importance of the two errors are determined by the nature, complexity and context of the problem you are solving. In medical testing, where the null state is usually that the patient is healthy, it is more important to avoid a type 2 error, in which an actual patient does not get diagnosed with the ailment. Some false positives are usually tolerable, but false negatives pose a serious threat in this scenario. On the other hand, in matters of adjudication, it is necessary to attain a low type 1 error, in which an innocent person gets convicted. Although a low type 2 error (a guilty person being judged innocent) also matters, type 1 error is given more importance because of the judicial principle of ‘innocent until proven guilty’.
Thus, depending on the context, the objective of the hypothesis test and the importance of the associated errors vary. Hypothesis testing is an integral part of inferential statistics, which underpins many techniques used in machine learning.