Type I and Type II errors in Testing of Hypothesis
In testing of hypothesis, there is a conjecture under which the probability distribution function of a test statistic, say T, is known. This conjecture is called the null hypothesis and is denoted by H0. The test statistic T is computed from the observations O = { O1, O2, O3, … On } taken from the universe U = { U1, U2, U3, U4, … UN … }. Here the sample size is n, and the population size is N or infinity, depending on whether the population is finite or infinite. For a null hypothesis H0, the test statistic T represents a summary of all the observations taken together; it is generally a scalar function of the observations. We often take a simple null hypothesis. Simple null hypotheses are point hypotheses: such a null hypothesis makes the test statistic T follow exactly one probability distribution function, with a single value of the parameter. This single parameter value is the conjecture to be tested. Example H0: μ = μ0, which means the null hypothesis is "the population mean is μ0."
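As an illustrative sketch (all numbers hypothetical), the following Python code computes such a scalar test statistic T: the standardised sample mean for H0: μ = μ0 with the population standard deviation σ assumed known.

```python
import math
import random

# Illustrative sketch: H0 says the population mean is mu0 = 50, with a
# known population standard deviation sigma = 10 (all values hypothetical).
mu0, sigma, n = 50.0, 10.0, 25

random.seed(1)
# Draw a sample of n observations (here simulated as if H0 were true).
sample = [random.gauss(mu0, sigma) for _ in range(n)]

# The test statistic T is a scalar summary of all observations together:
# the standardised sample mean, which follows N(0, 1) under H0.
xbar = sum(sample) / n
T = (xbar - mu0) / (sigma / math.sqrt(n))
print(f"sample mean = {xbar:.2f}, T = {T:.2f}")
```

The single number T carries all the information the test uses, which is what "a scalar function of all the observations" means in practice.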
Testing
To test the null hypothesis H0, we search for an alternative hypothesis, say Ha. An alternative hypothesis, Ha, is a competing hypothesis with respect to null hypothesis such that rejecting null hypothesis will imply that the alternative hypothesis is supported.
The researcher is interested either in the null hypothesis or in the alternative hypothesis, depending on the need of the experiment. Sometimes he is more interested in the alternative hypothesis, for example that the first of two medicines is more effective than the second. At other times he is more interested in the null hypothesis, for example that the mean height of adult males in a city is 5 feet, 3 inches. Whichever hypothesis he is interested in, he will seek a competing hypothesis, so that he can test his conjecture against it in terms of probability.
Hypothesis testing only tells whether a researcher can reject the null hypothesis at a given level of significance. Once he rejects the null hypothesis, the alternative hypothesis is supported; failing to reject it does not prove the null hypothesis true.
Composite Hypothesis
The null hypothesis can also be composite. In a composite hypothesis, the conjecture specifies more than one probability distribution function for the test statistic T; the parameter values form a set, as in the example H0: μ ≥ μ0, which means the null hypothesis is "the population mean is greater than or equal to μ0." In this case there are infinitely many values of the population parameter μ. To test such a hypothesis, we need not only the competing alternative hypothesis H1: μ < μ0, but also a least favourable hypothesis amongst the null-hypothesis parameters. This generally lies at the border between the composite null hypothesis and the alternative hypothesis, so that rejecting it implies that all other parameter values of the null hypothesis are also rejected. Naturally, in this case the least favourable null hypothesis H'0 would be the simple hypothesis H'0: μ = μ0.
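Under the same hypothetical set-up as before (σ known), a one-sided test of this composite null can be sketched by testing the least favourable simple null H'0: μ = μ0 at the boundary; the critical value −1.645 corresponds to α = 5%.

```python
import math

# Hypothetical one-sided test of the composite null H0: mu >= mu0
# against H1: mu < mu0, via the least favourable simple null
# H0': mu = mu0 at the boundary (sigma assumed known).
mu0, sigma, n = 50.0, 10.0, 25
xbar = 46.0          # observed sample mean (illustrative value)

T = (xbar - mu0) / (sigma / math.sqrt(n))

# Under H0', T ~ N(0, 1); reject when T falls below the lower
# alpha-quantile. For alpha = 0.05 that critical value is about -1.645.
z_alpha = -1.645
reject = T < z_alpha
print(f"T = {T:.2f}, reject H0: {reject}")
# Rejecting at mu = mu0 implies rejection for every mu > mu0 as well,
# since larger true means make such small values of T even less likely.
```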
Type I Error
A Type I error means rejecting the null hypothesis when it is actually true. It means concluding that results are statistically significant when, in reality, they are not: they arose purely by chance or because of unrelated factors.
Type II Error
A Type II error means failing to reject the null hypothesis when it is actually false. It means concluding that results are statistically insignificant when, in reality, a genuine effect was masked by underlying factors related to the design of the experiment or the relative positions of the factors influencing the observations.
False positive and False negative
In a Type I error, we incorrectly reject the null hypothesis; in other words, our statistical test falsely provides positive evidence in favour of the alternative hypothesis. Therefore, a Type I error corresponds to a "false positive" test result. To illustrate the concepts of false positive and false negative, we take the following example.
Let us take the case of diagnosing COVID-19: the null hypothesis H0 is that the person is fit and normal, and the alternative hypothesis H1 is that s/he is suffering from the disease. A false positive means that the diagnostic test finds the person suffering from the disease when actually s/he is not; the diagnostic test is falsely positive for the disease. The statistical test thus commits a Type I error, as the null hypothesis is incorrectly rejected.
On the other hand, a Type II error occurs when we incorrectly retain the null hypothesis. In other words, our statistical test falsely provides negative evidence for the alternative hypothesis. Therefore, a Type II error corresponds to a “false negative” test result.
In the above example of diagnosing COVID-19, suppose the test incorrectly provides evidence in favour of the null hypothesis that the person is fit and normal. This is a false negative: the diagnostic test shows negative evidence for the disease, but that finding is false. In such a situation the person is diagnosed as not suffering from the disease when actually he is suffering from it. The statistical test thus commits a Type II error, as the null hypothesis is incorrectly retained.
One can see that, in this kind of diagnostic example, the researcher is more concerned with the alternative hypothesis than with the null hypothesis. The diagnostic test provides values of indicators, i.e. evidence about the status of the experimental unit (here, the person). This evidence constitutes the observations on which the test statistic is built. While "Type I" and "Type II" refer to the hypotheses H0 and H1 in question, the terms "false positive" and "false negative" refer to this evidence, i.e. to the indicators (the test statistic).
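The four possible outcomes in this example can be laid out as a small table; the labels below are illustrative wording matching the COVID-19 example, not standard terminology.

```python
# A compact map of the four outcomes in the diagnostic-test example:
# keys pair the true state of the person with the test's verdict.
outcomes = {
    ("healthy (H0 true)",   "test positive"): "false positive -> Type I error",
    ("healthy (H0 true)",   "test negative"): "true negative  -> correct retention",
    ("diseased (H0 false)", "test positive"): "true positive  -> correct rejection",
    ("diseased (H0 false)", "test negative"): "false negative -> Type II error",
}
for (truth, verdict), label in outcomes.items():
    print(f"{truth:22s} {verdict:14s} {label}")
```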
Errors of Commission and Errors of Omission
Type I errors are errors of commission, while Type II errors are errors of omission. This oft-repeated statement has its validity because we always choose a null hypothesis in a particular way: the null hypothesis is chosen to be simple, single-parametric, having normal characteristic values, often asserting equality of parameter values, and having a known distribution function.
To comprehend the concept better, let us take the example of convicting a person of a crime. The null hypothesis H0 is that the person is normal and not guilty, and the alternative statement or conjecture is that s/he is a criminal.
Rejecting the null hypothesis is rejecting a normal set-up in favour of an abnormal one; "abnormal", meaning he is a criminal, is the alternative hypothesis. A Type I error is rejecting the normal set-up when it actually is a normal set-up. That means the observations, here the investigation, were perhaps deliberately directed more towards proving the abnormal scenario. This can be visualised as an error of commission: a Type I error is an error of commission.
If the investigators are more concerned with proving the alternative hypothesis that s/he is a criminal than with relying on the null hypothesis that s/he is not guilty, the propensity to commit Type I errors, i.e. to observe false positives, will be greater. This is because the intentions of the investigation are infected with errors of commission.
On the other hand, accepting the null hypothesis is accepting the normal set-up as against the abnormal set-up. A Type II error is accepting the normal set-up when it actually is not a normal set-up. That means that, in the investigation, recording observations correctly has perhaps been overlooked, either by not following correct procedures or by omitting potential observations that could have helped prove the abnormal scenario of his/her being a criminal. This can be thought of as an error of omission. As a result, H0 is incorrectly retained. Therefore, a Type II error is an error of omission.
If the investigation (the taking of observations) is more concerned with proving the null hypothesis than the alternative hypothesis, the propensity to commit Type II errors, i.e. to observe false negatives, will be greater. This is because the process of investigation is infected with errors of omission.
Reasons for Type I and Type II errors
In the above discussion of errors of commission and omission, the reasons for Type I and Type II errors are subtly described in terms of the investigation, that is, the faults associated with the process of taking observations on the units sampled from the universe. These factors can be summarised as:
(a) Intensity of the factor's effect: If the conjecture is about a factor whose effects are pronounced, it will be detected easily. The chance of committing a Type I error in that case will be very low; if it happens, it will be only by chance.
(b) Measurement error: Systematic and random errors in recording data will increase Type II errors, because there will be omissions from proper measurement.
(c) Biased investigation: If observations are recorded with a biased approach, without objectivity in the experiment or investigation, skewed data will flood in to prove something. There will be errors of commission and, therefore, Type I errors will increase.
(d) Sample size: Smaller samples need the exact probability distribution function of the test statistic for testing hypotheses, whereas the distribution function of the test statistic for larger samples can be approximated by well-known distribution functions. Moreover, a larger sample size reduces sampling error. Thus, increasing the sample size reduces both Type I and Type II errors.
(e) Significance level: The significance level is the a priori probability of committing a Type I error that the researcher thinks he can tolerate. It is denoted by alpha (α). If α is increased, the probability of a Type II error (β) decreases, and vice versa.
Rates of Type I and Type II errors, significance level, confidence level, power of test
In any problem of testing hypotheses, we cannot completely do away with both Type I and Type II errors. The probabilities, or rates, of committing Type I and Type II errors influence each other. In this section we define some terms that are frequently referred to.
The probability of committing a Type I error is called alpha (α). A statistician usually fixes it at either 1% or 5% to test the hypothesis. Fixing alpha at a point, say 5%, is described by saying that the level of significance, or significance level, is 5%.
Since alpha (α) is the probability of rejecting the null hypothesis when it is true, one minus alpha (1 − α) is the probability of accepting the null hypothesis when it is true. This (1 − α) is known as the confidence level of the test. So if the level of significance is 5%, the confidence level is 95%; if the level of significance is 1%, the confidence level is 99%.
In some literature, the confidence level itself is called the significance level, and instead of saying "at a 95% confidence level" it is described as "at a 95% significance level." The term confidence level is avoided in such places because the word "confidence" connotes some kind of certainty. However, the concept of confidence intervals still exists throughout the statistical literature.
The probability of committing a Type II error is denoted by beta (β). So β is the rate at which the null hypothesis is incorrectly retained; that is, β is the probability of accepting H0 when H0 is false.
(1 − β), therefore, is the probability of rejecting H0 when H0 is false, i.e. of correctly rejecting H0. This (1 − β), the probability of correctly rejecting the null hypothesis, is known as the power of the test.
Thus the probability of correctly accepting the null hypothesis is the confidence level, and the probability of correctly rejecting it is the power of the test.
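These four quantities can be computed for a concrete, hypothetical one-sided z-test; the helper `phi` below is the standard normal CDF, and all the test parameters are illustrative assumptions.

```python
import math

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical one-sided test of H0: mu = 50 against H1: mu = 53,
# sigma = 10 known, n = 25; reject when T > 1.645 (alpha = 5%).
mu0, mu1, sigma, n = 50.0, 53.0, 10.0, 25
crit = 1.645

alpha = 1.0 - phi(crit)                      # P(reject H0 | H0 true)
confidence = 1.0 - alpha                     # P(retain H0 | H0 true)
# Under H1 the statistic T is centred at (mu1 - mu0) / (sigma / sqrt(n)).
shift = (mu1 - mu0) / (sigma / math.sqrt(n))
beta = phi(crit - shift)                     # P(retain H0 | H1 true)
power = 1.0 - beta                           # P(reject H0 | H1 true)
print(f"alpha = {alpha:.3f}, confidence = {confidence:.3f}, "
      f"beta = {beta:.3f}, power = {power:.3f}")
```

Note that α + confidence level = 1 and β + power = 1, exactly as defined above.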
We can notice that
The significance level α (the Type I error rate) affects the statistical power (1 − β), which is inversely related to the Type II error rate (β).
Taking a smaller significance level α reduces the Type I error risk. However, if the significance level α decreases, the power of the test (1 − β) also decreases, thereby increasing the Type II error rate (β). This means there is an inverse relation between Type I and Type II errors.
Generally, a researcher sets the value of alpha (α) so that the probability of a Type I error will not exceed that value. But s/he cannot keep decreasing it to a very low value, because the more s/he lowers it, the more the Type II error risk increases. Thus there is a need to balance both errors. In practice, the significance level α is taken as either 5% or 1%.
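A small Monte Carlo sketch (illustrative values throughout) makes this inverse relation visible: lowering α lowers the observed Type I rate but raises the observed Type II rate.

```python
import math
import random

# Monte Carlo sketch of the alpha-beta trade-off for a one-sided z-test
# of H0: mu = 0 against H1: mu = 0.5, sigma = 1, n = 25 (all illustrative).
random.seed(0)
n, trials = 25, 2000
crit_for_alpha = {0.10: 1.282, 0.05: 1.645, 0.01: 2.326}

def rejection_rate(true_mu, crit):
    # Fraction of simulated experiments in which H0 is rejected.
    rejections = 0
    for _ in range(trials):
        xbar = sum(random.gauss(true_mu, 1.0) for _ in range(n)) / n
        if xbar * math.sqrt(n) > crit:   # T = xbar / (sigma / sqrt(n))
            rejections += 1
    return rejections / trials

rates = {}
for alpha, crit in crit_for_alpha.items():
    type1 = rejection_rate(0.0, crit)        # H0 true: rejection is a Type I error
    type2 = 1 - rejection_rate(0.5, crit)    # H1 true: retention is a Type II error
    rates[alpha] = (type1, type2)
    print(f"alpha = {alpha:.2f}: Type I ~ {type1:.3f}, Type II ~ {type2:.3f}")
```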
Statistical prescriptions to control risk of type I and type II errors
Under the reasons for Type I and Type II errors mentioned above, we have seen how the rates of these errors are influenced by various factors. Some are statistical factors and some are non-statistical. Non-statistical factors are those that can be controlled by administrative actions, framing protocols, ensuring objectivity, proper supervision and better training; they control (b) errors in measurement and (c) biased investigation. The causes that can be influenced by the statistical framework are (d) sample size and (e) significance level.
To reduce the risk of a Type II error, and hence increase the power of the test (1 − β), one can take recourse to increasing the sample size and/or the significance level. However, if the significance level is increased, the risk of a Type I error also increases.
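The effect of sample size on power can be sketched analytically for a hypothetical one-sided z-test with α held fixed at 5% (so the Type I risk does not change while the Type II risk shrinks).

```python
import math

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# How power (1 - beta) grows with sample size for a one-sided z-test of
# H0: mu = 0 against H1: mu = 0.3, sigma = 1, alpha fixed at 5%
# (all values illustrative).
crit, effect = 1.645, 0.3
powers = []
for n in (25, 50, 100, 200):
    shift = effect * math.sqrt(n)     # centre of T under H1
    power = 1.0 - phi(crit - shift)
    powers.append(power)
    print(f"n = {n:4d}: power = {power:.3f}")
```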
Needless to say, reason (a), the intensity of the factor's effect, is not in the hands of the statistician or of the administration that influences data collection: the intensity of the factors is built into the experiment. Only inventors can devise a new test by replacing less sensitive factors with others whose effects are more pronounced, but then it will be another test altogether. However, one thing that can be done to control Type I error even for less sensitive factors is to replicate the experiment a few times and take a composite decision based on the outcomes. We often perform this kind of replication in our own decision making, when we say "let's take a second opinion."
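Assuming the replicates are independent (a strong assumption in practice), a majority-of-three composite decision drives the overall Type I rate well below the single-test α, as this simulation sketch suggests.

```python
import random

# Sketch of replication as a guard against Type I errors: run the same
# test k times and reject H0 only if a majority of replicates reject.
# Each replicate alone commits a Type I error with probability alpha,
# and replicates are assumed independent (illustrative assumption).
random.seed(2)
alpha, k, trials = 0.05, 3, 20000

false_positives = 0
for _ in range(trials):
    rejects = sum(random.random() < alpha for _ in range(k))
    if rejects >= 2:          # majority of 3 replicates reject
        false_positives += 1

composite_rate = false_positives / trials
print(f"single-test Type I rate: {alpha:.4f}")
print(f"majority-of-{k} Type I rate: {composite_rate:.4f}")
```

Theoretically the majority-of-three rate is 3α²(1 − α) + α³ ≈ 0.007 for α = 0.05, an order of magnitude below the single-test rate.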
Thus, to control errors, there are statistical methods (modifying the sample size, modifying the significance level, and replicating the experiment) and non-statistical methods (administrative actions and framing protocols that ensure objectivity, proper supervision and better training).
Type I and Type II: which one is more important?
While developing the theory of testing of hypotheses, it is assumed that the null hypothesis is the important one, so its incorrect rejection should be controlled. Thus the Type I error becomes the more serious one for the researcher. Hence, in the normal set-up of testing, the Type I error rate is fixed at a certain value such as 5% or 1%. Incorrect rejection of the null hypothesis is controlled by fixing the Type I error value.
But in the real world, either error, Type I or Type II, could be serious; it depends on the research context. Null hypotheses generally describe a normal set-up, an equality or independence of the characteristics under observation, and a known probability distribution function. So a Type I error can be termed incorrect rejection of the normal set-up, while a Type II error is incorrect acceptance of the normal set-up.
Incorrect rejection of the normal set-up leads to unnecessary panic, a search for more solutions, and new policies and practices to fill the perceived inadequacies. Hence it leads to an unnecessary waste of resources. It amounts to punishing the normal set-up unnecessarily.
Incorrect acceptance of the normal set-up leads to sudden bursts of events, sudden losses and sudden disasters. It builds up a frustrating response system in which there is no opportunity to innovate new ideas. Because of the overlooking, practical consequences can suddenly become serious.
In some contexts, a Type I error is preferable to a Type II error, as wastage is preferable to sudden disaster. In other contexts, a Type I error is more dangerous than a Type II error, as punishing is more dangerous than overlooking.
When type II is important
When the thesis is that "no victim or culprit should escape", the consequences of accepting the null hypothesis become serious, a cause for concern. We accept that a normal person may get classified as a victim or culprit, but a victim or culprit should not go scot-free. "Victim" here means a person falling under the alternative hypothesis, as the alternative hypothesis represents the abnormal set-up. Rejecting the null incorrectly is tolerable for the researcher, but accepting it incorrectly is not. In all such cases, Type II errors become the important ones. Examples: (i) designing a medical screening test for a disease, where a false positive is more acceptable than a false negative; (ii) zero tolerance for persons entering a security zone.
When type I is important
When the thesis is "no innocent should be punished", the consequences of rejecting the null hypothesis become important. We are not ready to punish any innocent person, even if in the process some culprit may go scot-free. An innocent person is the normal set-up, and so falls under the null hypothesis. Incorrectly rejecting the null hypothesis is not tolerable in such cases, whereas incorrectly accepting it is. In the judicial system, in a murder trial, accepting a murderer as innocent is preferable to accepting an innocent person as a murderer. Accepting an innocent person as a murderer means rejecting the null hypothesis that he is innocent when he actually is innocent, a Type I error; accepting a murderer as innocent means retaining the null hypothesis when it is false, a Type II error. Thus here a Type II error is preferable to a Type I error, and the Type I error is the one to control.
Consequences of results of test determine seriousness of type I and type II errors
For proper planning of the statistical testing procedure, the consequences of both types of errors should be accounted for in determining the sample size, the significance level, and the replication of the experiment, as well as in deciding how to go about the non-statistical methods: administrative actions and framing protocols that ensure objectivity, proper supervision and better training of the persons who will record the observations.