Can we see the forest for the trees? (continued)

Hansol Rheem
Published in Human Systems Data
4 min read · Apr 3, 2017

In my last blog post, I compared the simple regression model with the multiple regression model, and I posed a question: can we see the forest (multiple regression) for the trees (simple regression)? I concluded that we are sometimes better off with several “SIMPLE” regression models than with a single “MULTIPLE” regression model. However, can we say the same thing about the t-test and the f-test (ANOVA)?

The t-test is an analysis technique used to compare the means of two samples or populations. The f-test (as used in ANOVA), in contrast, compares the means of more than two samples by analyzing their variances. Both tell us whether the groups differ significantly on whatever measure we are interested in. Most statistics courses draw a clear line between the two: the t-test is used when comparing two samples, whereas the f-test is used when comparing more than two. Technically, however, we can (1) use the t-test to compare more than two samples and (2) use the f-test to compare only two samples. In today’s post, I will talk about the first case, since the second case is less problematic than the first, frequently practiced, and beyond today’s topic.
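In fact, the second case shows just how closely the two tests are related: with exactly two groups, a one-way f-test and an equal-variance t-test give the same answer, with F equal to t squared. Here is a minimal sketch in Python with SciPy (the sample data are made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Two illustrative samples drawn from normal distributions.
a = rng.normal(5.0, 1.0, 25)
b = rng.normal(5.8, 1.0, 25)

# Equal-variance two-sample t-test vs. one-way ANOVA on the same two groups.
t_stat, t_p = stats.ttest_ind(a, b)
f_stat, f_p = stats.f_oneway(a, b)

# With exactly two groups the tests agree: F = t^2 and the p-values match.
print(f"t = {t_stat:.3f}, t^2 = {t_stat**2:.3f}, F = {f_stat:.3f}")
```

This identity is why the two-group f-test is "less problematic": it is the same test in different clothing.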

Let’s assume that we want to compare the effects of three different drugs. How can we compare them using only the t-test? One plan is to perform three t-tests: (1) drug1 vs. drug2, (2) drug2 vs. drug3, and (3) drug3 vs. drug1. This way we can find out which drug has the largest effect and which has the smallest. It is quite a solid plan until we notice that performing three t-tests has inflated our Type I error. So what is a Type I error, and why is it a problem? The t-test is a hypothesis-testing technique that investigates whether sample means differ by testing the null hypothesis (the hypothesis that assumes no difference). Just like humans, the t-test makes mistakes too. It sometimes indicates a significant difference between two sample means even when there is none. In other words, the t-test can reject the null hypothesis even though it is true, and the chance of this error grows as we run more t-tests. For example, we are more likely to accuse a random person of being a thief when we are actively looking for whoever stole our cookies, aren’t we? The chance of a Type I error in a single t-test is conventionally set at 5%. With each additional test, the familywise error rate climbs toward 1 − (1 − 0.05)^k, where k is the number of tests; for our three t-tests it reaches roughly 14%, which is not an acceptable rate of error.
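This inflation is easy to see in a simulation. In the sketch below (Python with NumPy and SciPy; the group size, seed, and number of simulated experiments are arbitrary choices of mine), all three groups are drawn from the same distribution, so any pairwise t-test that rejects is, by construction, a Type I error:

```python
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group, alpha = 2000, 30, 0.05
false_alarms = 0

for _ in range(n_sims):
    # Three groups drawn from the SAME distribution: no true differences exist.
    groups = [rng.normal(0.0, 1.0, n_per_group) for _ in range(3)]
    # Run all three pairwise t-tests; the family errs if ANY test rejects.
    p_values = [stats.ttest_ind(a, b).pvalue
                for a, b in itertools.combinations(groups, 2)]
    if min(p_values) < alpha:
        false_alarms += 1

family_error = false_alarms / n_sims
print(f"Familywise Type I error over 3 t-tests: {family_error:.3f}")
# For comparison, 1 - 0.95**3 ≈ 0.143 if the three tests were independent;
# the pairwise tests share data, so the simulated rate lands a bit below that.
```

The observed familywise error rate comes out well above the 5% we thought we were working with.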

But do not fear, as we have the f-test, which can make this problem go away. The f-test proceeds in two steps before determining which sample has the larger mean. First, it examines whether there are significant differences among the sample means by testing the null hypothesis. At this step, we only learn whether the samples differ at all. If the first step indicates a significant difference, the analysis advances to the second step and uses post-hoc tests to find out where the difference comes from. The post-hoc tests are additional comparisons equivalent to the three t-tests in our previous example, with one key difference: the post-hoc tests correct for the inflation of the Type I error, while the raw t-tests do not. For a better understanding of the procedure, let’s go back to our drug example. In the first step, the null hypothesis is tested to find out whether the effects of the three drugs differ at all. Even if the result says there is a difference, we still have no idea which drugs differ, or which drug has the largest effect. Therefore, in the second step, additional comparisons are performed to answer these questions. Here the post-hoc test takes into account that a test actively looking for differences is more likely to conclude that differences exist, and accordingly applies a more rigorous standard to prevent the mistake of declaring a difference where there is none. In this way, the f-test reaches more accurate and careful conclusions than multiple uncorrected t-tests. You will usually encounter post-hoc tests under the names Tukey and Bonferroni, the two most frequently used. I myself use the Scheffé test, which is known to apply the most rigorous standard.
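The two-step procedure can be sketched with SciPy. The drug scores below are simulated data of my own making, and I use Tukey’s HSD for the post-hoc step simply because SciPy ships it (the Scheffé test I mentioned would need another library, such as statsmodels or a manual computation):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical response scores for three drugs; drug3 is shifted upward.
drug1 = rng.normal(10.0, 2.0, 40)
drug2 = rng.normal(10.2, 2.0, 40)
drug3 = rng.normal(12.5, 2.0, 40)

# Step 1: the omnibus f-test (one-way ANOVA) asks only "do ANY means differ?"
f_stat, p_value = stats.f_oneway(drug1, drug2, drug3)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Step 2: only if the omnibus test rejects do we run a post-hoc test that
# controls the familywise Type I error (Tukey's HSD here).
if p_value < 0.05:
    result = stats.tukey_hsd(drug1, drug2, drug3)
    print(result)  # pairwise comparisons with corrected p-values
```

The post-hoc output then tells us which specific pairs of drugs differ, at corrected significance levels.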

In my previous post, my answer to the “forest” question was equivocal, because in the domain of regression models, sometimes examining the trees (simple regression) helps more and sometimes examining the forest (multiple regression) does. In the domain of testing differences between samples, however, the answer is clear. We have to examine the forest (the f-test) in order not to obtain a biased view of the data at hand.

