Different types of t-test
Joey: While you were explaining hypothesis testing, you told me that there are different types of hypothesis tests and introduced me to the well-known Z-test. But I have heard that the t-test is the one used more often in practice. How is it different from the Z-test?
Chandler: The following four steps are common to all hypothesis tests:
- Formulate the null and the alternative hypothesis: These are two mutually exclusive and collectively exhaustive statements that we make about the population parameter. An important caveat in formulating them is that the null hypothesis is the commonly accepted fact (or default value), while the alternative hypothesis is the statement that we want to test.
- Calculate the test statistic for the sample data: The test statistic compares the sample data with the hypothesised value of the population parameter and helps us make a decision. The CLT plays a major role in calculating the test statistic. The test statistic in a Z-test indicates the distance between the sample mean and the hypothesised mean (H0), measured in standard errors, i.e. standard deviations of the sample mean. A Z-score of 0.25 indicates that the sample mean is 0.25 standard errors away from the hypothesised mean. A very high or very low (negative) Z-score indicates that the sample mean is very different from the hypothesised mean (H0). In other words, the sample has probably come from a distribution other than the null distribution.
- Calculate the p-value: It is a way to express the Z-score as a probability. Given that the null hypothesis is true, the p-value is the probability of seeing a sample at least as extreme as the one we have. So the smaller its value, the stronger the evidence against the null hypothesis. Note that the p-value is not the probability that the null hypothesis is true.
- Make a decision: In order to make a decision, i.e. either rejecting the null hypothesis or failing to reject it, we fix a level of significance in advance, usually denoted by α. Typical values for α are 0.1, 0.05, and 0.01, depending on the application. The decision is then made by comparing the p-value with α:
- p-value > α (Decision: Fail to reject the null hypothesis)
- p-value < α (Decision: Reject the null hypothesis)
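As a minimal sketch of the four steps above, here is a two-sided Z-test in Python using only the standard library. The click counts, the hypothesised mean of 30 and the value σ = 3 are made-up values for illustration, not data from this post.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical daily click counts (illustrative values only).
clicks = [32, 28, 35, 31, 29, 34, 33, 30, 36, 27]

# Step 1: H0: mu = 30 versus H1: mu != 30 (two-sided).
mu_0 = 30
sigma = 3.0   # population standard deviation, assumed known for a Z-test
alpha = 0.05  # significance level, fixed before looking at the data

# Step 2: Z-statistic = distance of the sample mean from mu_0 in standard errors.
n = len(clicks)
z = (mean(clicks) - mu_0) / (sigma / sqrt(n))

# Step 3: two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 4: compare the p-value with alpha.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.3f}, decision: {decision}")
```

For a one-sided alternative, only step 3 would change: the p-value would be a single tail probability instead of twice the tail.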
The main difference between the Z-test and the t-test is the test statistic that is computed. In a Z-test we compute the Z-statistic, which is given by
Z = (x̄ − μ) / (σ / √n)
In a t-test we compute the t-statistic, which is given by
t = (x̄ − μ) / (s / √n)
where
x̄ = sample mean
μ = hypothesised population mean (H0)
n = number of observations or data points
σ = population standard deviation
s = sample standard deviation
The main difference between the Z-statistic and the t-statistic is that the former requires the population standard deviation to be known before performing the test (which is rarely the case in real life), whereas the latter requires just the sample standard deviation. If we can somehow obtain the population standard deviation, the Z-test should be our choice.
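To make the difference concrete, the sketch below computes both statistics for the same sample. The data and the value σ = 3 are assumptions made for illustration. The only thing that changes between the two functions is which standard deviation goes into the denominator.

```python
from math import sqrt
from statistics import mean, stdev

def z_statistic(sample, mu_0, sigma):
    """Z-statistic: uses the (assumed known) population standard deviation."""
    return (mean(sample) - mu_0) / (sigma / sqrt(len(sample)))

def t_statistic(sample, mu_0):
    """t-statistic: uses the sample standard deviation (n - 1 denominator)."""
    return (mean(sample) - mu_0) / (stdev(sample) / sqrt(len(sample)))

# Hypothetical sample and hypothesised mean (illustrative values only).
sample = [32, 28, 35, 31, 29, 34, 33, 30, 36, 27]
mu_0 = 30

print(f"Z = {z_statistic(sample, mu_0, sigma=3.0):.4f}")
print(f"t = {t_statistic(sample, mu_0):.4f}")
```

The two values differ only because s, estimated from the ten observations, is not exactly equal to the assumed σ.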
The next step in a Z-test after calculating the Z-statistic is to find the p-value from the standard normal distribution table. We follow the same procedure for the t-test, except that we use the t-statistic to find the p-value and look it up in the Student's t-distribution table with n − 1 degrees of freedom. The table below compares the Student's t-distribution and the standard normal distribution.
The limiting distribution of the t-distribution is the normal distribution, which means that as the sample size (and hence the degrees of freedom) increases, the normal distribution approximates the t-distribution more accurately.
If you remember, Joey, a few weeks back we solved the problem of checking whether the background colour of our website has any impact on the number of clicks it receives. We used a Z-test, simply assuming that σ, the population standard deviation, was known to us. We also saw that it is difficult to obtain σ in real life. Let us solve the same problem through a t-test, without making any assumption about σ.
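The convergence can be checked numerically. The sketch below is an illustration only: it integrates the t density with Simpson's rule (the helper names `t_pdf` and `t_two_sided_p` are made up for this example) and compares the probability of landing beyond ±1.96 for several degrees of freedom with the corresponding normal probability.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density over [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)

cutoff = 1.96
normal_tail = 2 * (1 - NormalDist().cdf(cutoff))
tails = {df: t_two_sided_p(cutoff, df) for df in (2, 5, 10, 30, 100, 1000)}

for df, tail in tails.items():
    print(f"df = {df:>4}: P(|T| > {cutoff}) = {tail:.4f}")
print(f"normal   : P(|Z| > {cutoff}) = {normal_tail:.4f}")
```

The t-distribution has heavier tails for small degrees of freedom, so its tail probability starts well above the normal value and shrinks towards it as the degrees of freedom grow.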
The null and alternative hypotheses are
H0: Changing the colour of the website doesn’t influence the number of clicks which it receives.
H1: Changing the colour of the website influences the number of clicks which it receives.
If changing the colour of the website has an influence on the number of clicks, then the average number of clicks will move away from the default value (i.e., 30 per day) once the colour is changed. Mathematically, this is equivalent to saying
H0: μ = 30 per day
H1: μ≠ 30 per day
The sample mean is given by
x̄ = (x1 + x2 + … + xn) / n
The sample standard deviation is given by
s = √( Σ (xi − x̄)² / (n − 1) )
The t-statistic is given by
t = (x̄ − 30) / (s / √n)
The significance level is 0.05. The corresponding p-value from the t-table is 0.149, which is greater than the significance level, so we fail to reject the null hypothesis. We conclude that we don’t have sufficient evidence that changing the colour of the website influences the number of clicks.
Joey: That’s great. But do we have different types of t-test, like we had for the Z-test?
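A one sample t-test like the one above can be sketched in a few lines of Python. The click counts below are hypothetical and are not the data behind the p-value quoted above; the helpers `t_pdf` and `t_two_sided_p` are illustrative names that stand in for the t-table by integrating the t density numerically. In practice a library routine such as `scipy.stats.ttest_1samp` would normally be used instead.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density over [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)

# Hypothetical clicks per day after the colour change (illustrative only).
clicks = [32, 28, 35, 31, 29, 34, 33, 30, 36, 27]
mu_0 = 30     # H0: mu = 30, H1: mu != 30
alpha = 0.05

n = len(clicks)
t_stat = (mean(clicks) - mu_0) / (stdev(clicks) / math.sqrt(n))
df = n - 1    # degrees of freedom for a one sample t-test
p_value = t_two_sided_p(t_stat, df)

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.3f}, decision: {decision}")
```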
Chandler: Yes, we do have!
Till now, what we have seen is just the one sample t-test, which can be classified as follows
It is called a one sample t-test because we compare the mean of a single sample to a fixed value.
Joey: So you mean to say that there should be something called a two sample t-test?
Chandler: Yes. We follow the same steps as we do for the one sample t-test. We can classify the two sample test into three types based on the type of null and alternative hypothesis.
Let ‘a’ be the t-score.
Similarly, the t-statistic for the two sample t-test is given by
t = ((x̄1 − x̄2) − Δ) / √(s1²/n1 + s2²/n2)
where
Δ = the hypothesised difference between the population means (0 if testing for equal means)
x̄1 = sample mean for the first sample
x̄2 = sample mean for the second sample
n1 = number of observations in the first sample
n2 = number of observations in the second sample
s1² = sample variance for the first sample
s2² = sample variance for the second sample
Let me explain all the steps through an example which we discussed before.
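Let’s assume that we are interested in studying the effect of gender on our customers’ purchasing behaviour. In particular, we want to answer the question of whether, on average, male and female customers purchase the same number of products from our website.
So in this case our null and alternative hypotheses are as follows:
H0: μ1 − μ2 = 0 (male and female customers purchase the same number of products on average)
H1: μ1 − μ2 ≠ 0
where μ1 and μ2 are the population means for male and female customers.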
Let’s pull out 15 data points from our database and test this hypothesis.
Dataset (numbers indicate the number of products purchased):
So here we have,
Now the t value can be computed using
P value = 0.217
Now, assuming a threshold of 0.05, we infer that we don’t have sufficient evidence to reject the null hypothesis, i.e. we cannot rule out that male and female customers purchase the same number of products on average.
Joey: Chandler, you told me that we should do a t-test if we don’t know the population standard deviation. But I have seen many sources use the following conditions to decide between the Z-test and the t-test.
Which one should I follow?
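The two sample procedure can be sketched as follows. The purchase counts are hypothetical (15 made-up data points, not the dataset used above). The statistic uses separate sample variances for the two groups, and for the degrees of freedom this sketch assumes the Welch-Satterthwaite approximation, which the post does not specify. The helper names are illustrative; a library call such as `scipy.stats.ttest_ind(..., equal_var=False)` performs the same kind of test.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density over [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)

# Hypothetical number of products purchased (illustrative values only).
male = [3, 5, 2, 4, 6, 3, 4, 5]
female = [4, 6, 5, 7, 3, 6, 5]
delta = 0  # hypothesised difference between the population means

n1, n2 = len(male), len(female)
v1, v2 = variance(male) / n1, variance(female) / n2  # s1^2/n1 and s2^2/n2

t_stat = ((mean(male) - mean(female)) - delta) / math.sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (an assumption of this sketch).
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
p_value = t_two_sided_p(t_stat, df)

print(f"t = {t_stat:.3f}, df = {df:.2f}, p-value = {p_value:.3f}")
```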
Chandler: They follow the above condition for two reasons.
- When the sample size increases, the sample standard deviation ‘s’ becomes a better estimate of the population standard deviation ‘σ’. In other words, ‘s’ will be closer to ‘σ’ for a larger sample size.
- As the sample size increases, the t-distribution moves closer to the normal distribution.
Due to these two reasons, the p-values given by the Z-test and the t-test are almost the same for a larger sample size. However, this approximation is good only for very large sample sizes. So I recommend that you choose the type of test based on whether the population standard deviation is known, rather than on the sample size.
Joey: Got it buddy.
Joey: Chandler, one more thing. I have come across another type of t-test in the literature, called the paired t-test. How is it different from the t-tests you have explained?
Chandler: The paired t-test converts a two sample t-test problem into a one sample t-test problem by taking the differences between the paired observations. Paired samples (also called dependent samples) are samples in which natural or matched couplings occur.
Let me give you an example of a paired sample. Suppose a sample of 3 students was given a diagnostic test before teaching a particular module and then again after completing the module. We want to find out whether, in general, our teaching has resulted in an improvement in the students’ knowledge/skills.
The pre-module and post-module scores are paired samples because we measure the scores for the same student before and after teaching the module. We usually apply a one sample t-test to the differences of the paired samples. For example, the null and alternative hypotheses for the above problem are (let μ be the true mean of the differences of the paired samples)
Joey: Can you give more examples of paired samples?
Chandler: Okay, let me summarize some of the cases where we can treat two sample t-test problems as paired t-test problems.
- Pre-test / Post-test samples in which a factor is measured before and after an intervention. (The above example falls under this category)
- Cross-over trials in which individuals are randomized to two treatments and then the same individuals are crossed-over to the alternative treatment.
- Matched samples, in which individuals are matched on personal characteristics such as age and sex.
- Any circumstance in which each data point in one sample is uniquely matched to a data point in the second sample.
Let me walk through an example which should clear all your doubts about the paired t-test.
Let’s assume that we are interested in studying the relationship between the number of products purchased and the number of products added to the cart. To be more specific, assume that until now we have believed that the average number of products purchased is two less than the number of products added to the cart, but that, due to our new discount schemes, we now hypothesise that this is no longer true.
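In this case we have to perform a paired t-test to evaluate our hypothesis, because each customer contributes one pair of observations (products added to the cart and products purchased).
So in this case our null and alternative hypotheses are as follows, where μ is the true mean of the difference (products added to the cart minus products purchased):
Null hypothesis: μ = 2
Alternative hypothesis: μ ≠ 2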
Let’s pull out records of 15 customers from our database to test our hypothesis
Here we have,
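So the t value can be found using the expression
t = (d̄ − 2) / (s_d / √n)
where d̄ is the sample mean of the paired differences, s_d is their sample standard deviation and n = 15 is the number of pairs.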
P value = 0.823
Again, since the p-value is higher than our threshold of 0.05, we fail to reject the null hypothesis, and so we cannot dismiss our old belief.
Joey: Since we could also solve the above problem using a two sample t-test, is there any advantage in using the paired t-test for paired samples?
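The paired test reduces to a one sample t-test on the differences, which the sketch below illustrates. The cart and purchase counts are hypothetical (ten made-up customers rather than the 15 records used above), and the helper names are illustrative. A library routine such as `scipy.stats.ttest_rel` covers the common case where the hypothesised mean difference is 0.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density over [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)

# Hypothetical records: one (cart, purchased) pair per customer.
added_to_cart = [5, 7, 4, 6, 8, 5, 6, 7, 4, 6]
purchased = [3, 4, 2, 5, 6, 3, 3, 5, 2, 4]
mu_0 = 2  # H0: mean difference = 2, H1: mean difference != 2

# Pairing turns two samples into a single sample of differences.
diffs = [c - p for c, p in zip(added_to_cart, purchased)]
n = len(diffs)

t_stat = (mean(diffs) - mu_0) / (stdev(diffs) / math.sqrt(n))
df = n - 1
p_value = t_two_sided_p(t_stat, df)

print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.3f}")
```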
Chandler: Before explaining the advantages of using paired t-test for paired samples, let me give you a short recap about the power of the test.
The power of a hypothesis test is simply 1 minus the probability of a Type II error. In other words, the power of a test is the probability that we make the right decision when the null hypothesis is false:
Power = 1 − β = P(rejecting the null hypothesis | it is false)
Power is the likelihood that a study will detect an effect when there is an effect in reality. If the power is high, then the probability of committing a Type II error is low. Power also helps us calculate the minimum sample size required to detect an effect of a given size. To get more intuition about the power of hypothesis testing, please revisit the Type I and II error blog.
In a paired t-test, each observation is of the form Z = X − Y (Difference = Post module score − Pre module score). That means that the variance of each observation is
Var(Z) = Var(X − Y) = Var(X) + Var(Y) − 2Cov(X, Y)
If X and Y are positively correlated, pairing reduces Var(Z) by 2Cov(X, Y) compared with treating the two samples as independent. Most often, paired samples have a positive covariance, and thus Var(Z) is reduced. The sample estimate of Var(Z) in the paired test is denoted s_d². The equation for the paired t-test is given by
t = (d̄ − μ) / (s_d / √n)
where d̄ is the sample mean of the differences, μ is the hypothesised mean difference and n is the number of pairs.
If s_d decreases, the t-statistic increases in magnitude, which leads to a smaller p-value. As a result, our chances of rejecting the null hypothesis increase, giving our test more power.
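The effect of the covariance term can be seen on a small example. The pre-module and post-module scores below are hypothetical, and the sketch only compares the two t-statistics for a hypothesised mean difference of 0; it is meant as an illustration of the variance identity, not as a power calculation.

```python
from math import sqrt
from statistics import mean, variance

def covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Hypothetical scores for the same eight students (illustrative only).
pre = [60, 72, 55, 80, 68, 75, 62, 70]
post = [66, 75, 60, 86, 70, 81, 65, 77]
n = len(pre)

# Z = X - Y with X = post-module score and Y = pre-module score.
diffs = [x - y for x, y in zip(post, pre)]

var_x, var_y = variance(post), variance(pre)
cov_xy = covariance(post, pre)
var_z = variance(diffs)

# The identity Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y) holds for sample moments too.
print(f"Var(Z) = {var_z:.3f}")
print(f"Var(X) + Var(Y) - 2Cov(X, Y) = {var_x + var_y - 2 * cov_xy:.3f}")

# Same mean difference, very different standard errors.
t_paired = mean(diffs) / sqrt(var_z / n)
t_unpaired = (mean(post) - mean(pre)) / sqrt(var_x / n + var_y / n)
print(f"paired t = {t_paired:.2f}, two sample t = {t_unpaired:.2f}")
```

Because students who score high before the module also tend to score high after it, the covariance is large and positive, so the paired statistic is far larger than the two sample statistic computed from the same scores.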
If the data are paired and we don’t use the paired test, we lose power and are less likely to reject the null hypothesis. It is worth noting that if the data are not paired and we still use the paired test, we also lose power. For instance, suppose we run a one-sided t-test on 2n observations after erroneously concluding that observation 1 is paired with observation n+1, observation 2 with observation n+2, and so on. Then in the paired t-test our sample size is halved to n differences, which reduces the power of the test.
The author of this blog is Balaji P, who is pursuing a PhD in reinforcement learning at IIT Madras.