Hypothesis testing: Paired and unpaired two-sample t-test and Z-test in R, Python, and Google Spreadsheet

Mochamad Kautzar Ichramsyah
Published in CodeX
9 min read · Jul 29, 2023

As promised, this post continues my previous one, How to do one-sample t-test and Z-test as part of hypothesis testing in R, Python, and Google Spreadsheet. This time, I will share how to do hypothesis testing when we want to compare two samples, either paired or unpaired. Before that, we need to understand the difference between the two.

Two-sample hypothesis testing: Paired and Unpaired

A two-sample hypothesis test is used to check whether two samples have significantly different means. This means our hypothesis involves two datasets, which could come from one source (dependent) or from different sources (independent).

For a better explanation of the difference, I will use the example of Kautzar who works as a Data Analyst at a retail company.

Figure 1. Available hypotheses
Figure 2. How to calculate two-sample paired t-stat
Figure 3. How to calculate two-sample Z-stat
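The figures above are images, so here is a hedged restatement of the standard textbook formulas they refer to. The symbols are my own notation and may differ from the figures: d̄ and s_d are the mean and sample standard deviation of the per-subject differences, n is the number of pairs, μ0 and Δ0 are the hypothesized differences (usually 0), and σ1 and σ2 are the known population standard deviations.

```latex
% Paired t-statistic: d_i = x_{1i} - x_{2i} are the per-subject differences
t = \frac{\bar{d} - \mu_0}{s_d / \sqrt{n}}, \qquad \text{df} = n - 1

% Unpaired two-sample Z-statistic with known population standard deviations
z = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}
```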

Paired two-sample

In this case, we want to know whether there is any difference in the average amount of money spent in a month before and after a group of users got special treatment from the company, for example being sent a “thank you package”. Of course, the company sent the package hoping that it would make the users feel appreciated, so that they would want to spend more on the company’s products. Here there is only one group of users, but it is measured under two conditions: before and after the treatment.

That is why it is called paired: the two sets of measurements depend on each other.

Unpaired two-sample

In this case, we want to know whether there is any difference in the average amount of money spent in a month between two groups of users who got special treatment from the company: the first group got a “thank you package” with content X and the second group got a “thank you package” with content Y. As in the previous case, the company sent the package hoping that it would make the users feel appreciated, so that they would want to spend more on the company’s products. The difference is that we now have two groups of users, and each group got a different treatment.

That is why it is called unpaired: the two sets of measurements are independent of each other.
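To make the distinction concrete, here is a minimal Python sketch. The numbers are synthetic (generated with a fixed random seed), not the article's datasets, and the variable names are only for illustration. It shows that a paired t-test is the same thing as a one-sample t-test on the per-user differences, while an unpaired test treats the two groups as separate samples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical monthly spending for 25 users, measured before and after a treatment
before = rng.normal(loc=150, scale=30, size=25)
after = before + rng.normal(loc=5, scale=10, size=25)

# Paired: each "before" value is matched with the same user's "after" value
paired = stats.ttest_rel(before, after)

# A paired t-test is equivalent to a one-sample t-test on the differences
one_sample = stats.ttest_1samp(before - after, popmean=0)

# Unpaired: two separate (independent) groups of users
group_x = rng.normal(loc=150, scale=30, size=25)
group_y = rng.normal(loc=155, scale=30, size=25)
unpaired = stats.ttest_ind(group_x, group_y)

print("paired   :", paired.statistic, paired.pvalue)
print("on diffs :", one_sample.statistic, one_sample.pvalue)
print("unpaired :", unpaired.statistic, unpaired.pvalue)
```

The first two printed lines should show the same statistic and p-value, which is the practical meaning of "dependent" here: only the per-user differences matter.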

Dummy datasets for example

I generated dummy datasets for this article. You can download them here:

  1. Unpaired t-test
  2. Unpaired Z-test
  3. Paired t-test
  4. Paired Z-test

The paired two-sample t-test

Using the dummy dataset we generated, Paired t-test, we want to prove the hypothesis, “The before-treatment users spent less money compared to after-treatment users”, at the α = 0.05 level of significance.

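First of all, we pick one of the alternative hypotheses as the main statement we want to prove. Here, with m as the mean of the differences (before minus after), the null hypothesis is m = 0 and the alternative is m < 0: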

Figure 4. The hypothesis for this example

Using R

# read the file into "df"
df <- read.csv("paired_t.csv")

# look at the first 6 rows
head(df)

# get the summary of df
summary(df)

# calculate the standard deviation
paste("Std dev for before treatment group: ", sd(df$before))
paste("Std dev for after treatment group: ", sd(df$after))

# paired two-sample t-test using R
t.test(df$before, df$after, paired = TRUE, alternative = "less", conf.level = 0.95)
Figure 5. Paired t-test result in R

As we can see in the result above:

  1. t-stat = -1.6215, df = 24, and p-value = 0.05899.
  2. Decision rule: If the p-value is less than α = 0.05, then reject the null hypothesis (H0). Otherwise, do not reject the null hypothesis.
  3. Decision fact: Because the p-value of 0.05899 is higher than α = 0.05, we do not reject the null hypothesis (H0). This means we do not have enough evidence to reject that the mean difference between the before-treatment and after-treatment groups is equal to 0 at the 95% confidence level.
  4. To get an estimate of the mean difference (m) between before-treatment and after-treatment, we can look at the 95% confidence interval, -Inf to 0.5316604. That interval includes 0. Let’s redo our hypothesis testing using a number higher than 0.5316604, say 1.

Let’s try to redo the hypothesis testing using a new hypothesis.

Figure 6. The new hypothesis for this example, using 1, which is a higher difference than 0.5316604

Write this code in R:

# read the file into "df"
df <- read.csv("paired_t.csv")

# look at the first 6 rows
head(df)

# get the summary of df
summary(df)

# calculate the standard deviation
paste("Std dev for before treatment group: ", sd(df$before))
paste("Std dev for after treatment group: ", sd(df$after))

# paired two-sample t-test using R
t.test(df$before, df$after, mu = 1, paired = TRUE, alternative = "less", conf.level = 0.95)
Figure 7. Paired t-test result in R using new hypothesis

As we can see in the result above using a new hypothesis:

  1. t-stat = -1.7897, df = 24, and p-value = 0.04307.
  2. Decision rule: If the p-value is less than α = 0.05, then reject the null hypothesis (H0). Otherwise, do not reject the null hypothesis.
  3. Decision fact: Because the p-value of 0.04307 is less than α = 0.05, we reject the null hypothesis (H0), which means we have enough evidence that the mean difference between the before-treatment and after-treatment groups is less than 1 at the 95% confidence level.
  4. Notice that when we set the hypothesized value to 1, the p-value produced is very close to α = 0.05, because the value we set is close to the upper bound of the 95% confidence interval we computed previously.
  5. Please try on your own: if you set a much larger hypothesized value, let’s say 10, you will get a smaller p-value, I guarantee. 😃

Using Python

# import pandas 
import pandas as pd

# read the data into df
df = pd.read_csv("paired_t.csv")

# look at the first 5 rows of df
df.head()

# get the info of df to know the mean and std
df.describe()

# import scipy.stats as ss
import scipy.stats as ss

# paired two-sample t-test
ss.ttest_rel(df['before'], df['after'], alternative = "less")
Figure 8. Paired t-test result in Python

For the documentation of ttest_rel, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html
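As far as I know, ttest_rel does not take a hypothesized difference the way R's mu argument does. One possible workaround, sketched below on synthetic data (not the article's paired_t.csv), is to run a one-sample t-test on the differences with ttest_1samp and its popmean argument. The sketch also computes the upper bound of the one-sided 95% confidence interval by hand, which is the number R printed next to -Inf.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for df['before'] and df['after'] (25 users)
before = rng.normal(150, 30, size=25)
after = before + rng.normal(2, 8, size=25)

diff = before - after
n = diff.size
se = diff.std(ddof=1) / np.sqrt(n)  # standard error of the mean difference

# Upper bound of the one-sided 95% confidence interval (alternative "less")
upper = diff.mean() + stats.t.ppf(0.95, df=n - 1) * se

# H0: mean difference = 0, same test as ttest_rel(before, after, alternative="less")
res0 = stats.ttest_1samp(diff, popmean=0, alternative="less")

# H0: mean difference = 1, the Python counterpart of mu = 1 in R
res1 = stats.ttest_1samp(diff, popmean=1, alternative="less")

# The same t-statistic computed by hand
t_manual = (diff.mean() - 1) / se

print("upper bound of 95% CI:", upper)
print("mu = 0:", res0.statistic, res0.pvalue)
print("mu = 1:", res1.statistic, res1.pvalue)
```

With the alternative "less", the test rejects H0 at α = 0.05 exactly when the hypothesized value is above the upper bound, and a larger hypothesized value always gives a smaller p-value.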

Using Google Spreadsheet

Figure 9. Paired t-test result in Google Spreadsheet

For the documentation of T.TEST, see https://support.google.com/docs/answer/6055837?hl=en

Conclusion: The paired t-test

  1. It’s possible to do paired t-tests using tools such as R, Python, and Google Spreadsheet, and the t-statistic and p-value are the same in every tool.
  2. In my humble opinion, I highly recommend using R to do a t-test, due to its output completeness and flexibility to modify the inputs.

The unpaired two-sample Z-test

Using the dummy dataset we generated, Unpaired Z-test, we want to prove the hypothesis, “The package-X-treatment users spent money differently compared to the package-Y-treatment users”, at the α = 0.05 level of significance. In this example, the population standard deviation is known and equal to 34.

First of all, we pick one of the alternative hypotheses as the main statement we want to prove. Here the null hypothesis is that the two group means are equal, and the alternative is that they differ:

Figure 10. The hypothesis for this example

Using R

# read the file into "df"
df <- read.csv("unpaired_z.csv")

# look at the first 6 rows
head(df)

# get the summary of df
summary(df)

# calculate the standard deviation
paste("Std dev for package_x treatment group: ", sd(df$package_x))
paste("Std dev for package_y treatment group: ", sd(df$package_y))

# load package for Z test
library(BSDA)

# unpaired Z test in R
z.test(x = df$package_x, y = df$package_y, alternative = "two.sided", mu = 0,
sigma.x = 34, sigma.y = 34, conf.level = 0.95)
Figure 10. Unpaired Z-test result in R

As we can see in the result above:

  1. Z-stat = -0.23627, and p-value = 0.8132.
  2. Decision rule: If the p-value is less than or equal to α = 0.05, then reject the null hypothesis (H0). Otherwise, do not reject the null hypothesis.
  3. Decision fact: Because the p-value of 0.8132 is higher than α = 0.05, we do not reject the null hypothesis (H0), which means we do not have enough evidence to say the package-X-treatment group’s mean is different from the package-Y-treatment group’s mean at the 95% confidence level.
  4. The 95 percent confidence interval is reported as -13.63 to 10.69, which includes 0. That is why the hypothesis test does not reject H0.
  5. The sample means are also listed: the mean of x is 154.16 and the mean of y is 155.63. Without hypothesis testing, we might be tempted to say there is a real difference between the package_x and package_y treatment users.

Using Python

# import pandas 
import pandas as pd

# read the data into df
df = pd.read_csv("unpaired_z.csv")

# look at the first 5 rows of df
df.head()

# get the info of df to know the mean and std
df.describe()

# import ztest from statsmodels
from statsmodels.stats.weightstats import ztest

# unpaired two-sample Z-test (two-sided by default)
ztest(df["package_x"], df["package_y"], value = 0)
Figure 11. Unpaired Z-test result in Python

As an alternative, you can use CompareMeans from the same module, statsmodels.stats.weightstats. You can read the documentation here: https://www.statsmodels.org/dev/stats.html#basic-statistics-and-t-tests-with-frequency-weights.
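One thing to keep in mind: as far as I understand, ztest from statsmodels estimates the standard deviation from the samples, while z.test from BSDA in R uses the known population values passed as sigma.x and sigma.y. If you want the known-sigma version in Python, a minimal hand-written sketch could look like this. The function name two_sample_z_test and the synthetic data (including the group size of 60) are my own assumptions for illustration, not part of any library or of the article's dataset.

```python
import numpy as np
from scipy import stats

def two_sample_z_test(x, y, sigma_x, sigma_y, delta0=0.0, conf_level=0.95):
    """Two-sided, two-sample Z-test with known population standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x.mean() - y.mean()
    # Standard error of the difference in means, using the known sigmas
    se = np.sqrt(sigma_x**2 / x.size + sigma_y**2 / y.size)
    z = (diff - delta0) / se
    p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value
    z_crit = stats.norm.ppf(1 - (1 - conf_level) / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return z, p_value, ci

rng = np.random.default_rng(1)

# Synthetic stand-ins for df["package_x"] and df["package_y"]
package_x = rng.normal(155, 34, size=60)
package_y = rng.normal(155, 34, size=60)

z, p_value, ci = two_sample_z_test(package_x, package_y, sigma_x=34, sigma_y=34)
print("z =", z, "p-value =", p_value, "95% CI =", ci)
```

The confidence interval contains 0 exactly when the two-sided p-value is above α = 0.05, which is the same link between the interval and the decision described for the R output.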

Using Google Spreadsheet

Figure 12. Unpaired t-test result in Google Spreadsheet

As you can see, in this example I am still using T.TEST, because Z.TEST in Google Sheets does not yet support the two-sample case. That is why I fell back to T.TEST with two tails and the equal-variance assumption.

For documentation of T.TEST you can try to read this https://support.google.com/docs/answer/6055837?hl=en.
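For comparison, a two-tailed T.TEST with the equal-variance type should correspond to scipy's ttest_ind with equal_var=True. The sketch below runs it on synthetic data (not the article's unpaired_z.csv) and also computes a normal-based p-value from the same pooled statistic, to show how little the t and normal distributions differ when the samples are reasonably large. The sample size of 60 per group is an arbitrary assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Synthetic stand-ins for the package_x and package_y columns
package_x = rng.normal(155, 34, size=60)
package_y = rng.normal(155, 34, size=60)

# Two-tailed, equal-variance (pooled) two-sample t-test
t_res = stats.ttest_ind(package_x, package_y, equal_var=True)

# Normal-based two-sided p-value from the same statistic
p_normal = 2 * stats.norm.sf(abs(t_res.statistic))

print("statistic       :", t_res.statistic)
print("p-value (t)     :", t_res.pvalue)
print("p-value (normal):", p_normal)
```

The two p-values are close but not identical, which is one reason a spreadsheet result may be only similar to, not equal to, a Z-test result from R or Python.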

Conclusion: The unpaired Z-test

  1. It’s possible to do unpaired Z-tests using tools such as R and Python, but not in Google Spreadsheet. The Z-statistic and p-value are equal or similar across tools.
  2. In my humble opinion, I highly recommend using R to do a Z-test, due to its output completeness and flexibility to modify the inputs.

What did we NOT learn so far?

  1. How to do an unpaired t-test in R, Python, and Google Sheets.
  2. How to do a paired Z-test in R, Python, and Google Sheets.
  3. How to fulfill any assumptions needed in paired or unpaired hypothesis testing.

I do not cover points 1 and 2 in this post because I want to encourage you to try them yourself. Can you do it using the simple explanations in this post? Please challenge yourself!

For point 3, I could explain it here, but I am afraid I would miss a lot of important things while trying to keep this post as simple as possible.

What did we learn so far?

  1. How to do a paired t-test in R, Python, and Google Sheets.
  2. How to do an unpaired Z-test in R and Python.
  3. Please check the documentation listed above; there are a lot of hidden gems that can help us understand more about hypothesis testing.
  4. Because hypothesis testing is so closely tied to statistics, I recommend using R over the other tools, like Python or Google Spreadsheets.

Conclusion

In my current job as a data analytics professional, hypothesis testing is very useful when we want to know how effective the changes or treatments we applied to customers are: do they have a significant effect or not? In A/B test experiments, we usually do this kind of hypothesis test A LOT.

Even when it’s not an experiment, hypothesis testing supports better decision-making than only calculating the difference. It’s called inferential statistics for a reason: it helps us decide based on statistical evidence.

Thank you for reading!


Hi, thanks for coming to my Medium. I have finished the hypothesis testing series, yay! I hope you can learn a lot from what I share.

I am learning to write, so mistakes are unavoidable even when I try my best. If you find any problems or mistakes, please let me know!

Data analytics professional with 10 years of experience at tech companies in Indonesia.