Implementing Student’s t-Test in Python from Scratch

Hasan Khan
3 min read · Jul 19, 2024


The Student’s t-test is a statistical hypothesis test for whether the means of two samples differ by more than chance alone would explain, which is how we judge whether the samples plausibly come from the same population. It is widely used in data analysis and machine learning, and implementing it yourself is a good way to understand how it works.

In this article, you’ll learn how to code the Student’s t-test from scratch in Python, covering both independent and dependent samples.

Overview

  1. Student’s t-Test: Basic concepts.
  2. Independent Samples t-Test: Comparing means of two unrelated samples.
  3. Dependent Samples t-Test: Comparing means of two related samples.

Student’s t-Test

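The Student’s t-test checks whether two samples likely come from the same population by comparing their means. The null hypothesis is that the means are equal. The difference between the sample means is divided by its standard error to give the t statistic, which follows a t-distribution with a known number of degrees of freedom when the null hypothesis is true. If the absolute value of the t statistic exceeds the critical value of that distribution at the chosen significance level (alpha), or equivalently if the p-value is at most alpha, the null hypothesis is rejected.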
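
To make the comparison against critical values concrete, here is a minimal sketch of the decision rule using the t-distribution from `scipy.stats`. The values of `alpha` and `df` and the helper names `is_significant` and `two_tailed_p` are illustrative assumptions, not part of the implementations in this article.

```python
from scipy.stats import t

alpha = 0.05  # assumed significance level
df = 198      # assumed degrees of freedom (e.g. two samples of 100 values)

# Two-tailed critical value: alpha is split evenly across both tails.
cv = t.ppf(1.0 - alpha / 2.0, df)

def is_significant(t_stat, cv):
    """Hypothetical helper: True when |t| exceeds the critical value."""
    return abs(t_stat) > cv

def two_tailed_p(t_stat, df):
    """Hypothetical helper: two-tailed p-value of a t statistic."""
    return 2.0 * t.sf(abs(t_stat), df)

print(f'cv={cv:.3f}')
print(is_significant(2.5, cv), two_tailed_p(2.5, df) < alpha)
```

The two checks are equivalent: a t statistic lies beyond the two-tailed critical value exactly when its two-tailed p-value is below `alpha`.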

Independent Samples t-Test

Calculation

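The t-statistic for two independent samples is calculated as:

    t = (mean(X1) - mean(X2)) / SED

Where SED (Standard Error of the Difference) is:

    SED = sqrt(SE1^2 + SE2^2)

Here SE1 and SE2 are the standard errors of the means of the first and second samples.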
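
As a sketch of where the standard errors come from, the snippet below computes each sample's standard error by hand (sample standard deviation with `ddof=1`, divided by the square root of the sample size) and combines the two into the SED. The helper names `standard_error` and `standard_error_of_difference` and the toy data are assumptions made for illustration; `scipy.stats.sem` uses the same `ddof=1` default, so the two should agree.

```python
from math import sqrt
from numpy import std
from scipy.stats import sem

def standard_error(data):
    """Hypothetical helper: standard error of the sample mean."""
    return std(data, ddof=1) / sqrt(len(data))

def standard_error_of_difference(data1, data2):
    """Hypothetical helper: SED for two independent samples."""
    return sqrt(standard_error(data1)**2 + standard_error(data2)**2)

# Made-up toy samples, for illustration only
a = [4.1, 5.0, 5.6, 4.8, 5.2]
b = [5.9, 6.1, 5.4, 6.3, 5.8]

print(standard_error(a), sem(a))  # hand-computed vs. SciPy
print(standard_error_of_difference(a, b))
```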

Implementation in Python:

from math import sqrt
from numpy import mean
from scipy.stats import sem, t

def independent_ttest(data1, data2, alpha=0.05):
    # Sample means
    mean1, mean2 = mean(data1), mean(data2)
    # Standard error of each sample mean
    se1, se2 = sem(data1), sem(data2)
    # Standard error of the difference between the means.
    # This matches the pooled-variance Student's t-test when both
    # samples have the same size, as in the example below.
    sed = sqrt(se1**2.0 + se2**2.0)
    t_stat = (mean1 - mean2) / sed
    # Degrees of freedom
    df = len(data1) + len(data2) - 2
    # Two-tailed critical value, consistent with the two-tailed p-value
    cv = t.ppf(1.0 - alpha / 2.0, df)
    # Two-tailed p-value
    p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0
    return t_stat, df, cv, p

# Example
from numpy.random import seed, randn

seed(1)
alpha = 0.05
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51

t_stat, df, cv, p = independent_ttest(data1, data2, alpha)
print(f't={t_stat:.3f}, df={df}, cv={cv:.3f}, p={p:.3f}')

# Interpret via the critical value
if abs(t_stat) <= cv:
    print('Fail to reject the null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')

# Interpret via the p-value
if p > alpha:
    print('Fail to reject the null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')
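
One way to sanity-check a from-scratch result is to compare it with `scipy.stats.ttest_ind`. The sketch below recomputes the statistic inline rather than calling `independent_ttest`, so it runs on its own. It assumes equal sample sizes, in which case the SED used in the code above coincides with SciPy's default pooled-variance test (`equal_var=True`).

```python
from math import sqrt
from numpy import mean
from numpy.random import seed, randn
from scipy.stats import sem, t, ttest_ind

seed(1)
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51

# From-scratch statistic and two-tailed p-value
sed = sqrt(sem(data1)**2 + sem(data2)**2)
t_manual = (mean(data1) - mean(data2)) / sed
df = len(data1) + len(data2) - 2
p_manual = 2.0 * t.sf(abs(t_manual), df)

# SciPy reference (pooled variance, two-sided by default)
result = ttest_ind(data1, data2)
t_scipy, p_scipy = result.statistic, result.pvalue

print(f'manual: t={t_manual:.3f}, p={p_manual:.3f}')
print(f'scipy:  t={t_scipy:.3f}, p={p_scipy:.3f}')
```

If the samples differ in size and their variances differ, the two results will no longer agree, because the pooled test weights each sample's variance by its degrees of freedom.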

Dependent Samples t-Test

Calculation

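The t-statistic for paired samples is:

    t = (mean(X1) - mean(X2)) / SED

Where SED is:

    SED = sd / sqrt(n)

Here n is the number of pairs and sd is the sample standard deviation of the differences between each pair of observations:

    d1 = sum((X1[i] - X2[i])^2)
    d2 = sum(X1[i] - X2[i])
    sd = sqrt((d1 - d2^2 / n) / (n - 1))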
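
The standard deviation of the differences can be checked against NumPy. This sketch uses made-up paired measurements and a hypothetical helper, `sd_of_differences`, to compute it from the sum of differences and the sum of squared differences, then compares the result with `numpy.std(..., ddof=1)`.

```python
from math import sqrt
import numpy as np

def sd_of_differences(data1, data2):
    """Hypothetical helper: sample SD of the paired differences."""
    n = len(data1)
    diffs = [data1[i] - data2[i] for i in range(n)]
    d1 = sum(d**2 for d in diffs)  # sum of squared differences
    d2 = sum(diffs)                # sum of differences
    return sqrt((d1 - d2**2 / n) / (n - 1))

# Made-up paired measurements (e.g. before and after a treatment)
before = [72.0, 68.5, 75.2, 80.1, 66.3]
after = [70.1, 69.0, 73.8, 77.5, 66.9]

sd_shortcut = sd_of_differences(before, after)
sd_numpy = np.std(np.array(before) - np.array(after), ddof=1)
print(sd_shortcut, sd_numpy)
```

The sum-of-squares shortcut is convenient by hand but can lose precision when the differences are large relative to their spread; subtracting the mean difference first is numerically safer.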

Implementation in Python:

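from math import sqrt
from numpy import mean
from numpy.random import seed, randn
from scipy.stats import t

def dependent_ttest(data1, data2, alpha=0.05):
    if len(data1) != len(data2):
        raise ValueError('Paired samples must have the same length.')
    # Sample means
    mean1, mean2 = mean(data1), mean(data2)
    # Number of paired observations
    n = len(data1)
    # Sum of squared differences between paired observations
    d1 = sum([(data1[i] - data2[i])**2 for i in range(n)])
    # Sum of differences between paired observations
    d2 = sum([data1[i] - data2[i] for i in range(n)])
    # Sample standard deviation of the differences
    sd = sqrt((d1 - (d2**2 / n)) / (n - 1))
    # Standard error of the mean difference
    sed = sd / sqrt(n)
    t_stat = (mean1 - mean2) / sed
    # Degrees of freedom
    df = n - 1
    # Two-tailed critical value, consistent with the two-tailed p-value
    cv = t.ppf(1.0 - alpha / 2.0, df)
    # Two-tailed p-value
    p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0
    return t_stat, df, cv, p

# Example (the two samples are generated independently here; real paired
# data would hold two measurements of the same subjects)
seed(1)
alpha = 0.05
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51

t_stat, df, cv, p = dependent_ttest(data1, data2, alpha)
print(f't={t_stat:.3f}, df={df}, cv={cv:.3f}, p={p:.3f}')

# Interpret via the critical value
if abs(t_stat) <= cv:
    print('Fail to reject the null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')

# Interpret via the p-value
if p > alpha:
    print('Fail to reject the null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')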
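
The paired result can be cross-checked against `scipy.stats.ttest_rel` in the same spirit. This sketch works directly on the array of differences instead of calling `dependent_ttest`, so it is self-contained; the variable names are illustrative.

```python
from math import sqrt
import numpy as np
from numpy.random import seed, randn
from scipy.stats import t, ttest_rel

seed(1)
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51

# From-scratch paired statistic, computed on the differences
diff = data1 - data2
n = len(diff)
sed = np.std(diff, ddof=1) / sqrt(n)
t_manual = np.mean(diff) / sed
p_manual = 2.0 * t.sf(abs(t_manual), n - 1)

# SciPy reference (two-sided by default)
result = ttest_rel(data1, data2)
t_scipy, p_scipy = result.statistic, result.pvalue

print(f'manual: t={t_manual:.3f}, p={p_manual:.3f}')
print(f'scipy:  t={t_scipy:.3f}, p={p_scipy:.3f}')
```

Because the mean of the differences equals the difference of the means, this is the same statistic as the one computed by `dependent_ttest`.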

Conclusion

Implementing the Student’s t-test from scratch in Python shows exactly how the t statistic, degrees of freedom, critical value, and p-value are derived for both independent and paired samples. Use these implementations to deepen your understanding and as a reference when applying the test in your data analysis projects.
