ML: Student’s, Two-Sample & Paired Sample T-tests. Don’t Use It Blindly.

Jeheonpark
Published in The Startup
5 min read · Sep 11, 2020

In a Z-test, we assume that we know the standard deviation of the population. What if we don’t? In that case, we use the standard deviation of the sample as an estimate and carry on with the Z-test. What if we don’t know the mean of the population? Similarly, we hypothesize a value for the mean and go with the Z-test. So when do we use a t-test? We use a t-test when the sample size is small. How small is small? The Z-test relies on the CLT (Central Limit Theorem), which works well when the sample size is large enough, because then the sampling distribution of the mean is approximately Gaussian. If the sample size is too small, this assumption starts to break down: once the standard deviation is itself estimated from only a few points, the test statistic no longer follows a Gaussian distribution. It follows a heavy-tailed distribution, the t-distribution.
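To make the "heavy tail" claim concrete, here is a minimal simulation sketch using only the Python standard library. It assumes a standard normal population and a Z-test cutoff of 1.96 (the usual two-sided 5% threshold); the function names `t_statistic` and `exceed_rate`, the sample sizes, and the seed are illustrative choices, not something from the original post. If the statistic were truly Gaussian, about 5% of samples would exceed the cutoff. With small samples the rate should come out noticeably higher.

```python
import math
import random


def t_statistic(sample, mu0):
    """Standardized sample mean, using the sample standard deviation (n - 1)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu0) / math.sqrt(var / n)


def exceed_rate(n, trials=10_000, cutoff=1.96, seed=0):
    """Fraction of simulated samples whose |statistic| exceeds the Z cutoff.

    The population is N(0, 1), so the null hypothesis (mean = 0) is true
    and every exceedance is a false rejection.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, 0.0)) > cutoff:
            hits += 1
    return hits / trials


small = exceed_rate(n=5)    # small sample: heavy tails
large = exceed_rate(n=50)   # larger sample: close to Gaussian
print(f"n=5:  {small:.3f} of samples exceed 1.96")
print(f"n=50: {large:.3f} of samples exceed 1.96")
```

The gap between the two rates is the practical reason not to apply the Z cutoff blindly to a small sample: the false rejection rate ends up well above the nominal 5%.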

Note: The assumption is that the sample and the population share the same standard deviation; what we are testing is whether their means differ. ANOVA compares groups through their variances instead, although classical one-way ANOVA also assumes equal variances across groups. I will cover ANOVA in a later post.

t-test

Question: I have the population mean and n samples, where n is small. Can I reject the null hypothesis?
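A minimal sketch of how this question could be answered in code, assuming a two-sided test at the 5% significance level. The statistic is t = (x̄ − μ0) / (s / √n) with n − 1 degrees of freedom. The sample values, the hypothesized mean `mu0`, and the helper names `one_sample_t` and `reject_null` are made up for illustration; the critical values are the standard two-sided 5% table entries for 1 to 10 degrees of freedom, hardcoded so the example needs no third-party library.

```python
import math

# Two-sided 5% critical values of the t-distribution, keyed by degrees
# of freedom (standard table values, rounded to three decimals).
T_CRIT_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}


def one_sample_t(sample, mu0):
    """Return (t, df) for H0: the population mean equals mu0."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample standard deviation with the n - 1 (Bessel) correction.
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


def reject_null(sample, mu0):
    """True if |t| exceeds the two-sided 5% critical value for its df."""
    t, df = one_sample_t(sample, mu0)
    return abs(t) > T_CRIT_05[df]


# Hypothetical data: 8 measurements, hypothesized population mean 5.0.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.7]
mu0 = 5.0

t, df = one_sample_t(sample, mu0)
print(f"t = {t:.3f}, df = {df}, critical value = {T_CRIT_05[df]}")
print("reject H0" if reject_null(sample, mu0) else "fail to reject H0")
```

In practice a statistics library would give an exact p-value instead of a table lookup; the table is used here only to keep the sketch self-contained.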

t-distribution

The inventor of the t-distribution worked at the Guinness brewery in Dublin, Ireland; he was in charge of the…


Jeheon Park, Software Engineer at Kakao in South Korea