Hypothesis Testing

Boda Ye
3 min read · Mar 3, 2019


Hypothesis testing is a common tool in the data scientist’s toolkit. Today, I’m going to go through the details of this famous statistical procedure.

First, why do we need to learn it?

Suppose your marketing colleague ran an A/B test on an advertisement, and you would like to decide whether there is any difference between the two versions, i.e. whether there is a significant difference between the mean customer CTR of the two versions. How do you prove it? Another situation is detecting whether the relationship between features and the response is significant. The answer to both questions is hypothesis testing.

Second, let’s talk about the basic terminology of hypothesis testing.

Hypothesis: a specific claim about the population, usually expressed in terms of its parameters.

Null hypothesis (H0): the claim describing the default situation, usually “no relation” or “no difference”.

Alternative hypothesis (H1): the claim about the population that holds when the null hypothesis is false.

P-value: the probability, under H0, of observing a result as extreme as or more extreme than the current observation.

Significance level (alpha): the probability of rejecting H0 when H0 is true, i.e. of falsely rejecting H0.

Type I error: falsely rejecting H0 (rejecting H0 when it is true).

Type II error: failing to reject H0 when H1 is true.

Power: the probability of rejecting H0 when H1 is true, i.e. one minus the Type II error rate.

According to the definitions above, we can draw some simple inferences. If the significance level increases, the Type I error rate increases and the Type II error rate decreases. In practice, a Type I error usually carries a larger risk than a Type II error, so we tend to fix a small significance level.

There are several important questions to check before choosing a test. Is the data binary or continuous? (Here “continuous” means numeric and non-categorical; even discrete data can often be treated as approximately continuous.) Is there one group or two? Can we assume normality or not? Is the test two-sided or one-sided?

The general algorithm of hypothesis testing is: state H0 and H1, choose a significance level alpha, compute the test statistic and its p-value under H0, and reject H0 if the p-value is smaller than alpha.

Fisher’s Exact Test

Suppose we have run an A/B test and summarized the result as a 2×2 contingency table (clicks vs. non-clicks for each version). Under H0, tables with the same row and column totals follow the hypergeometric distribution, and the p-value sums the probabilities of the observed table and of every table with the same margins that is at least as extreme.
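The computation above can be sketched in pure Python with the hypergeometric distribution; the click counts here (5/100 vs. 12/100) are hypothetical numbers chosen only for illustration:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Enumerates every table with the same margins and sums the
    hypergeometric probabilities of all tables that are as or less
    probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def pmf(x):
        # P(first cell = x) under fixed margins (hypergeometric)
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = pmf(a)
    lo = max(0, col1 - row2)   # smallest feasible first cell
    hi = min(col1, row1)       # largest feasible first cell
    # tiny tolerance guards against floating-point ties
    return sum(pmf(x) for x in range(lo, hi + 1)
               if pmf(x) <= p_obs * (1 + 1e-9))

# hypothetical A/B result: version A got 5/100 clicks, version B 12/100
p = fisher_exact_two_sided(5, 95, 12, 88)
print(round(p, 4))
```

For small tables this exact enumeration is cheap; for large counts you would normally reach for a library implementation such as scipy.stats.fisher_exact instead.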

Permutation Test

The H0 of the test is that the two distributions are exchangeable; specifically, for binomial data, H0 is p1 = p2.

The steps are straightforward:

Suppose we have 2 sets A(size 10), B(size 20), and we want to prove their distributions are different.

First, calculate the difference of the means of A and B; this is the observed difference.

Second, combine A and B, and randomly split the combined set into A’ (size 10) and B’ (size 20). Calculate their difference of means.

Third, repeat the second step a thousand times. (The more repetitions, the more stable the result.)

Finally, the p-value is the proportion of shuffled differences that are at least as extreme as the observed difference.
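The steps above can be sketched in pure Python; the two samples here are hypothetical Gaussian draws used only to exercise the function:

```python
import random

def permutation_test(a, b, n_iter=1000, seed=0):
    """Two-sided permutation test for a difference in means.

    Pools both samples, repeatedly reshuffles the pool into groups of
    the original sizes, and counts how often the shuffled difference
    of means is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        a2, b2 = pooled[:len(a)], pooled[len(a):]
        if abs(sum(a2) / len(a2) - sum(b2) / len(b2)) >= observed:
            count += 1
    return count / n_iter

# hypothetical samples: A has 10 points, B has 20, shifted by 1
rng = random.Random(1)
A = [rng.gauss(0.0, 1.0) for _ in range(10)]
B = [rng.gauss(1.0, 1.0) for _ in range(20)]
print(permutation_test(A, B))
```

Note that the shuffle draws without replacement, which is exactly the exchangeability assumption of H0: under the null, any reassignment of the pooled observations to the two groups is equally likely.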

Bootstrap Test

Suppose we have x1 = {1, 2, 3} and x2 = {0, 4, 8}.

First, calculate x1.mean, x2.mean, and x.mean (the mean of x1 and x2 combined).

Second, recenter x1 and x2 to the combined mean, so that H0 holds in the resampling population (x1 = {2, 3, 4}, x2 = {-1, 3, 7}).

Third, bootstrap x1 and x2 (resample each with replacement) and calculate the z score of the difference in means.

Fourth, repeat step three a thousand times.

Finally, the p-value is the proportion of bootstrap z scores that are at least as extreme as the original z score.
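A minimal sketch of these steps, using a Welch-style z score (difference of means over its standard error) as the statistic; degenerate resamples with zero spread are simply skipped, a simplification chosen here for brevity:

```python
import random
from statistics import mean, variance

def t_stat(a, b):
    """Difference of means over its standard error (Welch-style z score)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    if se == 0.0:
        return None  # degenerate resample with no spread
    return (mean(a) - mean(b)) / se

def bootstrap_test(x1, x2, n_iter=1000, seed=0):
    rng = random.Random(seed)
    observed = t_stat(x1, x2)
    grand = mean(list(x1) + list(x2))
    # recenter both samples to the combined mean, enforcing H0
    x1c = [v - mean(x1) + grand for v in x1]
    x2c = [v - mean(x2) + grand for v in x2]
    extreme = valid = 0
    for _ in range(n_iter):
        b1 = rng.choices(x1c, k=len(x1c))  # resample with replacement
        b2 = rng.choices(x2c, k=len(x2c))
        stat = t_stat(b1, b2)
        if stat is None:
            continue  # z score undefined for this resample
        valid += 1
        if abs(stat) >= abs(observed):
            extreme += 1
    return extreme / valid

# the tiny example from the text: x1 = {1, 2, 3}, x2 = {0, 4, 8}
print(bootstrap_test([1, 2, 3], [0, 4, 8]))
```

With samples this small the test has little power, so the printed p-value is large; recentering is what makes this a valid test, since resampling from the shifted samples simulates a world where the two population means are equal.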
