Hypothesis Test

Eugine Kang

Sep 15, 2017

Table of Contents

One-Sample T-Test
Two-Sample T-Test
Analysis of Variance (ANOVA)
Chi-Square Test for Goodness of Fit
Chi-Square Test for Independence

1. One-Sample T-Test

Tests whether the mean of a normally distributed population is different from a specified value.

hypothesis

The null hypothesis is that the population mean equals a given value. A one-sample t-test is either one-tailed or two-tailed, depending on your alternative hypothesis.

For a one-tailed test, expect a positive t-statistic when the alternative hypothesis places the mean above the given value, and a negative one when it places the mean below.

two-tail

For a two-tailed test, double the one-tailed p-value (the tail area beyond the absolute value of the t-statistic) to account for both ends of the distribution.

Look up the p-value for your t-statistic in a t-distribution table. When the p-value is less than the predetermined significance level, reject the null hypothesis in favor of the alternative hypothesis.

Python's stats.ttest_1samp performs a two-tailed test by default. For a one-tailed test, halve the p-value and check that the sign of the t-statistic matches the direction of your alternative hypothesis.

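The halving rule above can be sketched as follows. This is a minimal illustration, assuming SciPy and NumPy are installed; the sample data, the seed, and the hypothesized mean `mu0` are made up for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: 30 draws from a normal population (illustrative values only).
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.3, scale=1.0, size=30)
mu0 = 5.0  # hypothesized population mean under the null (assumed for this example)

# Two-tailed test: the alternative is "mean != mu0".
t_stat, p_two_tailed = stats.ttest_1samp(sample, popmean=mu0)

# One-tailed test for the alternative "mean > mu0": halve the two-tailed
# p-value when t has the expected (positive) sign; if the sign is wrong,
# the one-tailed p-value is 1 - p/2 instead.
if t_stat > 0:
    p_greater = p_two_tailed / 2
else:
    p_greater = 1 - p_two_tailed / 2

print(f"t = {t_stat:.3f}, two-tailed p = {p_two_tailed:.4f}, one-tailed p = {p_greater:.4f}")
```

Recent SciPy releases (1.6 and later) also accept an `alternative` argument such as `alternative="greater"`, which should give the same one-tailed p-value without the manual step.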

2. Two-Sample T-Test

Tests whether the means of two populations are significantly different from one another.

Paired

Each value in one group corresponds directly to a value in the other group, such as before and after values in an experiment. Subtract the paired values and perform a one-sample t-test on the differences with the null mean set to 0.

The Wilcoxon signed-rank test can be used instead when the population cannot be assumed normal.
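As a rough sketch of the paired case, the snippet below checks that a paired t-test gives the same result as a one-sample t-test on the differences, then runs the Wilcoxon signed-rank test on the same pairs. The before/after data are simulated and purely hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements for the same 12 subjects.
rng = np.random.default_rng(1)
before = rng.normal(loc=70.0, scale=5.0, size=12)
after = before + rng.normal(loc=-1.5, scale=2.0, size=12)

# Paired t-test computed directly on the two related samples...
t_rel, p_rel = stats.ttest_rel(after, before)

# ...and as a one-sample t-test on the differences with null mean 0.
diff = after - before
t_one, p_one = stats.ttest_1samp(diff, popmean=0.0)

# Non-parametric alternative when normality of the differences is doubtful.
w_stat, p_wilcoxon = stats.wilcoxon(after, before)

print(f"paired t = {t_rel:.3f}, p = {p_rel:.4f}")
print(f"one-sample on differences t = {t_one:.3f}, p = {p_one:.4f}")
print(f"Wilcoxon W = {w_stat:.1f}, p = {p_wilcoxon:.4f}")
```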

Unpaired

The null hypothesis states that the means of the two populations are equal. The alternative hypothesis can be one-tailed (one mean is greater than the other) or two-tailed (the means are unequal). The default is to assume equal variance between the two groups, but the test can also be run assuming unequal variance (Welch's t-test).

The t-test assumes a Gaussian distribution. When that assumption does not hold, use the Mann-Whitney U test, the non-parametric counterpart of the unpaired two-sample t-test: stats.mannwhitneyu()

A one-tailed test can also be performed by paying attention to the order of the samples. For example, to test whether the first mean is greater than the second, pass that sample first and check that the t-statistic is positive.
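The unpaired variants described above might look like this in SciPy. The two groups are simulated, and the names `group_a` and `group_b` are placeholders for this example.

```python
import numpy as np
from scipy import stats

# Hypothetical independent samples from two groups.
rng = np.random.default_rng(2)
group_a = rng.normal(loc=10.0, scale=2.0, size=40)
group_b = rng.normal(loc=9.0, scale=3.0, size=35)

# Student's t-test: equal variances assumed (the default).
t_eq, p_eq = stats.ttest_ind(group_a, group_b)

# Welch's t-test: does not assume equal variances.
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

# One-tailed alternative "mean of A > mean of B": pass A first, then halve
# the two-tailed p-value if t is positive (otherwise use 1 - p/2).
p_one_tailed = p_eq / 2 if t_eq > 0 else 1 - p_eq / 2

# Mann-Whitney U test when the Gaussian assumption is not reasonable.
u_stat, p_u = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"Student: t = {t_eq:.3f}, p = {p_eq:.4f}, one-tailed p = {p_one_tailed:.4f}")
print(f"Welch:   t = {t_welch:.3f}, p = {p_welch:.4f}")
print(f"Mann-Whitney: U = {u_stat:.1f}, p = {p_u:.4f}")
```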

3. Analysis of Variance (ANOVA)

Tests whether the means of several groups are equal, and therefore generalizes the t-test to more than two groups.

The ANOVA test has important assumptions that must be satisfied in order for the associated p-value to be valid.

The samples are independent.
Each sample is from a normally distributed population.
The population standard deviations of the groups are all equal. This property is known as homoscedasticity.
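A minimal sketch of a one-way ANOVA with SciPy follows, together with two common (but not the only) ways to probe the assumptions listed above: the Shapiro-Wilk test for normality and Levene's test for equal variances. The three groups are simulated for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements from three independent groups.
rng = np.random.default_rng(3)
g1 = rng.normal(loc=20.0, scale=2.0, size=25)
g2 = rng.normal(loc=21.0, scale=2.0, size=25)
g3 = rng.normal(loc=23.0, scale=2.0, size=25)

# Normality check per group (Shapiro-Wilk); small p-values suggest non-normality.
shapiro_p = [stats.shapiro(g)[1] for g in (g1, g2, g3)]

# Homoscedasticity check (Levene's test); a small p-value suggests unequal variances.
lev_stat, p_levene = stats.levene(g1, g2, g3)

# One-way ANOVA: the null hypothesis is that all group means are equal.
f_stat, p_anova = stats.f_oneway(g1, g2, g3)

print("Shapiro p-values:", [round(p, 4) for p in shapiro_p])
print(f"Levene p = {p_levene:.4f}")
print(f"ANOVA: F = {f_stat:.3f}, p = {p_anova:.4f}")
```

With only two groups, the ANOVA F-statistic equals the square of the equal-variance t-statistic, which is one way to see that ANOVA generalizes the t-test.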

4. Chi-Square Test for Goodness of Fit

Checks whether an observed pattern of data fits some given distribution.

We want to know if the pattern from our data follows a given distribution.

The main input for a chi-square test is the frequency count (crosstab) for each category. The default setting is to expect an equal distribution among the categories. The degrees of freedom (dof) equal the number of categories minus 1.

A typical rule of thumb is that the observed and expected frequencies in every category should be at least 5.
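To make the goodness-of-fit calculation concrete, here is a small sketch using made-up counts from 120 rolls of a die. It compares `stats.chisquare` (which expects equal frequencies by default) against the statistic computed by hand.

```python
import numpy as np
from scipy import stats

# Hypothetical observed counts for the six faces of a die (120 rolls in total).
observed = np.array([18, 22, 16, 25, 19, 20])

# By default, chisquare expects equal frequencies across categories.
chi2_stat, p_value = stats.chisquare(observed)

# The same statistic by hand: sum of (observed - expected)^2 / expected.
expected = np.full(observed.shape, observed.sum() / observed.size)  # 20 per face
manual_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: number of categories minus 1.
dof = observed.size - 1
manual_p = stats.chi2.sf(manual_stat, df=dof)

print(f"chi2 = {chi2_stat:.3f}, dof = {dof}, p = {p_value:.4f}")
```

To test against a non-uniform distribution, pass the expected counts through the `f_exp` argument of `stats.chisquare`.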

5. Chi-Square Test for Independence

Checks whether two categorical variables are related or independent.

The key to the test is computing the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table. The expected frequencies are computed from the marginal sums under the assumption of independence.

Use pandas.crosstab to get the frequency counts from the categorical features. This table is the data you feed into the test.
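The workflow above can be sketched with `pandas.crosstab` and `scipy.stats.chi2_contingency`. The DataFrame, its column names (`gender`, `preference`), and the counts are invented for this example.

```python
import pandas as pd
from scipy import stats

# Hypothetical survey: 100 respondents, two categorical features.
df = pd.DataFrame({
    "gender": ["F"] * 50 + ["M"] * 50,
    "preference": ["A"] * 30 + ["B"] * 20 + ["A"] * 20 + ["B"] * 30,
})

# Contingency table of observed frequencies (rows: gender, columns: preference).
table = pd.crosstab(df["gender"], df["preference"])

# Chi-square test of independence. The expected frequencies are derived from
# the row and column totals. For 2x2 tables SciPy applies Yates' continuity
# correction by default (correction=True).
chi2_stat, p_value, dof, expected = stats.chi2_contingency(table.to_numpy())

print(table)
print(f"chi2 = {chi2_stat:.3f}, dof = {dof}, p = {p_value:.4f}")
print("expected frequencies:\n", expected)
```

For a table with r rows and c columns, the degrees of freedom are (r - 1) * (c - 1), which is 1 for this 2x2 example.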