Exploring Inferential Statistics with Python: Chi-Squared and ANOVA Tests.

Get an intuition of Chi-Squared and ANOVA Tests.

Roman
6 min read · Feb 4, 2022

Chi-Squared Test Of Independence

The chi-squared test of independence allows you to test whether there is a relationship between two categorical variables. But, it does not tell you the direction or the size of the relationship. The test is based on the comparison between the observed frequencies in a contingency table and the frequencies that would be expected if the variables were independent.
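As an illustration with made-up survey responses (the names and values below are hypothetical, not from the article's data), such a contingency table can be built with `pd.crosstab`:

```python
import pandas as pd

# Hypothetical raw survey data: one row per respondent
df = pd.DataFrame({
    'gender': ['M', 'F', 'M', 'F', 'M', 'F'],
    'color':  ['red', 'blue', 'red', 'red', 'blue', 'blue'],
})

# pd.crosstab counts how often each (gender, color) pair occurs,
# producing the observed-frequency contingency table
table = pd.crosstab(df['gender'], df['color'])
print(table)
```

The resulting table of counts is exactly the kind of input the chi-squared test of independence expects.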

The assumptions of the chi-squared test of independence include:

  1. The data in the cells should be frequencies, or counts of cases rather than percentages or some other transformation of the data.
  2. The levels (or categories) of the variables are mutually exclusive.
  3. Your two variables should be measured at an ordinal or nominal level (i.e., categorical data).

A chi-square test of independence is used to determine whether or not there is a significant association between two categorical variables.

H0: The two variables are independent.
H1: The two variables are not independent.

Consider the following contingency table of counts, with one row per gender and one column per color.

We want to determine if there is any association between Gender and Color preference. The level of significance is 0.05.

data = [[100, 150, 20],
        [ 20,  30, 180]]
H0 : Gender and favorite color are not related.

Import the necessary libraries and perform the test.

import scipy.stats as stats
from scipy.stats import chi2

#perform the Chi-Square Test of Independence
stats.chi2_contingency(data, correction=True)

The output of the code would be

(259.79602791196993,
 3.8548663789964316e-57,
 2,
 array([[ 64.8,  97.2, 108. ],
        [ 55.2,  82.8,  92. ]]))

The way to interpret the output is as follows:

  • chi-square test statistic: 259.79
  • p-value: 3.8548663789964316e-57
  • degrees of freedom: 2 (calculated as (#rows − 1) × (#columns − 1))
  • array: the expected frequencies for each cell in the contingency table.

Since the p-value (3.854e-57) of the test is less than 0.05, we reject the null hypothesis. This means that we have sufficient evidence to say that there is a relationship between gender and favorite color.
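To see where those expected frequencies come from, here is a sketch that reproduces them from the row and column totals (expected cell = row total × column total / grand total) and recomputes the statistic from its definition:

```python
import numpy as np

data = np.array([[100, 150, 20],
                 [20, 30, 180]])

# Expected cell frequency under independence:
# (row total * column total) / grand total
row_totals = data.sum(axis=1, keepdims=True)   # [[270], [230]]
col_totals = data.sum(axis=0, keepdims=True)   # [[120, 180, 200]]
expected = row_totals * col_totals / data.sum()

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected
chi2_stat = ((data - expected) ** 2 / expected).sum()
print(expected)   # matches the array in the scipy output
print(chi2_stat)  # ~259.796
```

(Yates' continuity correction only applies to 2×2 tables, so `correction=True` leaves this 2×3 table's statistic unchanged.)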

Chi-Squared Goodness Of Fit Test

The chi-square goodness-of-fit test is used to determine whether or not a categorical variable follows a hypothesized distribution.

H0: A variable follows a hypothesized distribution.
H1: A variable does not follow a hypothesized distribution.

The following table gives the number of aircraft accidents that occurred on the various days of the week. Find out whether the accidents are uniformly distributed over the week. The level of significance is 0.05.

week      = ['Sun','Mon','Tue','Wed','Thurs','Fri','Sat']
accidents = [14, 16, 8, 12, 11, 9, 14]
H0: The accidents are uniformly distributed over the week.
H1: The accidents are not uniformly distributed over the week.

Calculate the expected value.

observed = [14, 16, 8, 12, 11, 9, 14]

# Under H0, the 84 accidents are spread evenly across the 7 days
expected_value = sum(observed) / len(observed)
expected_data = [int(expected_value)] * len(observed)
print(expected_data)

# [12, 12, 12, 12, 12, 12, 12]

We have the following

observed = [14, 16, 8, 12, 11, 9, 14]
expected = [12, 12, 12, 12, 12, 12, 12]

Now, perform the Chi-Squared goodness-of-fit test.

statistic, pvalue = stats.chisquare(f_obs=observed, f_exp=expected)
statistic, pvalue

# (4.166666666666667, 0.6541333169963821)

The Chi-Square test statistic is found to be 4.166 and the corresponding p-value is 0.654.
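The same statistic can be reproduced by hand from its definition, the sum over categories of (observed − expected)² / expected:

```python
observed = [14, 16, 8, 12, 11, 9, 14]
expected = [12] * 7  # uniform expectation: 84 accidents / 7 days

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2_stat)  # 4.1666... = 50/12, matching scipy's result
```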

Since the p-value (0.654) is not less than 0.05, we fail to reject the null hypothesis. That is, the aircraft accidents are uniformly distributed over the week.

We can go a bit further and calculate the critical value.

alpha = 0.05
critical_value = chi2.ppf(q=1-alpha, df=len(observed)-1)
print('critical_value:', critical_value)

# critical_value: 12.591587243743977

We can set up a condition to decide whether to reject the null hypothesis.

if statistic < critical_value:
    print('Fail to reject the null hypothesis')
else:
    print('Reject the null hypothesis')

ANOVA Test

An ANOVA is a statistical test that is used to determine whether or not there is a statistically significant difference between the means of three or more independent groups.

A one-way ANOVA (“analysis of variance”) compares the means of three or more independent groups to determine if there is a statistically significant difference between the corresponding population means.

One-Way ANOVA: Assumptions

  1. Normality — Each sample was drawn from a normally distributed population.
  2. Equal Variances — The variances of the populations that the samples come from are equal.
  3. Independence — The observations in each group are independent of each other and the observations within groups were obtained by a random sample.

H0 : μ1 = μ2 = μ3 = … = μk (all the population means are equal)
H1 : at least one population mean is different from the rest

Let’s say you want to find out if the beverage that people drink affects their reaction time. So, you set up an experiment with three groups of people. The 1st group gets water to drink, the second gets juice and the third gets coffee. The level of significance is 0.05.

group_water  = [29, 30, 31, 31, 29]
group_juice = [28, 29, 27, 30, 29]
group_coffee = [25, 28, 29, 27, 29]

(If you only had 2 groups, you could have used a t-test.)

Given that you have 3 groups, you ought to use analysis of variance.

The variation of scores is made up of two parts.

  • The variation within each group
  • The variation between the groups.

F = (between groups variance)/(within groups variance)
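That ratio can be sketched directly from the three groups' data using the usual sums of squares (group values taken from the example above):

```python
import numpy as np

groups = [
    [29, 30, 31, 31, 29],  # water
    [28, 29, 27, 30, 29],  # juice
    [25, 28, 29, 27, 29],  # coffee
]

k = len(groups)                    # number of groups
n = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.mean(np.concatenate(groups))

# Between-groups variance: spread of the group means around the grand mean
ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (k - 1)

# Within-groups variance: spread of observations around their own group mean
ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
ms_within = ss_within / (n - k)

F = ms_between / ms_within
print(F)  # ~4.2745, the same value scipy's f_oneway reports
```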

Import the necessary libraries and calculate the variation within each group.

import numpy as np
from scipy.stats import f_oneway

# np.std defaults to the population standard deviation (ddof=0)
print(np.std(group_water))  # 0.8944271909999159
print(np.std(group_juice))  # 1.019803902718557
print(np.std(group_coffee)) # 1.4966629547095764

Now, to perform the ANOVA Test

f_oneway(group_water, group_juice, group_coffee)
# F_onewayResult(statistic=4.2745098039215685,
#                pvalue=0.03965891577699055)

Since this p-value is less than 0.05, we reject the null hypothesis.
That is, the drink does make a difference in reaction time.

Two-Way ANOVA

You should use a two-way ANOVA when you’d like to know how two factors affect a response variable and whether or not there is an interaction effect between the two factors on the response variable.

For the results of a two-way ANOVA to be valid, the following assumptions should be met:

  1. Normality — The response variable is approximately normally distributed for each group.
  2. Equal Variances — The variances for each group should be roughly equal.
  3. Independence — The observations in each group are independent of each other and the observations within groups were obtained by a random sample.

A researcher wants to know whether test scores are influenced by Gender, Age, or both. She collected data for this experiment. The level of significance is 0.05.

I replicated the data using Python.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.DataFrame({'Gender': np.tile(np.repeat(['Boys', 'Girls'], 3), 3),
                     'Age': np.repeat(['10', '11', '12'], 6),
                     'Score': [4, 6, 8, 4, 8, 9,
                               6, 6, 9, 7, 10, 13,
                               8, 9, 13, 12, 14, 16]})

model = ols('Score ~ C(Gender) + C(Age) + C(Gender):C(Age)',
            data=data).fit()
sm.stats.anova_lm(model, typ=2)

Since the p-value for gender (0.035) is less than 0.05, gender has a significant impact on the test scores.

Since the p-value for age (0.006) is less than 0.05, age has a significant impact on the test scores.

Since the p-value for the gender-age interaction (0.556) is greater than 0.05, the interaction between gender and age has no significant impact on the test scores.
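Those p-values come from an ANOVA table whose sums of squares can be sketched by hand from the same 18 scores, reshaped as 3 ages × 2 genders × 3 replicates:

```python
import numpy as np

scores = np.array([4, 6, 8, 4, 8, 9,
                   6, 6, 9, 7, 10, 13,
                   8, 9, 13, 12, 14, 16], dtype=float)

# Shape (age, gender, replicate): 3 ages x 2 genders x 3 scores per cell
cells = scores.reshape(3, 2, 3)
grand_mean = scores.mean()                 # 9.0

gender_means = cells.mean(axis=(0, 2))     # one mean per gender
age_means = cells.mean(axis=(1, 2))        # one mean per age group
cell_means = cells.mean(axis=2)            # one mean per (age, gender) cell

ss_gender = 9 * ((gender_means - grand_mean) ** 2).sum()    # 32.0, df = 1
ss_age = 6 * ((age_means - grand_mean) ** 2).sum()          # 93.0, df = 2
ss_cells = 3 * ((cell_means - grand_mean) ** 2).sum()
ss_interaction = ss_cells - ss_gender - ss_age              # 7.0, df = 2
ss_residual = ((cells - cell_means[..., None]) ** 2).sum()  # 68.0, df = 12

# Each F is the effect's mean square over the residual mean square
ms_residual = ss_residual / 12
print(ss_gender / 1 / ms_residual)       # F for gender, ~5.65
print(ss_age / 2 / ms_residual)          # F for age, ~8.21
print(ss_interaction / 2 / ms_residual)  # F for the interaction, ~0.62
```

Comparing each F against an F distribution with the matching degrees of freedom (and 12 residual degrees of freedom) yields the p-values reported above.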

References

One-Way ANOVA : https://www.youtube.com/watch?v=ITf4vHhyGpc&ab_channel=JDavidEisenberg

Chi-Squared Goodness Of Fit Test : https://tranquileducation.weebly.com/uploads/1/3/7/6/13765138/question_i__nov-dec_2010.pdf

Chi-Squared Test Of Independence : https://www.youtube.com/watch?v=LE3AIyY_cn8&ab_channel=statslectures

Two-Way-ANOVA(Part 1) : https://www.youtube.com/watch?v=lZFmFuZGQTk&ab_channel=statisticsfun

Two-Way-ANOVA(Part 2) : https://www.youtube.com/watch?v=cNIIn9bConY&ab_channel=statisticsfun

Two-Way-ANOVA(Part 3) : https://www.youtube.com/watch?v=ajLdnsLPErE&ab_channel=statisticsfun
