Chi-Squared Testing

Solomon Xie
Published in Statistical Guess
3 min read · Jan 11, 2019

Chi is the Greek letter χ. It looks like an x and is pronounced "kai".
Chi-squared is written as χ².

Like the Z-score in a Normal Distribution for a _Z-test_,
or the T-score in a T-Distribution for a _T-test_,
χ² is the “test statistic” in a _Chi-square Test_: it converts sample data to a standardized value in a _Chi-square Distribution_.

A Chi-square Test is a hypothesis test for categorical data.

Refer to wiki: Chi-squared test
Refer to Khan academy: Chi-square statistic for hypothesis testing
Refer to Crash course: Chi-Square Tests: Crash Course Statistics #29

Conditions for a goodness-of-fit test

  • Random condition: The data came from a random sample from the population of interest, or a randomized experiment.
  • Independent condition: If we sample without replacement, our sample size should be less than 10% of the population so we can assume independence between members in the sample.
  • Large counts condition: each expected count needs to be at least 5. (No conditions are attached to the _observed counts_.)

Example

Solve:

  • The large counts condition says that all expected counts need to be at least 5
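  • Patrick needs to sample enough visits so that he expects each day of the week to appear at least 5 times. There are 7 days in the week, and the calculation treats visits as spread evenly across them, so each day's expected count is (number of visits)/7. He therefore needs to sample at least 5*7=35 visits.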
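
The same arithmetic can be sketched in Python. This is only an illustration: it assumes, as the 5*7 calculation implies, that every day of the week has the same hypothesized proportion (1/7), and the function name `min_sample_size` is made up for this example.

```python
import math
from fractions import Fraction

def min_sample_size(proportions, min_expected=5):
    """Smallest n such that n * p >= min_expected for every hypothesized proportion p."""
    # The category with the smallest proportion is the hardest one to fill,
    # so it alone decides the minimum sample size.
    smallest = min(proportions)
    return math.ceil(min_expected / smallest)

# Exact fractions avoid floating-point round-off pushing the ceiling up by one.
days_of_week = [Fraction(1, 7)] * 7
print(min_sample_size(days_of_week))  # 5 / (1/7) = 35
```

If the hypothesized proportions were not equal, the rarest category would drive the answer: with proportions 1/2, 1/4, 1/4 the minimum would be 5 / (1/4) = 20.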

Chi-squared Test statistic formula

How to understand the formula?
It’s not hard to see that this is a way to standardize the data:

  • Observed - Expected gets the _Distance_ between what we saw and what we expected,
  • ()² eliminates the negative results, so distances in either direction add up,
  • ÷ Expected puts each squared distance on the scale of its expected count, so that the value fits the standardized distribution, similar to the concept of a _Unit Circle_ or _Unit Vector_.
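
Put together, the three steps give the usual form of the statistic: the sum over all categories of (Observed - Expected)² / Expected. Here is a minimal Python sketch of that sum. The function name and the counts are invented for illustration and do not come from the example above.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for three categories; every expected count is at least 5.
observed = [8, 12, 10]
expected = [10, 10, 10]
print(chi_square_statistic(observed, expected))  # (4 + 4 + 0) / 10 = 0.8
```

A statistic of 0 means the observed counts match the expected counts exactly; the further apart they are, the larger the value gets.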

Chi-squared Distribution

P-value for Chi-squared Test
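
The P-value is the area under the Chi-square Distribution to the right of the test statistic. As a rough sketch, that upper-tail area can be computed with nothing but the standard library. The function name `chi2_sf` is an assumption for this example, and so is the choice of degrees of freedom: for a goodness-of-fit test the usual value is the number of categories minus 1, which this article does not spell out. In practice a library routine such as `scipy.stats.chi2.sf` would normally do this job.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with df degrees of freedom."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Closed-form starting points of the upper incomplete gamma ratio Q(a, h):
    #   Q(1, h)   = exp(-h)        (df = 2)
    #   Q(1/2, h) = erfc(sqrt(h))  (df = 1)
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Recurrence: Q(a + 1, h) = Q(a, h) + h^a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

# Hypothetical statistic of 0.8 from 3 categories, so df = 3 - 1 = 2.
p_value = chi2_sf(0.8, 2)
print(p_value)  # for df = 2 this equals exp(-0.4)
```

A small P-value (commonly below 0.05) means a statistic this large would be unusual if the null hypothesis were true.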

Type 1: Chi-square Goodness-of-Fit Test

Tests how well a set of hypothesized proportions fits our sample, which has only ONE variable (row).

Type 2: Tests of Independence

Looks at whether being a member of ONE CATEGORY is independent of being a member of THE OTHER. The data has TWO variables (the rows and columns of a two-way table).
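
For a two-way table, the expected counts are not given up front; they are built from the table's own totals. The sketch below assumes the standard rule that, under independence, each expected count is (row total × column total) / grand total, with degrees of freedom (rows - 1) × (columns - 1). The function names and the 2×2 table are hypothetical.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def independence_statistic(table):
    """Return (chi-squared statistic, degrees of freedom) for a two-way table of counts."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical 2x2 table: rows are one variable, columns are the other.
table = [[20, 30],
         [30, 20]]
stat, df = independence_statistic(table)
print(stat, df)  # every expected count is 25, so stat = 4 * (25 / 25) = 4.0 with df = 1
```

The large counts condition applies to these computed expected counts, not to the observed cells.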

Type 3: Tests of Homogeneity

Tests whether it’s likely that DIFFERENT samples come from the SAME population.
