Chi-Square

A chi-square (χ²) statistic is used to investigate whether the distributions of categorical variables (for example yes/no or male/female) differ from one another. It compares the tallies, or counts, of categorical responses between two or more independent groups. It can only be computed from actual counts (frequencies), not from percentages, proportions, means, etc.
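To see why percentages do not work, here is a minimal Python sketch using a hypothetical 2×2 table of counts from 50 respondents. The helper `chi_square` is written only for illustration and is not a library function. Converting the same table to percentages of the total leaves the pattern of responses unchanged, yet it doubles the statistic, because the calculation treats the percentages as if 100 people had been counted.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table of counts (illustrative helper)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows are groups, columns are yes/no responses.
counts = [[10, 15],   # group A: 10 yes, 15 no
          [20, 5]]    # group B: 20 yes, 5 no
n = sum(map(sum, counts))                                      # 50 respondents
percentages = [[100 * x / n for x in row] for row in counts]   # same table as % of total

print(round(chi_square(counts), 3))       # 8.333
print(round(chi_square(percentages), 3))  # 16.667
```

Only the first value reflects the data actually collected; the second depends on the arbitrary choice of 100 as the base.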

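If we could repeat an experiment an infinite number of times, the chi-square values computed from each repetition would form the sampling distribution of the chi-square statistic. A graph of this distribution shows how chi-square values vary across samples of the experiment’s size. When there is no real difference between the groups, it is approximately the theoretical chi-square distribution with the appropriate degrees of freedom.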
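The repeated-experiment idea can be approximated by simulation. This is a sketch under stated assumptions: a hypothetical experiment with two groups of 50 people and a yes/no response, where both groups share the same "yes" probability, so there is no real difference. For a table with r rows and c columns the reference chi-square distribution has (r − 1)(c − 1) degrees of freedom, here 1; a chi-square distribution's mean equals its degrees of freedom, and with 1 degree of freedom about 5% of values exceed 3.841. The function names are illustrative only.

```python
import random

def chi_square(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(table))
        for j in range(len(col_totals))
    )

def simulate(reps=10_000, group_size=50, p_yes=0.5, seed=1):
    """Repeat a hypothetical 2x2 experiment many times with no real group difference."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        table = []
        for _group in range(2):
            yes = sum(rng.random() < p_yes for _ in range(group_size))
            table.append([yes, group_size - yes])
        stats.append(chi_square(table))
    return stats

stats = simulate()
mean = sum(stats) / len(stats)
share = sum(s > 3.841 for s in stats) / len(stats)
print(f"mean chi-square: {mean:.3f} (df = 1)")
print(f"share above 3.841: {share:.3f} (about 0.05 expected)")
```

A histogram of `stats` would be the graph described above; the simulated mean should land close to 1 and the share above 3.841 close to 0.05.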

Two common tests use the chi-square distribution: goodness-of-fit tests, which measure how far observed frequencies deviate from theoretically expected frequencies, and tests of independence, which assess whether two categorical variables are related.
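As an illustration of the first kind of test, here is a minimal goodness-of-fit sketch in Python. The die-roll counts are hypothetical, and `goodness_of_fit` is an illustrative helper; SciPy's `scipy.stats.chisquare` provides a library version that also returns a p-value. Degrees of freedom are taken as the number of categories minus one, assuming no parameters were estimated from the data.

```python
def goodness_of_fit(observed, expected):
    """Chi-square goodness-of-fit statistic and its degrees of freedom."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # assumes no parameters were estimated from the data
    return stat, df

rolls = [8, 12, 9, 11, 6, 14]     # hypothetical counts of faces 1..6 in 60 rolls
fair = [sum(rolls) / 6] * 6       # a fair die predicts 10 of each face
stat, df = goodness_of_fit(rolls, fair)
print(round(stat, 3), df)  # 4.2 5
```

With 5 degrees of freedom the 5% critical value is about 11.07, so these hypothetical rolls would not be evidence against a fair die.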


Further:

The chi-square statistic is the most common measure of association in a contingency table. It measures how far the observed counts deviate from the counts expected in a table with the same marginal totals and no association; the expected count for a cell is its row total times its column total, divided by the grand total. The chi-square statistic takes the difference between the observed and expected count in each cell and performs the following operations:

  1. squares each difference
  2. divides each squared difference by the cell’s expected count
  3. sums these ratios over all cells
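Written as formulas, the expected count for row i and column j is E = (row i total × column j total) / n, and the statistic is χ² = Σ (O − E)² / E, summed over every cell, where O is the observed count. The Python sketch below spells these steps out by hand for a hypothetical 2×3 table; the helper names are illustrative, and in practice a library routine such as SciPy's `scipy.stats.chi2_contingency` performs the same calculation. For a test of independence the degrees of freedom are (rows − 1) × (columns − 1).

```python
def expected_counts(table):
    """Counts expected under no association, keeping the same marginal totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Pearson chi-square: square, divide by expected, sum over all cells."""
    expected = expected_counts(table)
    stat = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            diff = o - e             # observed minus expected
            stat += diff ** 2 / e    # 1. square it, 2. divide by expected, 3. add to the sum
    return stat

# Hypothetical counts: 2 groups (rows) x 3 response categories (columns).
table = [[30, 10, 10],
         [20, 20, 10]]
df = (len(table) - 1) * (len(table[0]) - 1)

print(expected_counts(table))       # [[25.0, 15.0, 10.0], [25.0, 15.0, 10.0]]
print(round(chi_square(table), 3))  # 5.333
print(df)                           # 2
```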

Interpreting the chi-square value:

The chi-square value is tricky to interpret because it depends on the number of cases summarized in the table as well as on the number of rows and columns. Cramér’s V statistic adjusts for both and can be used to evaluate the degree of association, on a scale from 0 (no association) to 1 (perfect association).
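Cramér's V is computed as V = √(χ² / (n × (k − 1))), where n is the total number of cases and k is the smaller of the number of rows and columns. The minimal Python sketch below, using a hypothetical 2×3 table and illustrative helper names, shows why this helps: doubling every count doubles χ² but leaves V unchanged. SciPy also offers `scipy.stats.contingency.association` for this purpose.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    return stat

def cramers_v(table):
    """Cramér's V: chi-square rescaled by sample size and table dimensions."""
    n = sum(map(sum, table))
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (k - 1)))

table = [[30, 10, 10],
         [20, 20, 10]]                              # hypothetical counts, n = 100
doubled = [[2 * x for x in row] for row in table]   # same pattern, n = 200

print(round(chi_square(table), 3), round(chi_square(doubled), 3))  # 5.333 10.667
print(round(cramers_v(table), 3), round(cramers_v(doubled), 3))    # 0.231 0.231
```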