Day 70 of 100DaysofML

Charan Soneji
Published in 100DaysofMLcode
4 min read · Sep 14, 2020

Chi-Square test. The Chi-Square Test of Independence determines whether there is an association between two categorical variables (i.e., whether the variables are independent or related). It is a nonparametric test, and it is also known as the Chi-Square Test of Association.

Let us try and understand the problem with an example.

The whole point of the test is to check whether there is a relation between two variables or not, so let us start with the most basic element of the test: the null and alternative hypotheses.

A null hypothesis is a hypothesis that says there is no statistically significant relationship between the two variables. It is the hypothesis that the researcher is trying to disprove. Let us take the example of Ram, who hypothesizes that the flowers he waters with club soda will grow faster than the flowers he waters with plain water. He waters each plant daily for a month (the experiment) and finds that the results support his hypothesis. Ram’s null hypothesis would be something like this: there is no statistically significant relationship between the type of water I feed the flowers and the growth of the flowers. A researcher is challenged by the null hypothesis and usually wants to disprove it.
An alternative hypothesis is simply the opposite of the null hypothesis. Continuing with the above example, the alternative hypothesis would be that there IS a statistically significant relationship between the type of water the flowers are fed and their growth.

Once the hypotheses are stated, the next step is to choose a significance level (often written as α). The significance level is the threshold for deciding whether the evidence against the null hypothesis is strong enough to reject it: it is the probability, which we are willing to accept, of rejecting the null hypothesis when it is actually true. Commonly used values are 0.05 and 0.01.

I will be adding reference links to worked examples, since this is a very commonly asked question in ML and data science interviews.

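From the table of observed values, we obtain a table of expected values using a fixed formula: the expected count for a cell is (row total × column total) / grand total. These are the counts we would expect if the null hypothesis were true and the two variables were independent; the significance level plays no part in this step. The expected table is then compared with the observed table to measure how far the actual values deviate from it.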
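The expected-value step can be sketched in a few lines of plain Python. The counts below are hypothetical numbers made up for Ram's flower experiment (they do not come from a real data set), and the function name `expected_counts` is just an illustrative choice.

```python
def expected_counts(observed):
    """Return the expected counts under independence for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count for a cell = (row total * column total) / grand total
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical counts (assumption): rows = club soda / plain water,
# columns = fast growth / slow growth
observed = [[30, 10],
            [20, 20]]

expected = expected_counts(observed)
print(expected)  # [[25.0, 15.0], [25.0, 15.0]]
```

Both rows have the same expected counts here only because both hypothetical groups contain 40 plants.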

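Chi² value/formula: χ² = Σ (O − E)² / E, where O is the observed count and E is the expected count of a cell, and the sum runs over all cells of the table.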

For every cell, we take the squared difference between the observed and expected counts and divide it by the expected count; the sum of these terms over all cells gives us the chi² value. Once this value is obtained, we need to calculate the degrees of freedom of the given table. The degrees of freedom can be said to be the number of cells you need to fill in before, given the totals in the margins, you can fill in the rest of the grid using a formula. You can see the idea intended; if you have a given set of totals for each column and row, then you don’t have unlimited freedom when filling in the cells. You can only fill in a certain number of cells with “random” numbers before the rest just becomes dependent on making sure the cells add up to the totals. Thus, the number of cells that can be filled in independently tells us something about the actual amount of variation permitted by the data set.
For example, the degrees of freedom for a Chi-square grid are equal to the number of rows minus one, times the number of columns minus one: that is, (R-1)*(C-1). In a simple 2x2 grid, the degrees of freedom are therefore (2-1)*(2-1), or 1! Note that once you have put a number into one cell of a 2x2 grid, the totals determine the rest for you.
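As a rough illustration, the chi² value and the degrees of freedom can be computed like this in plain Python. The observed counts are hypothetical numbers for the flower example, the expected counts are the ones that follow from those totals, and the function names are illustrative assumptions.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(table):
    """(number of rows - 1) * (number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical counts (assumption): rows = club soda / plain water,
# columns = fast growth / slow growth
observed = [[30, 10],
            [20, 20]]
# Expected counts under independence: (row total * column total) / grand total
expected = [[25.0, 15.0],
            [25.0, 15.0]]

chi2_value = chi_square_statistic(observed, expected)
dof = degrees_of_freedom(observed)
print(round(chi2_value, 3), dof)  # 5.333 1
```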

The tabular (critical) chi² value is looked up in a chi-square distribution table, using the chosen significance level and the degrees of freedom. If you have not read such a table before, a quick search on Google or YouTube will show you how it works. Once this value has been found, the tabular and calculated values are compared to see which one is greater.

Whenever the calculated chi² value is greater than the tabular chi² value, we reject the null hypothesis, which means the data support the alternative hypothesis that the two variables are related. Otherwise, we fail to reject the null hypothesis. I have attached the link to another tutorial below and I would recommend checking it out.
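The comparison can be sketched as follows for the 2x2 case (1 degree of freedom). The tabular values 3.841 and 6.635 are the standard chi-square table entries for 1 degree of freedom at significance levels 0.05 and 0.01; the chi² value of 16/3 comes from a hypothetical 2x2 table with observed counts [[30, 10], [20, 20]], and the function names are illustrative assumptions.

```python
import math

# Tabular (critical) chi-square values for 1 degree of freedom,
# keyed by significance level, as found in a standard chi-square table
CRITICAL_VALUES_DF1 = {0.05: 3.841, 0.01: 6.635}


def reject_null(chi2_value, alpha=0.05):
    """Reject the null hypothesis when the calculated value exceeds the tabular one."""
    return chi2_value > CRITICAL_VALUES_DF1[alpha]


def p_value_df1(chi2_value):
    """P-value for 1 degree of freedom only: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi2_value / 2))


# Calculated chi-square value for the hypothetical table [[30, 10], [20, 20]]
chi2_value = 16 / 3

print(reject_null(chi2_value, alpha=0.05))  # True:  5.333 > 3.841
print(reject_null(chi2_value, alpha=0.01))  # False: 5.333 < 6.635
print(p_value_df1(chi2_value))
```

The same data lead to different decisions at 0.05 and 0.01, which is why the significance level should be fixed before looking at the result. For larger tables, a library routine such as `scipy.stats.chi2_contingency` is probably the more practical route; note that it applies Yates' continuity correction to 2x2 tables by default, so its statistic can differ from the hand calculation.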

Thanks for reading. Keep Learning.

Cheers.
