Goodness of Fit Test

CHI SQUARED TESTS

CSTSeries #2: Goodness of Fit Test

Yeran Kods
6 min read · Jul 19, 2023

Introduction

Here we check whether a given discrete or categorical variable follows a specified type of distribution.

The test can be applied to both discrete and categorical data.

“Used to find whether a set of discrete or categorical data follows a specified distribution.”

Ex: The number of accidents at a junction. It’s a discrete variable.

Say the question claims that the number of accidents at the junction follows a Poisson distribution.

We want to check whether that claim holds.

To check that, we can apply the goodness of fit test.

In short, we check whether a given set of discrete or categorical data follows a given type of distribution.

This is also a hypothesis test.

So we go through all the steps of hypothesis testing again. 😅

The 1ˢᵗ step in hypothesis testing is to define our hypotheses.

That means the null and alternative hypotheses.

The hypotheses for the test are:

  • H₀: The data are consistent with the specified distribution.

For the null hypothesis (H₀), we always assume that the data are consistent with the specified distribution.

Ex: In the accidents example, under H₀ you assume that the number of accidents at the junction follows a Poisson distribution.

  • H₁: At least one category deviates from the specified distribution.

For H₁ we assume that the given data set does not follow the given distribution.

That means there is at least one category that deviates from the distribution (the given variable does not follow the specified distribution).

The 2ⁿᵈ step in hypothesis testing is the test statistic.

That means we are going to define an equation to test our null hypothesis.

We always define this under H₀ (assuming that H₀ is true):

χ² = Σᵢ (Oᵢ − Eᵢ)² / Eᵢ

where Oᵢ is the observed count and Eᵢ is the expected count for level i.

You can see that we have only one sigma (Σ) notation here,

which means we are talking about only one variable at a time (one categorical variable or one discrete variable).

So in these cases you will not have a two-way frequency table, because we have only one variable.

So for every level of the given variable, we take the difference between the observed count and the expected count.

Then we square it, divide by the expected count, and add up the results over all levels.
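The calculation just described can be sketched in a few lines of Python. This is only an illustration: the function name `chi_squared_statistic` and the three-level counts are made up for the example and are not from the article.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all levels of one variable."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up counts for a variable with three levels (illustration only).
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]
stat = chi_squared_statistic(observed, expected)
print(stat)
```

Because there is only one variable, a single pass over its levels is enough. No row and column totals are involved.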

  • d.f. = No. of classes − No. of parameters estimated − 1

Since we don't have two variables to use as rows and columns, as in the test for association, this is the equation we use to find the d.f.

No. of classes is the number of levels in your categorical or discrete variable.

Here we are testing whether a given set of data follows a specified type of distribution (Poisson, normal, exponential, etc.).

We usually define those distributions by their parameters.

Ex: Poisson → only 1 parameter (λ, the rate of occurrence)

Normal → 2 parameters (mean, variance)

So when a problem asks you to test a particular type of distribution, it usually has to state the distribution along with its parameters.

If those parameters are given, we do not estimate any parameters.

In that case, no. of parameters estimated = 0

But if those parameters are not given, we have to estimate them from the data.

If you estimate parameters, you have to state how many parameters you estimated.

Ex: For the Poisson distribution, we have only 1 parameter (λ).

If it is given we don't have to estimate. In that case no. of parameters estimated = 0

But if the λ value is not given, for the Poisson distribution, then you have to estimate it.

If you are trying to estimate it, then that means you have estimated 1 parameter.

Then no of parameters estimated = 1
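A minimal sketch of the d.f. rule, assuming a hypothetical helper named `degrees_of_freedom` (the name and the class counts in the usage lines are illustrative, not from the article):

```python
def degrees_of_freedom(num_classes, num_params_estimated=0):
    """d.f. = no. of classes - no. of parameters estimated - 1."""
    df = num_classes - num_params_estimated - 1
    if df < 1:
        raise ValueError("not enough classes for this many estimated parameters")
    return df


# Parameters given in the question: nothing is estimated.
print(degrees_of_freedom(6))

# Hypothetical Poisson case with 5 classes where lambda had to be estimated.
print(degrees_of_freedom(5, num_params_estimated=1))
```

Each estimated parameter costs one degree of freedom, which is why the count has to be reported.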

The 3ʳᵈ step is to define the significance level.

That means we allocate some room for error.

Since this is also a hypothesis test, we don't get 100% accurate results, so we allow for some error.

So we test at significance level α.

If α is given in the question, use that value. If not, use the default value of 5% (α = 0.05).

The 4ᵗʰ step is to define the rejection region.

We learned earlier that the chi-squared distribution is positively skewed.

We allocate the total error (α) only to the right-hand side of the distribution. That is the rejection region.

If the calculated chi-squared value falls in the rejection region, that is, if it is greater than the critical value for the chosen α and d.f., we reject H₀.
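Normally you would read the critical value from a chi-squared table. As a rough sketch of what the table encodes, the code below computes the right-tail probability with a closed-form expression that holds for whole-number d.f., then searches for the cut-off by bisection. The function names (`chi2_sf`, `chi2_critical`, `in_rejection_region`) are assumptions made for this example. A statistics library would be the usual choice in practice.

```python
import math


def chi2_sf(x, df):
    """Right-tail probability P(X > x) for chi-squared with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #         + sqrt(2x/pi) * exp(-x/2) * sum_j x^j / (1*3*...*(2j+1))
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


def chi2_critical(alpha, df):
    """Cut-off c with P(X > c) = alpha, found by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def in_rejection_region(statistic, alpha, df):
    """True when the statistic lies in the right-tail rejection region."""
    return statistic > chi2_critical(alpha, df)


print(round(chi2_critical(0.05, 5), 3))
```

For α = 0.05 and 5 d.f., standard tables list a critical value of about 11.07, which is a handy check on the sketch.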

5ᵗʰ step is calculating the test statistic value.

Use the equation previously defined.

  • Find the expected frequencies for each category.
  • Calculate test statistic value.

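To calculate the test statistic value, you have to know the Oᵢ and Eᵢ values.

The observed count for every level will be given in the question.

The expected counts you have to calculate yourself: under H₀, Eᵢ = n × pᵢ, where n is the total number of observations and pᵢ is the probability of level i under the specified distribution.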
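Here is a small sketch of how the expected counts could be computed, assuming the probability of each level under H₀ is already known. The helper name `expected_counts` and the probabilities in the usage lines are hypothetical.

```python
def expected_counts(probabilities, total):
    """Expected count for each level: total observations times its H0 probability."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return [total * p for p in probabilities]


# Hypothetical H0 probabilities for three levels and 200 observations.
probs = [0.5, 0.3, 0.2]
counts = expected_counts(probs, 200)
print(counts)
```

The expected counts always add up to the total number of observations, which is a quick sanity check before computing the test statistic.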

Final step

Compare the calculated test statistic value with the critical value and give the conclusion.

Example — 1

If you solve this question on your own, honey, you’ve mastered the Goodness of Fit Test.😘

A die is rolled 60 times and the face values are recorded. The results are as follows.

Is the die balanced? Test using α = 0.05.

Answer ✅

  • Up Face — Discrete variable
  • Frequency — Observed counts

If you roll a die, the possible values you can get are 1, 2, 3, 4, 5, 6.

If it is a fair/balanced die, the probability of getting each outcome, P(x), is 1/6.

So in this question they are testing a probability distribution.

IMPORTANT ❗

  • Sometimes you will get questions like this also. That means they will not mention a specific type of distribution. But still they are asking to test a probability distribution for the given set of discrete data.
Image 1
Image 2

This is a probability distribution. We don’t have any parameters, so there is nothing to estimate (no. of parameters estimated = 0).
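As a sketch of the set-up for the die question: the 60 rolls and the fair-die probability of 1/6 come from the question, while the variable names are just for this example. The observed frequency table is not reproduced in the text here, so the code stops at the expected counts and the d.f.

```python
from fractions import Fraction

rolls = 60                    # total number of rolls, from the question
faces = [1, 2, 3, 4, 5, 6]    # levels of the discrete variable "Up Face"
p_fair = Fraction(1, 6)       # H0: every face has probability 1/6

# Expected count for each face under H0.
expected = {face: rolls * p_fair for face in faces}

# No parameters are estimated, so d.f. = classes - 0 - 1.
df = len(faces) - 0 - 1

for face, count in expected.items():
    print(f"face {face}: expected {count}")
print("d.f. =", df)
```

Every face has the same expected count under H₀, so any large gap between an observed frequency and that value pushes the test statistic up.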

Image 3
Image 4
Image 5
Image 6
Image 7

Let’s try one more question, shall we? 😎

Example — 2

The number of accidents in a month observed over a period of 10 years is given below.

Do the data follow a Poisson distribution? Test using α = 0.05.

Answer ✅

Hold on! Hold on! At least try the question first. C’mon, I see you. 😉

  • No of accidents — Discrete variable
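Since λ is not given in this question, it has to be estimated from the data, which costs one degree of freedom. The sketch below shows one way that could look. The frequency list is made up (it only matches the question in covering 10 years, i.e. 120 months), and the helper name `poisson_expected` is hypothetical. The last class is treated as "that many or more" so the probabilities sum to 1.

```python
import math


def poisson_expected(observed_by_count):
    """observed_by_count[k] = number of months with k accidents.

    Returns (estimated lambda, expected counts, degrees of freedom).
    The last class is treated as "k or more" when computing probabilities.
    """
    n = sum(observed_by_count)
    # Estimate lambda by the sample mean (approximate if the last class
    # really pools several values).
    lam = sum(k * f for k, f in enumerate(observed_by_count)) / n
    last = len(observed_by_count) - 1
    probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(last)]
    probs.append(1.0 - sum(probs))      # remaining probability: "last or more"
    expected = [n * p for p in probs]
    df = len(observed_by_count) - 1 - 1  # one parameter (lambda) estimated
    return lam, expected, df


# MADE-UP frequencies for 0, 1, 2, 3+ accidents over 120 months (illustration only).
observed = [40, 45, 25, 10]
lam, expected, df = poisson_expected(observed)
print("estimated lambda:", round(lam, 4))
print("expected counts:", [round(e, 2) for e in expected])
print("d.f. =", df)
```

With the real table from the question, the expected counts would then go into the same test statistic, and the result would be compared with the critical value for the resulting d.f.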

And with that we come to the end of the CSTSeries. ✅

Hope you learned something new and cleared up any confusion.

Catch you again soon. ❤️

Credits to Ms. K. G. M. Lakmali, Lecturer, Department of Mathematics and Statistics, Faculty of Humanities and Sciences, SLIIT, for providing comprehensive explanations on CHI SQUARED TESTS.
