Understanding Goodness of Fit
What is the goodness of fit?
Goodness of fit is a statistical technique applied to measure how well the actual (observed) data points fit a statistical or machine learning model. It summarizes the divergence between the actually observed data points and the data points expected under the model.
Assessing the divergence between the actually observed data points and the model-predicted data points is critical, because a decision made on a poorly fitting model can be badly misleading. A seasoned practitioner must always examine how closely the model-predicted values track the actual data.
Why do we test the goodness of fit?
Goodness-of-fit tests are statistical tests whose objective is to determine whether a set of actual observed values matches the values predicted by the model. They are frequently applied in business decision making. For example, with a linear regression function, a goodness-of-fit assessment compares the actual observed values (the scattered data points) to the predicted values lying on the fitted regression line.
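To make that comparison concrete, here is a minimal sketch in Python that fits a simple linear regression to a handful of points and summarizes the fit with the R-squared statistic; the data values and variable names are illustrative assumptions, not taken from this article.

```python
import numpy as np

# Illustrative data (made up for this sketch): observed x/y pairs.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3, 13.9, 16.2])

# Fit a simple linear regression (degree-1 polynomial): y ~ slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept

# R-squared summarizes how well the fitted line reproduces the observed values.
ss_res = np.sum((y - y_pred) ** 2)     # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(f"R^2 = {r_squared:.3f}")
```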
What are the most common goodness of fit tests?
Broadly, goodness-of-fit tests can be categorized based on the distribution of the predictand (response) variable of the dataset. The most common tests are:
1. The chi-square test
2. The Kolmogorov-Smirnov test
3. The Anderson-Darling test
The Chi-Square Goodness of Fit Test
The chi-square goodness-of-fit test is conducted when the predictand variable in the dataset is categorical. It is applied to determine whether the sample data are consistent with a hypothesized distribution.
The chi-square test can be applied when the data have the following characteristics:
- The sampling method is random.
- Predictand variables are categorical.
- The expected count of sample observations at each level of the variable is at least 5; this ensures the sample size is sufficient for the chi-square approximation to be valid.
Merits of the Chi-square Test
- It is a distribution-free test; it can be used with any type of population distribution.
- It is widely applicable not only in social sciences but in business research as well.
- It is easy to calculate and to draw conclusions from.
- The chi-square statistic has an additive property: results from independent samples can be combined by adding their chi-square values and degrees of freedom (a sketch follows this list).
- This test is based on the observed frequency and not on parameters like mean and standard deviation.
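As a rough illustration of the additive property, the following sketch combines two hypothetical chi-square results from independent samples; the statistic values and degrees of freedom are made-up numbers used only for demonstration.

```python
from scipy import stats

# Hypothetical chi-square results from two independent samples (illustrative numbers).
chi2_a, df_a = 7.8, 5   # result from sample A
chi2_b, df_b = 4.1, 5   # result from sample B

# Additive property: a sum of independent chi-square statistics is itself
# chi-square distributed, with degrees of freedom equal to the sum of the dfs.
chi2_combined = chi2_a + chi2_b
df_combined = df_a + df_b

# Combined p-value from the chi-square survival function.
p_combined = stats.chi2.sf(chi2_combined, df_combined)
print(f"combined chi-square = {chi2_combined:.1f}, df = {df_combined}, p = {p_combined:.3f}")
```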
The chi-square statistic for a goodness-of-fit test is

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}

Where:
O_i = the observed count for bin i, and
E_i = the expected count for bin i, asserted by the null hypothesis.

The expected frequency for each bin is calculated by

E_i = N \big( F(Y_u) - F(Y_l) \big)

where:
F = the cumulative distribution function for the probability distribution being tested,
Y_u = the upper limit for class i,
Y_l = the lower limit for class i, and
N = the sample size.
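As a minimal sketch of the test in practice (assuming SciPy is available), the example below checks whether hypothetical counts from 120 rolls of a die are consistent with a fair, uniform distribution; the observed counts are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical observed counts for 120 rolls of a six-sided die (illustrative data).
observed = np.array([18, 22, 16, 25, 19, 20])

# Expected counts under the null hypothesis of a fair die: 120 / 6 = 20 per face.
expected = np.full(6, observed.sum() / 6)

# Chi-square goodness-of-fit statistic: sum of (O_i - E_i)^2 / E_i over all bins.
chi2_stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2_stat:.3f}, p-value = {p_value:.3f}")
# A p-value above the chosen alpha means we fail to reject H0:
# the observed counts are consistent with the hypothesized distribution.
```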
Applications of the Chi-square Goodness of Fit Test
The chi-square test is applied to establish or refute whether the actual observed values are consistent with the model-predicted values. It is a very useful tool for predictive analytics professionals and is used very commonly in clinical research, social sciences, and business research.
The Kolmogorov-Smirnov Goodness of Fit Test
Andrey Kolmogorov and Nikolai Smirnov, two probabilists, developed this test to see how well a hypothesized distribution function F(x) fits an empirical distribution function Fn(x).
A test for goodness of fit usually involves examining a random sample from some unknown distribution to test the null hypothesis that the unknown distribution function is, in fact, a known, specified function. The Kolmogorov-Smirnov Goodness of Fit Test (K-S test) compares the dataset under consideration with a known distribution and lets us know if they have the same distribution. It’s also used to check the assumption of normality in Analysis of Variance.
The K-S test quantifies the distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution.
The Kolmogorov-Smirnov (K-S) test can be applied when the data have the following characteristic:
- The predictand variable is continuous.
Merits of the Kolmogorov-Smirnov (K-S) Test
- It does not make any assumptions about the distribution of data.
- It is widely applicable not only in social sciences but in business research as well.
- There are no restrictions on sample size; small samples are acceptable.
The K-S statistic for a goodness-of-fit test

This test is used to decide if a sample comes from a hypothesized continuous distribution. It is based on the empirical cumulative distribution function (ECDF). Assume that we have a random sample x_1, \dots, x_n from some continuous distribution with CDF F(x). The empirical CDF is defined by

F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \le x)

The Kolmogorov-Smirnov statistic (D) is based on the largest vertical difference between F(x) and F_n(x). It is defined as

D = \sup_x \left| F(x) - F_n(x) \right|
H0: The data follow the specified distribution.
HA: The data do not follow the specified distribution.
The hypothesis regarding the distributional form is rejected at the chosen significance level (alpha) if the test statistic, D, is greater than the critical value obtained from a table.
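For a concrete sketch of the one-sample K-S test (assuming SciPy is available; the sample below is simulated data, not from this article), the example tests whether a simulated sample is consistent with a standard normal distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Illustrative sample: 200 simulated draws from a standard normal distribution.
sample = rng.normal(loc=0.0, scale=1.0, size=200)

# One-sample K-S test against the standard normal CDF.
# D is the largest vertical distance between the ECDF and the reference CDF.
d_stat, p_value = stats.kstest(sample, "norm")
print(f"D = {d_stat:.3f}, p-value = {p_value:.3f}")

# Reject H0 (the data follow the specified distribution) when p < alpha.
alpha = 0.05
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```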
The Anderson-Darling Goodness of Fit Test
The Anderson-Darling test compares the fit of an observed cumulative distribution function to an expected cumulative distribution function. It is an enhancement of the Kolmogorov-Smirnov test that gives more weight to the tails of the distribution, making it more sensitive to deviations in the tails. Like the K-S test, it will tell you when it is unlikely that your data follow the specified distribution (for example, a normal distribution), and it is normally run using statistical software.
The Anderson-Darling (A-D) test can be applied when the data have the following characteristic:
- Anderson-Darling tests have been proposed for both the continuous and the discrete case.
Merits of the Anderson-Darling (A-D) Test
- It does not make any assumptions about the distribution of data.
- It is widely applicable not only in social sciences but in business research as well.
- There are no restrictions on the sample size. Small samples are acceptable.
The A-D statistic for a goodness-of-fit test

The Anderson-Darling statistic (A^2) is defined as

A^2 = -n - \sum_{i=1}^{n} \frac{2i - 1}{n} \Big[ \ln F(Y_i) + \ln\big(1 - F(Y_{n+1-i})\big) \Big]

where F is the cumulative distribution function of the specified distribution and Y_1 < Y_2 < \dots < Y_n are the ordered sample values.
H0: The data follow the specified distribution.
HA: The data do not follow the specified distribution.
The hypothesis regarding the distributional form is rejected at the chosen significance level (alpha) if the test statistic, A2, is greater than the critical value obtained from a table.
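As a minimal sketch (assuming SciPy is available; the sample below is simulated), the example runs the Anderson-Darling test for normality. SciPy reports the A^2 statistic together with critical values at several significance levels rather than a single p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative sample: 200 simulated draws from a standard normal distribution.
sample = rng.normal(size=200)

# Anderson-Darling test for normality: returns A^2 plus tabulated critical values.
result = stats.anderson(sample, dist="norm")
print(f"A^2 = {result.statistic:.3f}")
for crit, sig in zip(result.critical_values, result.significance_level):
    decision = "reject H0" if result.statistic > crit else "fail to reject H0"
    print(f"significance {sig:4.1f}%: critical value {crit:.3f} -> {decision}")
```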
I hope this was helpful. Please stay tuned for more such articles.