Application of the Kruskal-Wallis Test

Indula Kulawardana
Sep 1, 2018

In the real world, data comes from many domains, and you often need to draw meaningful inferences from it. Statistics offers many techniques for doing so, and they fall broadly into two approaches: parametric and non-parametric. Parametric approaches rest on strong assumptions about the distribution of the data, whereas non-parametric approaches rest on weaker assumptions about the nature of the data. Parametric tests are generally more powerful when their assumptions hold, but non-parametric approaches are often more appropriate for real-world data because they do not require a specific distributional form.

A statistical method is considered non-parametric if it satisfies at least one of the criteria below.

  • The method can be applied to Nominal scale data (examples of Nominal scale data are Gender (e.g. Male, Female) and Hair color (e.g. Brown, Black, White))
  • The method can be applied to Ordinal scale data (examples of Ordinal scale data are Time of day (e.g. morning, noon, afternoon, evening, night) and Agreement (e.g. strongly agree, moderately agree, agree, neutral, disagree, moderately disagree, strongly disagree))
  • The method can be applied to data with an Interval or Ratio scale of measurement where the form of the underlying distribution is left unspecified (examples of Interval scale data are Celsius temperature and Fahrenheit temperature; examples of Ratio scale data are Age, Height and Weight)

Accordingly, it is essential to identify the nature of the data before applying a statistical test.

In this article, I explain a non-parametric technique for making inferences from more than two independent samples of equal or different sizes. It is called the Kruskal-Wallis test, and it is generally used to find out whether the samples are drawn from the same population or from different populations.
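For reference, the Kruskal-Wallis statistic is usually written as below when there are no tied observations. The symbols k, n_i, N and R_i are notation introduced here for illustration and are not taken from the code linked in this article.

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^{2}}{n_i} - 3(N+1)
```

Here k is the number of samples, n_i is the size of sample i, N = n_1 + ... + n_k is the total number of observations, and R_i is the sum of the ranks of sample i after all N observations are ranked together. Under the null hypothesis, H approximately follows a chi-square distribution with k - 1 degrees of freedom; with very small samples this approximation is only rough.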

Below, I explain how to apply the Kruskal-Wallis test to draw an inference in a real-world scenario.

Scenario: A construction company carried out an experiment to compare the effect of four mixing techniques on the tensile strength of Portland cement. A civil engineer needs to find out whether there is any indication that the mixing technique affects the strength.

The data obtained from the experiment are shown below.

Figure 1: Tensile Strength Data

Here, tensile strengths are given for four mixing techniques, and the data for each technique are independent. No distributional assumptions about the data are given, and the sample sizes are small. Hence, the Kruskal-Wallis test is appropriate for answering the civil engineer's question.

Based on the engineer's question, the hypotheses for this test are defined as follows.

Null Hypothesis: There is no difference in tensile strength among the four mixing techniques.

Alternative Hypothesis: At least one of the four mixing techniques differs from the others in tensile strength.

The algorithm used for the Kruskal-Wallis test in Python is shown below.

Figure 2: Algorithm used for Kruskal-Wallis Test
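Because Figure 2 is an image, here is a minimal sketch of the core computation in plain Python. It is not necessarily identical to the code in the figure: the function names average_ranks and kruskal_wallis_h are illustrative, the statistic is computed without a tie correction, and the sample values are toy numbers rather than the data in Figure 1.

```python
from itertools import chain


def average_ranks(values):
    """Rank values from 1 to N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    weighted_sum = 0.0
    start = 0
    for group in groups:
        # Ranks of this group sit in the same slice as its values in `pooled`.
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_sum += rank_sum ** 2 / len(group)
        start += len(group)

    return 12.0 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)


# Toy values for illustration only; NOT the tensile strength data.
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_toy = kruskal_wallis_h(toy_groups)
print(round(h_toy, 3))  # 7.2
```

The groups may have different sizes, since each rank sum is divided by its own sample size.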

You can also get the Python code from the following link.

https://raw.githubusercontent.com/Indkul/Non-parametric-test/master/Kruskal%20Wallis%20Test.py

You can run the above code in Python to get the following results.

Figure 3: Results of Kruskal-Wallis Test

To apply this algorithm (Figure 2) to your own data set, call the kruskal_wallis_test function as I did for the tensile strength data: pass your data frame as df, the name of the numerical column as column1_name, the name of the categorical column as column2_name, and the significance level (e.g. 0.1 / 1 / 5 / 10) as alpha.
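As a rough illustration of that interface, a function with the same parameter names could be sketched with pandas and SciPy as follows. This is an assumption-laden sketch and not the code at the link above: it treats alpha as a percentage, returns a dictionary whose keys are made up for this example, and is demonstrated on toy values rather than the tensile strength data.

```python
import pandas as pd
from scipy import stats


def kruskal_wallis_test(df, column1_name, column2_name, alpha=5):
    """Sketch of a Kruskal-Wallis helper (not the author's original code).

    df           -- pandas DataFrame in long format (one row per observation)
    column1_name -- name of the numerical column
    column2_name -- name of the categorical (grouping) column
    alpha        -- significance level, assumed here to be a percentage
    """
    # One array of measurements per category.
    samples = [group[column1_name].to_numpy()
               for _, group in df.groupby(column2_name)]

    # scipy.stats.kruskal returns the tie-corrected H statistic and the
    # p-value from the chi-square approximation.
    statistic, p_value = stats.kruskal(*samples)

    # Chi-square critical value with (number of groups - 1) degrees of freedom.
    critical_value = stats.chi2.ppf(1 - alpha / 100, len(samples) - 1)

    return {
        "test_statistic": float(statistic),
        "critical_value": float(critical_value),
        "p_value": float(p_value),
        "reject_null": bool(statistic >= critical_value),
    }


# Toy data for illustration only; these are NOT the values from Figure 1.
toy = pd.DataFrame({
    "strength": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "technique": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
})
result = kruskal_wallis_test(toy, "strength", "technique", alpha=5)
print(result)
```

The long format (one row per observation, with a separate column for the group label) keeps the call the same whether the groups have equal or different sizes.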

The table in Figure 3 gives the test statistic, the critical value and the p-value. The decision is based on these values. Generally, if the test statistic is greater than or equal to the critical value, we reject the null hypothesis at the selected significance level (say 5%).
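For four groups, the chi-square approximation has 4 - 1 = 3 degrees of freedom, so the 5% critical value can be checked directly with SciPy. The helper name reject_null below is hypothetical and only illustrates the decision rule; comparing the p-value with the significance level leads to the same decision.

```python
from scipy.stats import chi2

k = 4         # number of groups (four mixing techniques)
alpha = 0.05  # 5% significance level, written as a proportion

# Upper-tail critical value of the chi-square distribution with k - 1 df.
critical_value = chi2.ppf(1 - alpha, df=k - 1)
print(round(critical_value, 3))  # 7.815


def reject_null(h_statistic, k, alpha):
    """Critical-value rule: reject when H is at least the critical value."""
    return bool(h_statistic >= chi2.ppf(1 - alpha, df=k - 1))
```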

According to Figure 3, the test statistic is greater than the critical value. Hence, we can reject the null hypothesis at the 5% level of significance and conclude that there is a difference among the four mixing techniques. Accordingly, there is evidence that the mixing technique affects the tensile strength of the cement.
