Selecting the right statistical test for our requirement.

Lakshmi Sruthi
Published in Analytics Vidhya
Apr 15, 2021

What is the purpose of a statistical test?

A statistical test provides a mechanism for making a quantitative decision about a process. It helps us make sense of and interpret a large amount of information, and it gives numerical evidence from which to draw valid conclusions. Using statistical analysis, we can determine how likely the observed data would be if a hypothesis were true, and so decide whether that hypothesis should be rejected. Most statistical tests are conducted under the assumption that measurements in the underlying population follow some known distribution. In predictive modelling, the reason for doing a statistical test is to check whether a feature is actually related to the target we want to predict.

Steps involved in performing a statistical test:

1. Framing hypothesis.

2. Identification of statistical test.

3. Finding the test statistic (stats value) and probability value (p-value).

4. Interpreting the test results.

Framing hypothesis: Hypotheses are framed before looking at the data. There are two types of hypothesis: a) the null hypothesis (Ho) and b) the alternate hypothesis (Ha/H1). Hypothesis testing provides a method to reject a null hypothesis at a chosen significance level. The null hypothesis states that there is no difference or effect; in a modelling context, it says the independent feature has no influence on the prediction of the target variable. That is why we look for evidence to reject it. The alternative hypothesis proposes that there is a difference.

Identification of statistical test

Choosing the right statistical test for our variables is the most important step.

What is the significance level?

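The significance level (α) quantifies how much evidence against the null hypothesis we require before rejecting it. The conventional default is α = 0.05, which indicates a 5% risk of concluding that a difference exists when there is actually none.

· If p-value > α (fail to reject Ho)

· If p-value ≤ α (reject Ho)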
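The decision rule above can be written as a small helper. This is a minimal sketch: the function name `decide` is hypothetical, and it assumes a p-value exactly equal to α counts as a rejection.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject Ho when the p-value is at most alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value <= alpha:
        return "reject Ho"
    return "fail to reject Ho"


print(decide(0.03))  # p-value below the default alpha of 0.05
print(decide(0.20))  # p-value above the default alpha of 0.05
```

Note that the second outcome is worded as "fail to reject" rather than "accept": a large p-value means the data gave no strong evidence against Ho, not that Ho has been proven.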

Statistical test

How do you find the normality of your data?

We can check with the help of the Shapiro–Wilk test or the Jarque-Bera test. If the p-value is less than or equal to the significance level, then Ho (the data is normal) is rejected; otherwise Ho is not rejected and the data can be treated as normal.

Parametric tests make certain assumptions about the data, such as normality. A non-parametric test is done when the data is not normally distributed.

Ho: skew = 0 (or) p-value > significance level (data is normal)

Ha (or) H1: skew != 0 (or) p-value ≤ significance level (data is not normal)
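To make the normality check concrete, here is a rough sketch of the Jarque-Bera test using only the Python standard library. The function name `jarque_bera` and the two generated samples are assumptions made for illustration. The statistic combines skewness with kurtosis, and the p-value relies on the large-sample chi-square approximation with 2 degrees of freedom, so it is only trustworthy for reasonably large samples. In practice a library routine such as `scipy.stats.jarque_bera` or `scipy.stats.shapiro` would normally be used instead.

```python
import math
import random


def jarque_bera(data):
    """Return (JB statistic, approximate p-value) for a list of numbers."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(data) / n
    # Central moments, dividing by n as in the usual JB formula.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("data has zero variance")
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under Ho, JB is approximately chi-square with 2 degrees of freedom,
    # and the survival function of that distribution is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Illustrative, made-up samples: one drawn from a normal distribution,
# one from a strongly right-skewed exponential distribution.
rng = random.Random(42)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(2000)]

alpha = 0.05
for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    jb, p_value = jarque_bera(sample)
    if p_value <= alpha:
        verdict = "reject Ho (data is not normal)"
    else:
        verdict = "fail to reject Ho (no evidence against normality)"
    print(f"{name}: JB = {jb:.2f}, p-value = {p_value:.4f} -> {verdict}")
```

If Ho is rejected for a variable, a non-parametric test is the safer choice for that variable.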

Interpreting the test results

If the null hypothesis is rejected in favour of the alternate hypothesis, the variable is likely to be useful in predicting the output.

Ha (or) H1: variable_1 (mu1) != variable_2 (mu2)

(or)

p-value ≤ significance level

If the null hypothesis is not rejected, the variable is unlikely to be useful in predicting the output.

Ho: variable_1(mu1) = variable_2(mu2)

(or)

p-value > significance level
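As an illustration of comparing mu1 and mu2, the sketch below runs a two-sided permutation test on the difference in means. This is a swapped-in technique chosen because it needs only the standard library and does not assume normality; the usual parametric choice would be a t-test (for example `scipy.stats.ttest_ind`). The function name `permutation_test` and the two samples are made up for the example.

```python
import random


def permutation_test(sample_1, sample_2, n_resamples=5000, seed=0):
    """Approximate two-sided p-value for Ho: mu1 = mu2 by random relabelling."""
    rng = random.Random(seed)
    n1, n2 = len(sample_1), len(sample_2)
    observed = abs(sum(sample_1) / n1 - sum(sample_2) / n2)
    pooled = list(sample_1) + list(sample_2)
    extreme = 0
    for _ in range(n_resamples):
        # Under Ho the group labels are exchangeable, so shuffle and re-split.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / n2)
        if diff >= observed:
            extreme += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


# Made-up measurements for two groups.
variable_1 = [4.8, 5.1, 5.0, 4.7, 5.3, 4.9, 5.2, 5.0]
variable_2 = [6.1, 5.9, 6.4, 6.0, 6.3, 5.8, 6.2, 6.5]

alpha = 0.05
p_value = permutation_test(variable_1, variable_2)
if p_value <= alpha:
    print(f"p-value = {p_value:.4f}: reject Ho, the means differ")
else:
    print(f"p-value = {p_value:.4f}: fail to reject Ho")
```

The fixed `seed` only makes the run repeatable; increasing `n_resamples` gives a finer estimate of the p-value at the cost of run time.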

Footnotes

Yes, it’s important to do statistical analysis before jumping into machine learning algorithms.

Hope you gained some useful insights.

Thanks for reading. :)

And,💙 if this was a good read. Enjoy!
