Selecting the right statistical test for our requirement.
What is the purpose of doing the statistical test?
A statistical test provides a mechanism for making a quantitative decision about a process. It helps us make sense of and interpret a large amount of information, and it gives numerical evidence from which to draw valid conclusions. Using statistical analysis, we can determine whether the data give enough evidence to reject a hypothesis. Most statistical tests are conducted under the assumption that measurements in the underlying population follow some known distribution. In a predictive-modeling setting, the reason for doing a statistical test is to check whether a feature is actually related to the target we want to predict.
Steps involved in performing a statistical test:
1. Framing hypothesis.
2. Identification of statistical test.
3. Finding the test statistic (stats value) and probability value (p-value).
4. Interpreting the test results.
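The four steps above can be sketched in a few lines of Python. This is only an illustration, assuming SciPy is available; the data are synthetic, and the names `group_a`, `group_b`, and the choice of Welch's t-test are assumptions made for the example, not something prescribed by this article.

```python
import numpy as np
from scipy import stats

# Hypothetical data: two groups drawn with different true means.
rng = np.random.default_rng(seed=0)
group_a = rng.normal(loc=50.0, scale=5.0, size=200)
group_b = rng.normal(loc=55.0, scale=5.0, size=200)

# Step 1: frame the hypotheses.
# Ho: mean(group_a) == mean(group_b); Ha: the means differ.
alpha = 0.05

# Step 2: identify the test (assumed here: Welch's two-sample t-test).
# Step 3: find the test statistic and the p-value.
result = stats.ttest_ind(group_a, group_b, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue

# Step 4: interpret the result against the significance level.
reject_h0 = p_value <= alpha
print(f"t = {t_stat:.3f}, p = {p_value:.3g}, reject Ho: {reject_h0}")
```

With real data, step 2 depends on the variable types and on whether the test's assumptions hold, which is what the rest of this article is about.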
Framing hypothesis: Hypotheses are stated before looking at the data. There are two types: a) the null hypothesis (Ho) and b) the alternative hypothesis (Ha/H1). Hypothesis testing provides a method to reject a null hypothesis at a chosen significance level. The null hypothesis proposes that there is no difference or effect; in a prediction setting, it says that the independent features have no influence on the target variable, which is why we hope to reject it. The alternative hypothesis proposes that there is a difference.
Identification of statistical test
Choosing the right statistical test for our variables is the most important step.
What is the significance level?
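The conventional significance level (α) is 0.05. It indicates a 5% risk of concluding that a difference exists when there is actually none, that is, of rejecting a null hypothesis that is true. The p-value quantifies the evidence against the null hypothesis: the smaller it is, the stronger the evidence.
· If p value > α (fail to reject Ho)
· If p value ≤ α (reject Ho)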
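The decision rule can be written as a tiny helper. This is a minimal sketch; the function name `decide` is hypothetical, and it treats a p-value exactly equal to α as a rejection, matching the "p-value ≤ significance level" convention used later in this article.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the p-value decision rule at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value <= alpha:
        return "reject Ho"
    # A large p-value does not prove Ho; it only means the evidence
    # against Ho is not strong enough at this significance level.
    return "fail to reject Ho"


print(decide(0.03))
print(decide(0.20))
```

Note the wording "fail to reject" rather than "accept": a large p-value does not show that Ho is true.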
How do you find the normality of your data?
We can check with the help of the Shapiro–Wilk test or the Jarque–Bera test. If the p-value is less than or equal to the significance level, Ho (the data is normal) is rejected; otherwise we fail to reject Ho and treat the data as normal.
Non-parametric tests still make certain assumptions, but normality is not one of them. A non-parametric test is used when the data is not normally distributed.
Ho: skew = 0 (or) p-value > significance level (data is normal)
Ha (or) H1: skew != 0 (or) p-value ≤ significance level (data is not normal)
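Both normality checks are available in SciPy as `scipy.stats.shapiro` and `scipy.stats.jarque_bera`. The sketch below wraps them in a helper; the name `normality_report` and the synthetic samples are assumptions made for illustration only.

```python
import numpy as np
from scipy import stats


def normality_report(sample, alpha=0.05):
    """Run Shapiro-Wilk and Jarque-Bera and apply the decision rule."""
    shapiro_stat, shapiro_p = stats.shapiro(sample)
    jb_stat, jb_p = stats.jarque_bera(sample)
    return {
        "shapiro_p": float(shapiro_p),
        "jarque_bera_p": float(jb_p),
        # Ho (data is normal) is rejected when p-value <= alpha.
        "shapiro_rejects_normality": bool(shapiro_p <= alpha),
        "jarque_bera_rejects_normality": bool(jb_p <= alpha),
    }


rng = np.random.default_rng(seed=1)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=500)
skewed_sample = rng.exponential(scale=1.0, size=500)

print(normality_report(normal_sample))
print(normality_report(skewed_sample))
```

The exponential sample is strongly right-skewed, so both tests should reject normality for it. The Jarque–Bera test looks at kurtosis as well as skewness, so it can reject normality even when the skew is close to zero.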
Interpreting the test results
If the null hypothesis is rejected in favor of the alternative hypothesis, the variable is likely to be useful in predicting the output.
Ha (or) H1: variable_1 (mu1) != variable_2 (mu2)
(or)
p-value ≤ significance level
If we fail to reject the null hypothesis, the variable is unlikely to be useful in predicting the output.
Ho: variable_1(mu1) = variable_2(mu2)
(or)
p-value > significance level
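Putting test selection and interpretation together might look like the sketch below. The helper `compare_groups` is hypothetical, and the specific pairing of tests is an assumption made for the example: Welch's t-test when both samples look normal under Shapiro–Wilk, and the Mann–Whitney U test (a common non-parametric alternative) otherwise.

```python
import numpy as np
from scipy import stats


def compare_groups(x, y, alpha=0.05):
    """Pick a two-sample test based on normality, then interpret it."""
    both_normal = (
        stats.shapiro(x).pvalue > alpha and stats.shapiro(y).pvalue > alpha
    )
    if both_normal:
        test_name = "Welch t-test"
        p_value = stats.ttest_ind(x, y, equal_var=False).pvalue
    else:
        test_name = "Mann-Whitney U"
        p_value = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue

    if p_value <= alpha:
        conclusion = "reject Ho: the groups differ"
    else:
        conclusion = "fail to reject Ho: no evidence of a difference"
    return test_name, float(p_value), conclusion


# Hypothetical skewed data with clearly different scales.
rng = np.random.default_rng(seed=2)
x = rng.exponential(scale=1.0, size=300)
y = rng.exponential(scale=3.0, size=300)

print(compare_groups(x, y))
```

Because both samples here are skewed, the helper falls back to the non-parametric test. The Mann–Whitney U test compares the two distributions rather than the means directly, so the "mu1 != mu2" wording above applies to it only loosely.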
Footnotes
Yes, it’s important to do statistical analysis before jumping into machine learning algorithms.
Hope you gained some useful insights.
Thanks for reading. :)
And 💙 if this was a good read. Enjoy!