Published in Analytics Vidhya

Selecting the right statistical test for our requirement.

What is the purpose of doing a statistical test?

A statistical test provides a mechanism for making quantitative decisions about a process. It helps us make sense of and interpret a great deal of information, and it gives numerical evidence for drawing valid conclusions from the results. Using a statistical test, we can decide whether a hypothesis should be rejected, based on how likely the observed data would be if that hypothesis were true. Many statistical tests (the parametric ones) are conducted under the assumption that measurements in the underlying population follow some known distribution, such as the normal distribution. In a predictive setting, the reason for doing a statistical test is to find out which features in the data are actually related to the target variable.

Steps involved in performing a statistical test:

1. Framing hypothesis.

2. Identification of statistical test.

3. Finding the test statistic (stats value) and probability value (p-value).

4. Interpreting the test results.
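The four steps above can be sketched end to end in Python. This is a minimal illustration, assuming a one-sample z-test (which requires the population standard deviation to be known); the sample values, the hypothesised mean of 50 and sigma = 4 are made up for the example, and `one_sample_z_test` is a hypothetical helper, not a library function.

```python
import math
from statistics import NormalDist, fmean

def one_sample_z_test(sample, mu0, sigma):
    """Hypothetical helper: two-sided one-sample z-test.
    Assumes the population standard deviation `sigma` is known."""
    n = len(sample)
    # Step 3a: test statistic = distance of the sample mean from mu0 in standard errors
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    # Step 3b: two-sided p-value = P(|Z| >= |z|) if Ho is true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Step 1: frame the hypotheses (made-up example)
#   Ho: population mean = 50      Ha/H1: population mean != 50
mu0 = 50
# Step 2: identify the test -> one-sample z-test, assuming sigma = 4 is known
sigma = 4
sample = [52.1, 49.8, 53.4, 51.0, 54.2, 50.7, 52.9, 51.8]  # hypothetical data

# Step 3: compute the test statistic and the p-value
z, p = one_sample_z_test(sample, mu0, sigma)

# Step 4: interpret the result at alpha = 0.05
alpha = 0.05
decision = "reject Ho" if p <= alpha else "fail to reject Ho"
print(f"z = {z:.3f}, p-value = {p:.4f} -> {decision}")
```

In a real analysis, step 2 is where the choice matters most: with an unknown sigma you would switch to a t-test, and with clearly non-normal data to a non-parametric test.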

Framing hypothesis: Hypotheses are stated before seeing the data. There are two types of hypothesis: a) the null hypothesis (Ho) and b) the alternate hypothesis (Ha/H1). The null hypothesis states that there is no difference or no effect; for example, that an independent feature has no influence on the prediction of the target variable. The alternate hypothesis proposes that there is a difference. Hypothesis testing provides a method to reject the null hypothesis at a certain confidence level. When selecting features, we usually hope to reject the null hypothesis, because if we fail to reject it, we have no evidence that the feature helps predict the target variable.

Identification of statistical test

This is the most important step: choosing the right statistical test for our variables.

What is the significance level?

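The significance level (α) is the threshold against which we compare the p-value. The default significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference (that is, rejecting Ho when it is true). The p-value itself quantifies the evidence against the null hypothesis: the smaller the p-value, the stronger the evidence.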

· If p-value > α: fail to reject Ho (informally, "accept Ho")

· If p-value ≤ α: reject Ho
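To see what the 5% risk means in practice, the sketch below applies this decision rule to many simulated experiments in which Ho is actually true. It assumes normally distributed data with a known standard deviation of 1, so that a two-sided z-test is valid; `decide` and `type_i_error_rate` are hypothetical helper names. The share of (wrong) rejections should come out close to α.

```python
import math
import random
from statistics import NormalDist, fmean

def decide(p_value, alpha=0.05):
    """Decision rule: reject Ho when the p-value is at or below alpha."""
    return "reject Ho" if p_value <= alpha else "fail to reject Ho"

def type_i_error_rate(alpha=0.05, trials=10_000, n=30, seed=0):
    """Simulate experiments in which Ho (population mean = 0) is TRUE and
    count how often a two-sided z-test (sigma = 1 assumed known) rejects it."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = fmean(sample) / (1 / math.sqrt(n))        # z statistic under Ho
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
        if decide(p_value, alpha) == "reject Ho":
            rejections += 1
    return rejections / trials

print(decide(0.03))   # reject Ho
print(decide(0.20))   # fail to reject Ho
rate = type_i_error_rate()
print(f"Share of true nulls rejected: {rate:.3f}")  # should land near alpha = 0.05
```

Lowering α (say to 0.01) reduces this false-alarm rate, at the cost of missing more real differences.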

Statistical test

How do you check whether your data is normally distributed?

We can check with the help of the Shapiro–Wilk test or the Jarque-Bera test. If the p-value is less than the significance level, Ho is rejected and Ha/H1 is selected (the data is not normal); otherwise, we fail to reject Ho and treat the data as normal.

Parametric tests (such as the t-test) assume that the data is normally distributed. When data is not normally distributed, a non-parametric test is done instead, because it makes fewer assumptions about the underlying distribution.

Ho: skew = 0 and excess kurtosis = 0 (or) p-value > significance level (data is normal)

Ha/H1: skew != 0 or excess kurtosis != 0 (or) p-value ≤ significance level (data is not normal)
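As a rough sketch of how a normality check feeds into the choice of test, the code below computes the Jarque-Bera statistic from scratch with the standard library. Under Ho, the statistic approximately follows a chi-square distribution with 2 degrees of freedom, so its p-value is exp(-JB/2); this approximation is only reliable for fairly large samples. In practice you would more likely call `scipy.stats.jarque_bera` or `scipy.stats.shapiro`. The helper `suggest_test_family` and the simulated datasets are illustrative assumptions.

```python
import math
import random

def jarque_bera(data):
    """Jarque-Bera normality test (Ho: skew = 0 and excess kurtosis = 0).
    Returns (JB statistic, asymptotic p-value, skewness, excess kurtosis)."""
    n = len(data)
    mean = sum(data) / n
    # Biased (divide-by-n) central moments, as in the usual JB formula
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    jb = n / 6 * (skew ** 2 + excess_kurtosis ** 2 / 4)
    # Chi-square(2) survival function is exactly exp(-x / 2)
    p_value = math.exp(-jb / 2)
    return jb, p_value, skew, excess_kurtosis

def suggest_test_family(data, alpha=0.05):
    """Hypothetical helper: keep a parametric test unless normality is rejected."""
    _, p_value, _, _ = jarque_bera(data)
    return "parametric" if p_value > alpha else "non-parametric"

rng = random.Random(42)
normal_data = [rng.gauss(0, 1) for _ in range(500)]       # simulated, roughly normal
skewed_data = [rng.expovariate(1.0) for _ in range(500)]  # simulated, right-skewed

for name, data in [("normal", normal_data), ("skewed", skewed_data)]:
    jb, p_value, skew, _ = jarque_bera(data)
    print(f"{name}: JB = {jb:.2f}, p-value = {p_value:.4g}, skew = {skew:.2f}"
          f" -> {suggest_test_family(data)} test")
```

For the simulated normal data the p-value will usually, but not always, exceed 0.05: roughly 1 run in 20 rejects normality by chance, which is exactly the risk α describes.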

Interpreting the test results

If the alternate hypothesis is selected, the variable is likely to be useful in predicting the output.

Ha/H1: variable_1(mu1) != variable_2(mu2)

(or)

p-value ≤ significance level

If the null hypothesis is selected (that is, we fail to reject it), we have no evidence that the variable is useful in predicting the output.

Ho: variable_1(mu1) = variable_2(mu2)

(or)

p-value > significance level
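The interpretation rules above can be made concrete with a two-sample test of Ho: mu1 = mu2. The sketch below uses a permutation test, chosen here as an assumption (it is not necessarily the author's method) because it is non-parametric and needs only the standard library; in practice a t-test or a Mann-Whitney U test is common. The data values and the names `group_1`, `group_2`, `permutation_test_means` and `interpret` are hypothetical.

```python
import random
from statistics import fmean

def permutation_test_means(x, y, n_permutations=10_000, seed=0):
    """Two-sided permutation test of Ho: mu1 = mu2 against Ha/H1: mu1 != mu2.
    Non-parametric: it does not assume either group is normally distributed."""
    rng = random.Random(seed)
    observed = abs(fmean(x) - fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_permutations):
        # Under Ho the group labels are interchangeable, so shuffle them
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_x]) - fmean(pooled[n_x:]))
        if diff >= observed:
            extreme += 1
    # The +1 terms count the observed split itself and keep the p-value > 0
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

def interpret(p_value, alpha=0.05):
    """Translate the p-value into the wording used in this article."""
    if p_value <= alpha:
        return "Ha/H1 selected: mu1 != mu2, so the variable may help predict the output"
    return "Ho selected: no evidence that mu1 != mu2"

# Hypothetical feature values for two groups of the target variable
group_1 = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5, 16.0, 14.4]
group_2 = [10.2, 11.5, 9.8, 12.0, 10.9, 11.3, 10.4, 11.8]

observed, p_value = permutation_test_means(group_1, group_2)
print(f"|mean difference| = {observed:.3f}, p-value = {p_value:.4f}")
print(interpret(p_value))
```

Because the p-value comes from random shuffles, it varies slightly with `seed` and `n_permutations`; more permutations give a more precise estimate.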

Footnotes

Yes, it’s important to do statistical analysis before jumping into machine learning algorithms.

Hope you gained some useful insights.

Thanks for reading. :)

And 💙 if this was a good read. Enjoy!

Lakshmi Sruthi

Aspiring Data Scientist
