HYPOTHESIS TESTING

Published in Analytics Vidhya

Apr 30, 2020

Hello, fellow learners. Every first step is a step forward. Here, let’s go through a few introductory hypothesis-testing concepts that aspiring data scientists need in order to move ahead in the field of data science.

Hypothesis testing: a statistical method for judging a claim (a hypothesis) about a population using sample data, so that assumptions about the population, for example for business needs, are made on the basis of probabilities.

We will start off with likelihood:

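Likelihood: the probability of the observed outcomes (for example, a given number of successes) for a given set of parameter values, such as the probability of success on a single trial. It is closely related to probability, but the viewpoint differs: probability fixes the parameters and asks how likely different outcomes are, whereas likelihood fixes the observed outcomes and asks how plausible different parameter values are.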

Formula: L = NCn × p^n × (1 − p)^f

where L is the likelihood, p is the probability of success on a single trial, N is the total number of trials, n is the number of successes among them, f = N − n is the number of failures, and NCn is the number of ways to choose which n of the N trials are the successes.

Example: Roll a die. The probability of success, i.e. getting a 3, is 1/6 (≈ 0.167). Now roll the die 4 times and suppose we observe a 3 on three of the rolls (successes) and something else on the remaining roll (one failure). The likelihood is then L = 4C3 × (1/6)^3 × (5/6)^1 = 4 × 1/216 × 5/6 = 20/1296 ≈ 0.0154.
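The dice example can be checked with a short Python sketch. The helper `binomial_likelihood` is a hypothetical name used only for illustration; it simply evaluates the likelihood formula. Evaluating it for several candidate values of p also shows the likelihood viewpoint: the observed data (3 successes in 4 rolls) stay fixed while the parameter changes.

```python
from math import comb


def binomial_likelihood(p: float, n_trials: int, n_success: int) -> float:
    """Likelihood of n_success successes in n_trials independent trials,
    each succeeding with probability p (the binomial formula)."""
    n_fail = n_trials - n_success
    return comb(n_trials, n_success) * p**n_success * (1 - p) ** n_fail


# Dice example: P(rolling a 3) = 1/6, rolled 4 times, 3 successes and 1 failure.
dice_likelihood = binomial_likelihood(1 / 6, n_trials=4, n_success=3)
print(f"L = {dice_likelihood:.4f}")  # L = 0.0154

# Same observed data, different candidate values of p.
for p in (1 / 6, 0.5, 0.75):
    print(f"p = {p:.3f} -> L = {binomial_likelihood(p, 4, 3):.4f}")
```

Over all possible values of p, the likelihood is largest at p = 0.75, the observed success rate of 3/4, and it is small at p = 1/6, which hints that a fair die is not a very plausible explanation for this particular outcome.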

DETERMINING LIKELIHOOD:

High likelihood: it is quite plausible that the observed sample mean differs from the underlying population mean simply because of random chance.

Low likelihood: it is very unlikely that random chance alone would produce a sample mean that far from the underlying population mean.

Framework of Hypothesis Testing:

Hypothesis testing is used to assess the plausibility of a hypothesis by using sample data.

Null Hypothesis (H₀): It states that there is no real difference in a certain characteristic between the sample and the population; any observed difference is due to random chance. E.g., assuming the claim being tested (the status quo) is true.
Alternative Hypothesis (H₁): The alternative to the null hypothesis. It states that there is a real difference in that characteristic between the sample and the population, one that random chance alone does not explain. E.g., assuming the claim being tested is not true.

P-value: the probability of obtaining an outcome at least as extreme as the observed one, assuming the null hypothesis is true. Here “extreme” means far from what the null hypothesis predicts; the p-value is then compared against the significance level.

Significance level (α): the threshold used for rejecting the null hypothesis. It is usually set to 5% or 1%, i.e. α = 0.05 or α = 0.01, and the choice depends on the business problem at hand.

Confidence level: 1 − α (e.g. 95% when α = 0.05), used to express how confident you are in your conclusion.

If p-value ≤ α, we reject the null hypothesis.
If p-value > α, we fail to reject the null hypothesis.
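To connect the p-value, the significance level and the decision rule, here is a hedged Python sketch that reuses the dice example. It assumes a fair die (p = 1/6) as the null hypothesis and treats “at least as many 3s as observed” as the extreme direction, a one-sided choice made here purely for illustration. The helper names `upper_tail_p_value` and `decide` are hypothetical.

```python
from math import comb


def upper_tail_p_value(p0: float, n_trials: int, n_observed: int) -> float:
    """P(X >= n_observed) for X ~ Binomial(n_trials, p0): the probability,
    under H0, of an outcome at least as extreme as the one observed."""
    return sum(
        comb(n_trials, k) * p0**k * (1 - p0) ** (n_trials - k)
        for k in range(n_observed, n_trials + 1)
    )


def decide(p_value: float, alpha: float) -> str:
    """Decision rule: reject H0 when the p-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# H0: the die is fair, so P(rolling a 3) = 1/6; we observed 3 threes in 4 rolls.
p_value = upper_tail_p_value(1 / 6, n_trials=4, n_observed=3)
print(f"p-value = {p_value:.4f}")  # p-value = 0.0162

for alpha in (0.05, 0.01):
    confidence = 1 - alpha
    print(f"alpha = {alpha} (confidence {confidence:.0%}): {decide(p_value, alpha)}")
```

The same data lead to rejection at α = 0.05 (95% confidence) but not at α = 0.01 (99% confidence), which is why the significance level should be chosen for the problem before the data are examined.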

Type I and Type II Errors in Hypothesis Testing:

1. Type I error (its probability is denoted by alpha, α): also known as a “false positive”; the error of rejecting the null hypothesis when it is actually true.

E.g., concluding that there is a difference between the two groups (population and sample) when actually there is none. The null hypothesis is incorrectly rejected.

2. Type II error (its probability is denoted by beta, β): also known as a “false negative”; the error of failing to reject the null hypothesis when the alternative hypothesis is actually true.

E.g., concluding that there is no difference between the two groups (population and sample) when actually there is one. The null hypothesis is not rejected even though it is false.
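A quick way to see both error types is to simulate many experiments. The Python sketch below assumes a two-sided one-sample z-test on normally distributed data with a known standard deviation; the settings (σ = 1, n = 36, α = 0.05, true mean 0.5 under the alternative) and the helper names are illustrative assumptions. When the null is true, the share of rejections estimates the Type I error rate, which should be close to α; when the alternative is true, the share of non-rejections estimates the Type II error rate β.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejects_h0(sample: list[float], mu0: float, sigma: float, alpha: float) -> bool:
    """Two-sided one-sample z-test with known sigma: True if H0 (mean = mu0) is rejected."""
    z = (fmean(sample) - mu0) / (sigma / sqrt(len(sample)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(z) > z_crit


def rejection_rate(true_mean: float, trials: int = 10_000, n: int = 36,
                   mu0: float = 0.0, sigma: float = 1.0, alpha: float = 0.05,
                   seed: int = 0) -> float:
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        rejections += rejects_h0(sample, mu0, sigma, alpha)
    return rejections / trials


# H0 is true (the mean really is 0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=0.0)
# H1 is true (the mean is 0.5): every non-rejection is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=0.5)
print(f"Estimated Type I error rate:  {type_1_rate:.3f}  (alpha = 0.05)")
print(f"Estimated Type II error rate: {type_2_rate:.3f}")
```

With these settings the Type II error rate comes out at around 0.15. A larger sample or a bigger gap between the true mean and mu0 shrinks it, while lowering α reduces the Type I rate at the cost of a larger β.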

Example:

A manufacturer claims that 2 out of 5 people prefer Italian scotch to gin. In a random sample of 25 people, 4 prefer Italian scotch. Is the manufacturer’s claim justified? Test at 95% confidence.

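Here, the null hypothesis is that the manufacturer’s claim holds (H₀: p = 2/5 = 0.4) and the alternative is that it does not (H₁: p ≠ 0.4). The sample proportion is 4/25 = 0.16 and the significance level is α = 1 − 0.95 = 0.05.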
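One way to work this example, assuming a two-tailed one-proportion z-test, is sketched below in Python. It uses the normal approximation to the binomial, which is reasonable here because n × p0 = 10 and n × (1 − p0) = 15 are both at least 10. The helper name `one_proportion_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes: int, n: int, p0: float) -> tuple[float, float]:
    """Two-tailed one-proportion z-test (normal approximation).
    Returns (z statistic, two-sided p-value) for H0: p = p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # standard deviation of p_hat under H0
    z = (p_hat - p0) / standard_error
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Claim: 2 out of 5 people prefer Italian scotch (p0 = 0.4); observed 4 out of 25.
z, p_value = one_proportion_z_test(successes=4, n=25, p0=2 / 5)
alpha = 1 - 0.95  # 95% confidence
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

Here z comes out at about −2.45, beyond the 95% critical value of about ±1.96, and the p-value is roughly 0.014 < 0.05, so under this test the sample does not support the manufacturer’s claim.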

Z-Score and Z-Test:

Z-Score: a numerical measure of a value’s position relative to the mean (average) of a group of values, expressed in standard deviations from the mean: z = (x − mean) / standard deviation. A z-score of 0 means the value is identical to the mean.
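A minimal Python sketch of the z-score formula. The data values are made up for illustration, and the population standard deviation (`statistics.pstdev`) is used on the assumption that the values form the whole group.

```python
from statistics import mean, pstdev

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, not from the article
mu = mean(values)       # 5
sigma = pstdev(values)  # 2.0 (population standard deviation)


def z_score(x: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


for x in (2, 5, 9):
    print(x, z_score(x))
```

The value 5 equals the mean and gets a z-score of 0, while 9 lies two standard deviations above it.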

Z-Test: a hypothesis test whose test statistic is assumed to follow a normal distribution under the null hypothesis. The population standard deviation should be known for an accurate z-test, and as a common rule of thumb it is used when the sample size is greater than 30.
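A sketch of a two-sided one-sample z-test for a mean, assuming the population standard deviation is known. The numbers (μ0 = 100, σ = 12, n = 36, sample mean = 103) and the helper name `z_test_mean` are hypothetical, chosen only to show the arithmetic.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean: float, mu0: float, sigma: float, n: int) -> tuple[float, float]:
    """Two-sided one-sample z-test with known population sigma.
    Returns (z statistic, p-value) for H0: population mean = mu0."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = z_test_mean(sample_mean=103, mu0=100, sigma=12, n=36)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value <= 0.05 else "fail to reject H0")
```

Here z = 1.5 and the p-value is about 0.13, so at α = 0.05 we fail to reject H₀.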

Conclusion:

There are many more types of hypothesis tests, such as directional (one-tailed) tests, two-tailed tests and multiple-sample tests, which will be covered in future learning.

Here, we covered the initial knowledge required for exploring hypotheses on any population data. Thank you, fellow readers, for reading this article, and best wishes to you. I appreciate it!