Inferential Statistics.

Swapnil Bandgar
Published in Analytics Vidhya · 6 min read · May 29, 2021

· Inferential statistics works with a random sample of data taken from a population to describe and make inferences about that population.

· Inferential statistics is valuable when measuring every member of an entire population is not convenient or possible.

· Inferential statistics helps us draw conclusions and make predictions based on our data.

· Inferential statistics lets us understand the whole population from a sample taken from it.

· In inferential statistics we use a random sample, so we can generalize outcomes from the sample to the larger population.

· In inferential statistics, we can calculate the mean, standard deviation, and proportion for our random sample and use them to estimate the corresponding population parameters.

The following types of inferential statistics are widely used and quite easy to interpret:

· Conditional Probability

· Probability Distribution and Distribution function

· Probability

· Regression Analysis

· Central Limit Theorem

· Hypothesis Testing

· T-Test

· Z-Test

· Sampling Distribution

· Chi-square test

· Confidence Interval

· ANOVA (Analysis of variance)

Let’s look at a few of these methods; which one to use depends on the scenario.

Conditional Probability:

Conditional probability is the probability of a particular event Q, given that another event R has already occurred. The conditional probability P(Q|R) is defined as

P(Q|R) = N(R∩Q) / N(R); provided N(R) > 0

N(R): total cases favorable to the event R

N(R∩Q): total cases favorable to both Q and R occurring together

Also, we can write as:

P(Q|R) = P(Q∩R) / P(R); P(R) > 0
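The counting definition above can be sketched in a few lines, using a single die roll as a hypothetical example (Q = "roll is even", R = "roll is greater than 2"):

```python
# Conditional probability P(Q|R) estimated from counts of favorable cases.
sample_space = {1, 2, 3, 4, 5, 6}

R = {s for s in sample_space if s > 2}        # {3, 4, 5, 6}
Q = {s for s in sample_space if s % 2 == 0}   # {2, 4, 6}

# P(Q|R) = N(R ∩ Q) / N(R)
p_q_given_r = len(R & Q) / len(R)
print(p_q_given_r)  # 0.5 — two favorable outcomes {4, 6} out of four in R
```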

Probability Distribution and Distribution function:

The mathematical function relating each value of a random variable to its probability is called a probability distribution. It is a description of all possible outcomes of a random variable and their associated probabilities.

For a random variable Q, the CDF (cumulative distribution function) is defined as:

F(q) = P {s ∈ S : Q(s) ≤ q}

Also,

F(q) = P {Q ≤ q}

As an example, for a discrete random variable Q: P (Q > 4) = 1 − P (Q ≤ 4)

= 1 − {P (Q = 1) + P (Q = 2) + P (Q = 3) + P (Q = 4)}
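The complement rule above can be checked with a fair six-sided die as a worked example (the die is an assumption; the article does not name a specific distribution):

```python
from fractions import Fraction

# Fair six-sided die: P(Q = k) = 1/6 for k = 1..6.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(q):
    """F(q) = P(Q <= q), summing the pmf over all values up to q."""
    return sum(p for k, p in pmf.items() if k <= q)

# Complement rule from the text: P(Q > 4) = 1 - P(Q <= 4)
print(cdf(4))      # 2/3
print(1 - cdf(4))  # 1/3
```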

Probability:

Probability is the chance of occurrence of a particular event out of all possible events.

Example: tossing a coin and finding the probability that it lands on tails.

P(T) = 1/2

Here, we have performed a random experiment whose possible outcomes are H and T.

The collection of all possible outcomes of a random experiment is known as the sample space.

We then work with random variables defined on this sample space. In statistics, random variables are of two types: discrete and continuous.
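The coin-toss example reduces to counting favorable outcomes over the sample space, which a short sketch makes explicit:

```python
from fractions import Fraction

# Sample space of the random experiment: one coin toss.
sample_space = ["H", "T"]

# Favorable outcomes for the event "tails".
favorable = [s for s in sample_space if s == "T"]

p_tail = Fraction(len(favorable), len(sample_space))
print(p_tail)  # 1/2
```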

Regression Analysis:

Regression analysis is used to find trends in data. For example, if we have the linear equation y = mx + b, then for different values of x we can predict the value of y, with some error in the prediction.

We can plot the pairs of x and y values on a scatter plot and perform linear regression to find the relation between them.
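A least-squares fit of y = mx + b can be sketched with plain Python; the x and y values below are hypothetical, standing in for the article's own sample points:

```python
from statistics import mean

# Hypothetical sample points roughly following y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

# Least-squares slope m and intercept b for y = m*x + b.
x_bar, y_bar = mean(x), mean(y)
m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b = y_bar - m * x_bar
print(round(m, 2), round(b, 2))  # 1.96 0.14
```

Each predicted value m*xi + b then differs from the observed yi by a small residual, which is the "error value" mentioned above.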

Central Limit Theorem:

The Central Limit Theorem (CLT) states that as we increase the sample size, the distribution of sample means becomes approximately normal, independent of the shape of the population distribution. The CLT holds particularly well for sample sizes greater than 30. So, for a large enough sample size, we can treat the distribution of sample means as normal.

Hypothesis Testing:

In statistics we work with a number of assumptions, and hypothesis testing is how we evaluate them. It follows a fixed sequence of steps on random sample data to decide whether an assumption is supported or not.

There are two types of hypothesis:

Null hypothesis: under the null hypothesis, we assume that the sample observations result purely from chance. The null hypothesis is denoted by H0.

Alternate hypothesis: under the alternate hypothesis, the sample observations are not due to chance alone; they are affected by some non-random cause. The alternate hypothesis is denoted by H1 or Ha.

Steps of Hypothesis Testing

The process of deciding whether to reject the null hypothesis or fail to reject it, based on sample data, is called hypothesis testing. It consists of four steps:

1. State both hypotheses, the null and the alternate.

2. Define an analysis plan describing how the sample data will be used to evaluate the null hypothesis.

3. Analyze the sample data to compute a single number called the test statistic.

4. Interpret the result by applying the decision rule to decide whether to reject the null hypothesis.

If the p-value of the test statistic is less than the significance level, we reject the null hypothesis; otherwise, we fail to reject it.
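The four steps can be sketched with a one-sample z-test; all the numbers below (hypothesized mean, sigma, sample mean, sample size) are hypothetical:

```python
import math
from statistics import NormalDist

# Step 1: state the hypotheses. H0: mu = 100 vs. H1: mu != 100.
mu0 = 100

# Step 2: analysis plan — sample of n = 50, known population sigma = 15,
# significance level alpha = 0.05, two-tailed test.
n, sigma, alpha = 50, 15, 0.05
sample_mean = 104.2  # observed sample mean (hypothetical)

# Step 3: compute the test statistic.
z = (sample_mean - mu0) / (sigma / math.sqrt(n))

# Step 4: apply the decision rule — reject H0 when the p-value < alpha.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value < alpha
print(round(z, 2), reject_h0)  # 1.98 True
```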

T-Test:

We use a t-test when the sample size is less than 30 and the population standard deviation is unknown, so the sample standard deviation is used in its place.
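A one-sample t-statistic can be computed directly from the definition; the measurements and hypothesized mean below are made up for illustration:

```python
import math
from statistics import mean, stdev

# Small sample (n < 30), population sigma unknown — use the sample stdev.
sample = [12.1, 11.8, 12.4, 12.9, 11.5, 12.2, 12.6, 11.9]
mu0 = 12.0  # hypothesized population mean

n = len(sample)
t_stat = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
df = n - 1  # degrees of freedom for the t distribution
print(round(t_stat, 3), df)  # 1.093 7
```

The statistic is then compared against the t distribution with n − 1 degrees of freedom rather than the normal distribution, which accounts for the extra uncertainty from estimating the standard deviation.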

Z-Test:

A z-test is applied when the data is normally distributed and the population mean and standard deviation are known. We find the z-statistic of the sample by calculating the z-score, which is given by:

z = (x̄ − μ) / (σ / √n)

where x̄ is the sample mean, μ the population mean, σ the population standard deviation, and n the sample size.
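A sketch with made-up numbers (population μ = 500, σ = 100, and a hypothetical sample of 40):

```python
import math
from statistics import NormalDist

# Z-test: population mean and standard deviation are known.
mu, sigma = 500, 100      # population parameters (hypothetical)
sample_mean, n = 530, 40  # observed sample (hypothetical)

# z = (x̄ - μ) / (σ / √n)
z = (sample_mean - mu) / (sigma / math.sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p_value, 4))
```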

Sampling Distribution:

The distribution of a statistic computed from samples drawn from a population is called a sampling distribution. As we increase the sample size, the sampling distribution becomes approximately normal, and its variability decreases.

Chi-square test:

The chi-square test is used to compare categorical data. It comes in two types.

The goodness-of-fit test determines whether the category proportions in sample data match those of the population.

The test of independence compares two categorical variables to find whether they are related to each other or not.

The chi-square statistic is given by:

χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ

where Oᵢ is the observed count and Eᵢ the expected count in category i.
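A goodness-of-fit sketch, asking whether a die is fair; the observed roll counts are hypothetical:

```python
# Chi-square goodness of fit: 120 rolls of a die (hypothetical counts per face).
observed = [18, 22, 16, 25, 19, 20]
expected = [sum(observed) / 6] * 6  # fair die: equal expected counts

# χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # degrees of freedom = categories − 1
print(round(chi_sq, 2), df)  # 2.5 5
```

A small statistic like this (well below the critical value for 5 degrees of freedom at the 5% level) means the observed counts are consistent with a fair die.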

Confidence Interval:

A confidence interval is a range of plausible values for the population parameter of interest, estimated from our sample; it provides an interval estimate rather than a single number.

We can find the margin of error from the standard deviation and a z-table:

margin of error = z* × σ / √n

A confidence interval at the n% level indicates that we are n% confident that the actual mean lies within our interval.
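A 95% interval for a mean with known sigma can be sketched as follows; the sample mean, sigma, and n are hypothetical:

```python
import math
from statistics import NormalDist

# 95% confidence interval for a mean, population sigma known (hypothetical data).
sample_mean, sigma, n = 64.3, 2.5, 100
confidence = 0.95

# z* cuts off (1 - confidence)/2 probability in each tail of the normal curve.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ≈ 1.96
margin_of_error = z_star * sigma / math.sqrt(n)

lo, hi = sample_mean - margin_of_error, sample_mean + margin_of_error
print(round(lo, 2), round(hi, 2))  # 63.81 64.79
```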

ANOVA (Analysis of variance):

The analysis of variance (ANOVA) test is used to determine whether an experiment's outcome is significant. It is usually applied when there are more than two groups, and we test the hypothesis that the means of multiple populations are equal, assuming equal variances.

E.g., engineers from different companies take the same coding challenge, and we want to see if one company outperforms the others.

There are below types of ANOVA test:

· One-way ANOVA

· Two-way ANOVA
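The coding-challenge example can be sketched as a one-way ANOVA computed from the definition (F = between-group variability over within-group variability); the scores are hypothetical:

```python
from statistics import mean

# One-way ANOVA: coding-challenge scores from three companies (hypothetical).
groups = [
    [85, 90, 88, 92],  # company A
    [78, 82, 80, 84],  # company B
    [91, 89, 94, 90],  # company C
]

k = len(groups)                  # number of groups
n = sum(len(g) for g in groups)  # total observations
grand_mean = mean(x for g in groups for x in g)

# Between-group sum of squares: how far group means sit from the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of scores inside each group.
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f_stat, 2))  # 16.31
```

A large F value like this indicates that the differences between the company means are much bigger than the variation within each company, so the equal-means hypothesis is unlikely to hold.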

Now that we have covered these methods, the choice among them comes down to the sample size, what is known about the population, and the type of data at hand.


Code is like humor. When you have to explain it, it’s bad. Connect with me on LinkedIn : https://www.linkedin.com/in/imswapnilb