Important Questions on Statistics — Part 2

In this blog, I collect statistics questions that were asked in interviews (Part 2).

Sumeet Agrawal
6 min read · Jan 8, 2022

For Part 1 (Questions 1 to 6), refer to this link: Important Questions Part 1

Ques 7) What do you mean by Sampling Bias?

Ans) Before explaining sampling bias, let’s first understand what a sample and sampling are.

A sample is a subset of individuals from a larger population.

Sampling is a technique of selecting individual members or a subset of the population to make statistical inferences from them and estimate characteristics of the whole population.

Sampling bias is a bias in which a sample is collected in such a way that some members of the intended population have a lower or higher probability of being selected than others. It results in a biased sample, one in which not all individuals, or instances, were equally likely to have been selected.
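To make this concrete, here is a small sketch that compares a simple random sample with a biased one. The population (10,000 made-up heights in cm) and the selection rule (taller people are more likely to be picked) are assumptions I invented for illustration, not real data.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: 10,000 heights in cm (made-up numbers).
population = [random.gauss(170, 10) for _ in range(10_000)]
true_mean = statistics.fmean(population)

# Unbiased: every member has the same chance of being selected.
random_sample = random.sample(population, 500)

# Biased: the selection weight grows with height, so taller people
# are over-represented (drawn with replacement for simplicity).
weights = [max(h - 140, 0.0) for h in population]
biased_sample = random.choices(population, weights=weights, k=500)

print(f"population mean:    {true_mean:.1f}")
print(f"random sample mean: {statistics.fmean(random_sample):.1f}")
print(f"biased sample mean: {statistics.fmean(biased_sample):.1f}")
```

The random sample should land close to the population mean, while the biased sample overestimates it, because its members were not equally likely to be selected.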

Ques 8) What do you understand by Type 1 and Type 2 Error?

Ans) Type 1 Error - When the null hypothesis is true and you reject it, you make a type I error.

The probability of making a type I error is α, which is the level of significance you set for your hypothesis test. An α of 0.05 indicates that you are willing to accept a 5% chance of rejecting the null hypothesis when it is actually true. To lower this risk, you must use a lower value for α. However, using a lower value for α means that you will be less likely to detect a true difference if one really exists.

Type 2 Error - When the null hypothesis is false and you fail to reject it, you make a type II error.

The probability of making a type II error is β, which depends on the power of the test. You can decrease your risk of committing a type II error by ensuring your test has enough power, for example by using a larger sample. The probability of rejecting the null hypothesis when it is false is equal to 1 - β. This value is the power of the test.
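Both error rates can be estimated by simulation. The sketch below is only an illustration under assumptions I chose myself: a two-sided z-test of H0: mean = 0 with a known standard deviation of 1, a sample size of 30, and a true mean of 0.5 for the case where H0 is false.

```python
import random
from statistics import NormalDist, fmean

random.seed(1)

ALPHA = 0.05
N = 30          # sample size per experiment (illustrative)
SIGMA = 1.0     # population standard deviation, assumed known
TRIALS = 5_000

# Critical value for a two-sided z-test of H0: mean = 0.
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)

def rejects_h0(true_mean):
    """Draw one sample and report whether the z-test rejects H0."""
    sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
    z = fmean(sample) / (SIGMA / N ** 0.5)
    return abs(z) > z_crit

# H0 is true (the mean really is 0): every rejection is a Type I error.
type1_rate = sum(rejects_h0(0.0) for _ in range(TRIALS)) / TRIALS

# H0 is false (the mean is 0.5): every failure to reject is a Type II error.
type2_rate = sum(not rejects_h0(0.5) for _ in range(TRIALS)) / TRIALS

print(f"Type I error rate:  {type1_rate:.3f}")
print(f"Type II error rate: {type2_rate:.3f}")
print(f"Power (1 - beta):   {1 - type2_rate:.3f}")
```

The Type I error rate should come out close to the chosen α, while the Type II error rate depends on the sample size and on how far the true mean is from the null value.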

Ques 9) Of Type 1 and Type 2 errors, which one is worse?

Ans) Which error is worse depends entirely on the scenario. Sometimes a Type 1 error is worse, and sometimes a Type 2 error is.

For a fixed sample size, the two types of error trade off against each other: decreasing the chance of a type 1 error increases the chance of a type 2 error, and vice versa. To decide which error is more costly, let’s go through a couple of scenarios.

Scenario 1 - Assume you’re a member of a jury deciding whether someone should be sent to prison for a crime. The null hypothesis is that the defendant is innocent. A Type 1 error means you imprison someone who is truly innocent, while a Type 2 error means someone who actually committed the crime goes free. In this case a Type 1 error is worse than a Type 2 error.

Scenario 2 - A medical example. A patient with migraine headaches is referred for an MRI head scan, and the null hypothesis is that there is no brain tumor. A Type 1 error means the doctor reports a tumor when the patient actually has none. A Type 2 error means the patient has a tumor but the doctor says nothing is wrong. In this scenario a Type 2 error is worse than a Type 1 error.
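The trade-off between the two error types can also be computed directly. This is a rough sketch for a two-sided z-test of H0: mean = 0. The helper name type2_probability and its default values (true mean 0.5, standard deviation 1, sample size 30) are assumptions of mine for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()

def type2_probability(alpha, effect=0.5, sigma=1.0, n=30):
    """Beta for a two-sided z-test of H0: mean = 0 when the true mean is `effect`."""
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / (sigma / n ** 0.5)
    # Probability that the z statistic lands inside the non-rejection region.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

# Same sample size, stricter and stricter alpha.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha:.2f}  beta={type2_probability(alpha):.3f}")

# Same alpha, larger sample.
print(f"alpha=0.05, n=100  beta={type2_probability(0.05, n=100):.3f}")
```

With the sample size held fixed, lowering α raises β. Collecting more data is the way to reduce both at once.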

Ques 10) Explain Correlation and Covariance.

Ans) Correlation shows whether and how strongly pairs of variables are linearly related to each other. It takes values between -1 and +1, where values close to +1 represent a strong positive correlation and values close to -1 represent a strong negative correlation.

If there is no linear relationship between two variables, the correlation coefficient will be close to 0. The reverse does not hold: a correlation of 0 does not rule out a non-linear relationship.

Covariance - It signifies the direction of the linear relationship between the two variables. By direction we mean whether the variables tend to increase together or one tends to decrease as the other increases.

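It can take any value between -infinity and +infinity, where a negative value represents a negative relationship and a positive value represents a positive relationship. Its magnitude depends on the units of the variables, so unlike correlation it does not tell you how strong the relationship is.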
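A short sketch may help separate the two ideas. The study-hours and exam-score numbers below are made up for illustration, and the two helper functions are my own minimal versions of the sample covariance and Pearson correlation.

```python
from statistics import fmean

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / (covariance(xs, xs) * covariance(ys, ys)) ** 0.5

# Made-up data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 70, 72]
minutes = [h * 60 for h in hours]   # same information, different unit

print("cov(hours, score):    ", covariance(hours, score))
print("cov(minutes, score):  ", covariance(minutes, score))
print("corr(hours, score):   ", correlation(hours, score))
print("corr(minutes, score): ", correlation(minutes, score))

# A perfect but non-linear relationship can still have zero correlation.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]
print("corr(x, x^2):         ", correlation(x, y))
```

Changing the unit from hours to minutes multiplies the covariance by 60 but leaves the correlation unchanged, which is why correlation is the one used to judge strength. The last line shows that y is fully determined by x, yet the correlation is 0 because the relationship is not linear.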

Ques 11) Explain Confidence Interval.

Ans) A confidence interval expresses the uncertainty in an estimate obtained from a sample. It can be constructed at any confidence level, with 95% and 99% being the most common.

A confidence interval is a range of values, computed from the sample, that is likely to contain the true value of a population parameter such as the mean. Confidence intervals provide more information than point estimates. By establishing a 95% confidence interval using the sample’s mean and standard deviation, and assuming a normal distribution as represented by the bell curve, the researchers arrive at a lower and upper bound produced by a procedure that captures the true mean in about 95% of repeated samples.

Let’s understand the confidence interval with an example. Suppose a group of researchers is studying the heights of high school basketball players. Assume the 95% confidence interval for the mean height is 70 to 75 inches.

So, a 95% confidence level means that if the researchers took 100 random samples from the population of high school basketball players and computed an interval from each one, about 95 of those intervals would contain the true mean height. The interval of 70 to 75 inches is one such interval.
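The repeated-sampling interpretation can be checked with a simulation. This is only a sketch: the true mean (72.5 inches), the standard deviation (3 inches), and the sample size (50) are hypothetical values I picked, and the interval uses the normal approximation (a t-based interval would be slightly more accurate for small samples).

```python
import random
from statistics import NormalDist, fmean, stdev

random.seed(2)

TRUE_MEAN, TRUE_SD = 72.5, 3.0   # hypothetical population of heights (inches)
N, REPEATS = 50, 2_000
z = NormalDist().inv_cdf(0.975)  # critical value for a 95% interval

def confidence_interval(sample):
    """Normal-approximation 95% confidence interval for the mean."""
    m = fmean(sample)
    half_width = z * stdev(sample) / len(sample) ** 0.5
    return m - half_width, m + half_width

covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    low, high = confidence_interval(sample)
    covered += low <= TRUE_MEAN <= high   # True counts as 1

coverage = covered / REPEATS
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

Each sample produces a different interval, and roughly 95% of them should contain the true mean. The true mean itself never moves; it is the intervals that vary from sample to sample.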

Ques 12) Explain the p-value.

Ans) A p-value, or probability value, is the probability of obtaining results at least as extreme as the ones you observed, assuming that the null hypothesis is true. It is not the probability that the null hypothesis is true.

The p-value is a number between 0 and 1, and it is compared with the chosen significance level α. The smaller the p-value, the stronger the evidence against the null hypothesis.

Some important points about the p-value are:

  • A p-value less than or equal to the significance level (typically 0.05) is statistically significant. It indicates strong evidence against the null hypothesis, because data this extreme would occur less than 5% of the time if the null hypothesis were true. Therefore, we reject the null hypothesis in favor of the alternative hypothesis.
  • A p-value higher than 0.05 is not statistically significant. It means the data do not provide enough evidence against the null hypothesis, so we fail to reject it. This is not the same as evidence that the null hypothesis is true.
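As a small worked example of the definition, here is a sketch that computes an exact two-sided p-value for a coin-flip experiment. The scenario (testing whether a coin is fair) and the function name two_sided_p_value are my own illustrative choices.

```python
from math import comb

def two_sided_p_value(heads, flips):
    """Exact p-value for H0: the coin is fair (P(heads) = 0.5)."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count every outcome at least as far from the expected number of
    # heads as the one actually observed, under the null hypothesis.
    extreme = sum(comb(flips, k) for k in range(flips + 1)
                  if abs(k - expected) >= observed_gap)
    return extreme / 2 ** flips

ALPHA = 0.05
for heads in (55, 60, 61, 70):
    p = two_sided_p_value(heads, 100)
    decision = "reject H0" if p <= ALPHA else "fail to reject H0"
    print(f"{heads} heads in 100 flips: p = {p:.4f} -> {decision}")
```

The p-value answers the question "if the coin were fair, how often would I see a result at least this lopsided?". A small value makes the fair-coin assumption hard to believe, but a large value does not prove the coin is fair.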

These were the major questions that I faced in different interviews for data science. Hopefully, these questions will be helpful to you. Thanks for reading this blog. If you get any other statistics questions in interviews, please write them in the comments.

