Oct 3, 2018 · 4 min read

If you have been following Acing AI, I have been regularly posting interview questions for Data Science and AI interviews from some of the top technology companies. I have been asked on countless occasions to post answers to those questions. In this article I have chosen the top 5 algorithm/theory based questions which have been frequent and answered them.

1. What is a p-value?

When you perform a hypothesis test in statistics, a p-value helps you judge the strength of your results. A p-value is a number between 0 and 1: the smaller it is, the stronger the evidence against the null hypothesis, which is the claim on trial.

• A low p-value (≤ 0.05) indicates strong evidence against the null hypothesis, so we reject the null hypothesis.
• A high p-value (> 0.05) indicates weak evidence against the null hypothesis, so we fail to reject it. Strictly speaking, we never accept the null hypothesis; we only fail to reject it.
• A p-value near the 0.05 cutoff is marginal and could go either way.

To put it another way:

• High p-values: your data are likely under a true null hypothesis.
• Low p-values: your data are unlikely under a true null hypothesis.
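As a concrete sketch of the decision rule above, here is a one-sample t-test on a small hypothetical dataset (the sample values and the null mean of 0.5 are made up for illustration; SciPy is assumed to be available):

```python
from scipy import stats

# Hypothetical sample of daily conversion rates.
# Null hypothesis: the true mean is 0.5.
sample = [0.48, 0.52, 0.50, 0.47, 0.53, 0.49, 0.51, 0.46]
t_stat, p_value = stats.ttest_1samp(sample, popmean=0.5)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

if p_value <= 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

Here the sample mean sits very close to 0.5, so the p-value comes out well above 0.05 and we fail to reject the null hypothesis.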

2. How is the k-nearest neighbours (KNN) algorithm different from k-means clustering?

KNN is a supervised learning algorithm used for classification: the training data is labeled, and the goal is to classify a new, unlabeled point based on the labels of its k nearest neighbours. K-means is an unsupervised learning algorithm used for clustering: it requires only a set of unlabeled points and the number of clusters k. The algorithm then iteratively assigns each point to its nearest cluster centroid and recomputes each centroid as the mean of the points assigned to it. The primary difference is that KNN learns from labeled data (supervised learning), while k-means discovers structure in unlabeled data (unsupervised learning).
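The contrast can be shown side by side on a tiny made-up dataset (the points and labels below are illustrative; scikit-learn is assumed to be available):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

# Two small, well-separated 2-D blobs (hypothetical data)
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels: only KNN gets to see these

# Supervised: KNN is trained on labeled points
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[0.1, 0.2]]))  # → [0]

# Unsupervised: k-means sees only the points and the number of clusters k
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster assignments; the label ids are arbitrary
```

Note that k-means recovers the two blobs without ever seeing `y`, but its cluster ids have no inherent meaning, whereas KNN predicts the actual class labels it was trained on.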

3. What is overfitting? How do we avoid it?

“Overfitting” is traditionally defined as training some flexible representation so that it memorizes the training data, noise included, and as a result fails to generalize to unseen data. To avoid overfitting:

• Use fewer parameters: A simpler model with fewer parameters has less capacity to capture noise, which reduces overfitting.
• Better performance measures: Use performance measures suited to the problem, evaluated on held-out data rather than on the training set, so that overfitting shows up in the numbers.
• Cross-validation: Cross-validation techniques help detect and avoid overfitting. Repeated random sub-sampling validation and k-fold cross-validation are good techniques to use, depending on the dataset.
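The k-fold cross-validation mentioned above can be sketched in a few lines (the Iris dataset and logistic regression are stand-ins chosen for illustration; scikit-learn is assumed to be available):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: train on 4 folds, score on the held-out fold,
# rotating the held-out fold 5 times
scores = cross_val_score(model, X, y, cv=5)
print(scores)        # one score per fold
print(scores.mean()) # average generalization estimate
```

A large gap between the training score and these cross-validated scores is a typical symptom of overfitting.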

To read more: Clever methods to avoid overfitting

4. How do you decide between model accuracy and model performance?

This question is directly related to the accuracy paradox and tests whether you know when higher accuracy can mean poorer predictions. The accuracy paradox for predictive analytics states that a predictive model with a given level of accuracy may have greater predictive power than a model with higher accuracy; this typically happens when the classes are highly imbalanced. In such cases it may be better to avoid the accuracy metric in favour of other metrics such as precision and recall. Consider predicting invalid password attempts at a hypothetical company where such attempts are extremely rare. A model that simply predicts "valid" every time would be highly accurate, yet it would never catch an invalid attempt, which is not helpful at all. Hence, sometimes a model with a lower level of accuracy provides higher predictive power.
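The invalid-password scenario can be made concrete with a toy imbalanced dataset (the counts below are invented for illustration; scikit-learn's metrics are assumed to be available):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical: 1000 login attempts, only 10 of them invalid (positive class = 1)
y_true = [1] * 10 + [0] * 990

# A naive model that always predicts "valid" (0)
y_naive = [0] * 1000

print(accuracy_score(y_true, y_naive))                      # 0.99: looks great
print(precision_score(y_true, y_naive, zero_division=0))    # 0.0
print(recall_score(y_true, y_naive, zero_division=0))       # 0.0: catches nothing
```

The model is 99% accurate while detecting zero invalid attempts, which is exactly the paradox: precision and recall expose the failure that accuracy hides.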

5. What’s the difference between Type I and Type II error?

A Type I error is equivalent to a false positive, and a Type II error is equivalent to a false negative. A Type I error is rejecting a null hypothesis that is actually true, while a Type II error is failing to reject a null hypothesis that is actually false. Take biometrics as an example: when someone scans their finger, a Type I error is rejecting an authorized user, while a Type II error is accepting an unauthorized one.
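The false-positive/false-negative mapping can be counted directly from predictions. In this made-up example the positive class (1) means "intruder", so a Type I error is flagging a legitimate attempt and a Type II error is missing a real intruder:

```python
# Hypothetical ground truth and model predictions (1 = intruder, 0 = legitimate)
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 1, 0, 0, 0, 1, 1]

# Type I error: false positive, a legitimate attempt flagged as an intruder
type1 = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

# Type II error: false negative, a real intruder missed
type2 = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

print(type1, type2)  # → 2 1
```

Which error is worse depends on the application: in security settings a Type II error (letting an intruder through) is usually far more costly than a Type I error (inconveniencing a legitimate user).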

To read more: Type I and Type II key differences

Subscribe to our newsletter here. We are building a new course to help people ace data science interviews. Sign up below to join the wait-list!

Thanks for reading!

The sole motivation of this blog article is to provide answers to some Data Science Interview Questions. I aim to make this a living document, so any updates and suggested changes can always be included. Please provide relevant feedback.

Written by

## Acing AI

#### Acing AI provides analysis of AI companies and ways to venture into them.
