Most Asked Interview Questions In Machine Learning

Ravish Kumar · Published in EnjoyAlgorithms
10 min read · Jan 5, 2023

Machine Learning and Data Science are two trending areas of computer science where people want to build a career. However, preparation for these interviews is unconventional. If you go through several Machine Learning interview experiences, you will notice a pattern in the questions being asked. In this article, we will discuss that pattern in greater detail.

Before jumping into the Machine Learning interview questions themselves, let's discuss what one can do to reach the interview stage. The number of candidates appearing for ML interviews is growing rapidly, so we need to do something different to stand out from the crowd.

Based on the experiences shared by ML professionals, the one thing that reliably attracts the attention of technical recruiters is hands-on experience with Machine Learning. Course-completion certificates are common on resumes, and while they are good to have, they do not show that you can solve industrial ML problems. Hence, we advise gaining hands-on ML experience by implementing projects, so you know precisely how an ML system works.

We discuss projects first because they are the entry point of the funnel of ML interview questions. Recruiters generally start by asking about the steps involved in a project and eventually land on more generic questions, which we will discuss later in the blog.

Let's divide all the types of ML interview questions into two unique categories:

  1. Project Specific: the start of the funnel
  2. Generic Questions: a bucket of basics

And the golden rule is: Funnel connects the bucket. Let's discuss the first type.

Project Specific

Make sure the project's objective and results are mentioned in your resume. There can be very specific questions about the project's aim, like why you chose this project and what you achieved. We will discuss some of the most asked questions here, and you can generalize them to find your own versions. Generally, interviewers are less interested in what you have "done" in your ML project; clearly mentioning the "achievement(s)" will be more beneficial.

Let's take an example project and discuss the questions related to that in greater detail. We will use an email spam classification project, and step-wise implementation can be found in EnjoyAlgorithms' Email Spam and Non-spam Filtering using Machine Learning blog.

Why did you choose the Email spam classification project?

Today's era is online, and with the increase in online activity comes the need for security. Per surveys, email is one of the easiest ways of reaching people, as email addresses are publicly available and even sold on some platforms. Hence, tech giants like Google and Microsoft keep building better spam-categorization algorithms to keep their users safe. I wanted to explore this capability of ML and went ahead with this project.

Explain the pipeline of your project.

I/we (depending on the number of persons involved) collected an open-source dataset from Kaggle to demonstrate the capability of ML to classify incoming emails into spam and non-spam categories. Once the data was collected, I/we did a thorough analysis and pre-processing. After that, I/we separated the dataset into three sets: train, validation, and test. Finally, we selected the KNN algorithm, fit the model on the training data, tuned its hyperparameter k on the validation set, and checked the performance on the test set.
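The pipeline above can be sketched in a few lines of scikit-learn. The tiny inline dataset and labels below are placeholders for the real Kaggle corpus, purely for illustration:

```python
# Sketch of the project pipeline described above: collect text data,
# encode it into word vectors, split, fit KNN, and evaluate.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

emails = [
    "win a free prize now", "limited offer click here",
    "meeting at noon tomorrow", "please review the attached report",
    "free money guaranteed", "lunch with the team today",
    "claim your reward instantly", "notes from yesterday's call",
]
labels = [1, 1, 0, 0, 1, 0, 1, 0]  # 1 = spam, 0 = non-spam

# Encode text into word-count vectors.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Hold out a test set, then fit KNN on the training portion.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels)
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
```

In a real project, the analysis and pre-processing stages would be far more involved than this sketch suggests.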

Note: This is the crucial step in the interview and can be considered the node of starting discussions. The interviewer can focus on any steps mentioned in the project pipeline. Hence, one needs to be concise and know every step precisely. One can explore the ML project pipeline in EnjoyAlgorithms' blog Step by Step Pathway to Implement Machine Learning Projects.

Explain the dataset you used for this project.

I/We used an open-source dataset for this project where 57% of all emails belonged to the spam category, and the remaining 43% were in the non-spam (ham) category. The biggest challenge here was to find or form a dataset containing a significant number of non-spam emails, as people avoid sharing personal emails because of data-privacy concerns.

As the dataset is in text format, I/we applied text pre-processing techniques: word tokenization of the emails, removal of stop-words, stemming the words, extracting relevant features, and finally, encoding the text into word-vector format so that machines can work with it.
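A minimal sketch of these pre-processing steps, using a hand-rolled tokenizer, stop-word list, and suffix stemmer. A real project would likely use a library such as NLTK or spaCy; the stop-word set and suffix list here are illustrative assumptions:

```python
# Toy versions of tokenization, stop-word removal, and stemming.
import re

STOP_WORDS = {"a", "the", "is", "to", "and", "of", "in", "your", "by"}

def tokenize(text):
    # Lowercase and keep only alphabetic word runs.
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    # Crude stemmer: strip a few common suffixes (not a real Porter stemmer).
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]

print(preprocess("Claim your winning prize by clicking the link"))
# ['claim', 'winn', 'prize', 'click', 'link']
```

After this step, the cleaned tokens would be encoded into word vectors (e.g., with bag-of-words or TF-IDF) before training.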

Note: Now, the questions will go in a very specific zone, like explaining what word tokenization is, what stemming algorithm you used, what techniques you used for word-vector encoding, etc. We would recommend seeing these blogs to find the detailed answers to these questions:

If the project is based on structured data, please have a look at these blogs to find answers to the questions that can be asked on your dataset,

What percentage of data was used for training the machine learning model?

I/We split the processed dataset into three sets: train containing 80% of the data, validation containing 10%, and test containing the remaining 10%. I/We chose this split because the model needed that much training data to reach the expected performance, while still leaving enough samples to tune and evaluate it reliably.
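The 80/10/10 split can be done with two calls to scikit-learn's `train_test_split`, a common idiom; the arrays below are placeholders:

```python
# 80/10/10 train/validation/test split via two successive splits.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)  # placeholder features
y = np.zeros(100)                  # placeholder labels

# First carve off 20%, then split that portion half-and-half.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 80 10 10
```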

Which algorithm did you use to train using this dataset?

I/We used KNN (K-Nearest Neighbours) for this project.

What is the KNN algorithm?

KNN stands for K-Nearest Neighbors, a supervised learning algorithm that can solve both classification and regression problems. It provides a non-parametric solution for our problem statements. It involves selecting the hyperparameter k, which states how many neighbors we want to consider when deciding the label of any given sample. KNN is often one of the first ML algorithms people learn and is very popular in the ML industry.
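The prediction rule can be written from scratch in a few lines. This toy sketch assumes numeric feature tuples and uses Euclidean distance with a majority vote:

```python
# From-scratch KNN classification: find the k nearest training points
# and take a majority vote among their labels.
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    # Euclidean distance from the query to every training sample.
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    # Majority vote among the k nearest neighbours.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["ham", "ham", "ham", "spam", "spam", "spam"]
print(knn_predict(train_X, train_y, (5.5, 5.5), k=3))  # spam
```

Note that there is no training phase here: all the work happens at prediction time, which is why KNN is called a "lazy" learner.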

Note: Please read the blog on the KNN algorithm to know more details about this algorithm. Each word spoken in the interview can steer the conversation in that direction. One can use this trick to control the direction of any interview, an art that comes with practice and confidence. For example, while answering what KNN is, we stated that it is a non-parametric solution. We could have skipped this line if we did not know what a non-parametric method is.

There are more sophisticated algorithms, so why KNN?

We only know which algorithm will fit our requirements in ML once we try different algorithms. For our project, we started with the most basic algorithm, KNN. Indeed, I/we could have used alternate algorithms like Random Forests and XG-Boost, as they are more sophisticated. Still, these algorithms bring complexity, and I found the explainable nature of KNN more fascinating. Hence, I went ahead with KNN.

What cost function did you use for this project?

Strictly speaking, KNN is a non-parametric, instance-based algorithm, so no cost function is minimized during training: the model simply stores the training samples. As this is a classic binary classification problem, I/we used the standard binary cross-entropy (log loss) on the model's predicted class probabilities to evaluate and compare models.

Note: For ML projects, designing the appropriate cost function is crucial; hence, we must know what exactly we used in our project.

Which optimization technique did you use for this project?

KNN does not involve an iterative optimization technique such as gradient descent, because there are no parameters to learn during training. The only tuning was the choice of the hyperparameter k, which I/we selected by comparing performance on the validation set. Gradient descent would become relevant if we had used a parametric model such as logistic regression.
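Since choosing k is the main tuning step in KNN, a sketch of selecting it on a validation set might look like this (the dataset here is synthetic, purely for illustration):

```python
# Tune KNN's hyperparameter k by comparing validation accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

scores = {}
for k in (1, 3, 5, 7, 9):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = model.score(X_val, y_val)

best_k = max(scores, key=scores.get)  # k with the best validation accuracy
```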

What performance metric did you use to evaluate your model?

The email classification problem is a classification problem; hence, accuracy was an acceptable evaluation metric. But I/we found research papers saying we cannot always rely on accuracy alone, especially when the classes are imbalanced. Therefore, I/we evaluated the model using four evaluation metrics for classification models: accuracy, precision, sensitivity (recall), and specificity.
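All four metrics can be derived from the confusion-matrix counts. A small sketch with toy predictions (1 = spam, 0 = non-spam):

```python
# Compute accuracy, precision, sensitivity, and specificity
# from the four confusion-matrix counts.
def classification_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp),    # of predicted spam, how much was spam
        "sensitivity": tp / (tp + fn),  # recall on the spam class
        "specificity": tn / (tn + fp),  # recall on the non-spam class
    }

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(classification_metrics(y_true, y_pred))
```

For spam filtering specifically, precision matters a lot: a false positive means a genuine email lands in the spam folder.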

Note: Please ensure you know each term mentioned in the above answer. If you are unfamiliar with any of these terms or less confident about them, do not claim that you used that metric.

What was the accuracy of your model?

With KNN, I/we achieved 92% accuracy for our model. The reference paper we used claimed to have 95% accuracy, but they used a different algorithm and a custom dataset that they generated with the annotation engineers' help.

Note: This question will definitely come up, as that's how we judge how good a model is. You need to mention the target you wanted to achieve and what you actually achieved.

How can you improve the performance further?

There can be multiple ways to do so. Some of them are:

  1. Using more advanced algorithms, like XG-Boost and Neural Networks.
  2. Using a broader dataset that can help the model learn more unseen cases.

What was the conclusion of this project?

I/We saw the capability of Machine Learning in providing security from phishing threats, and it did the job well. The model can be improved further and deployed later to check its performance on similar datasets.

This would be our final question in the project-specific category. Note that generic questions are not asked only after the project-specific ones end: interviewers can pick up any topic mentioned in your answers and ask basic questions related to it. Let's list some standard generic questions as well.

Generic Questions

What are the various challenges that you faced in your project?

One big challenge was getting balanced data with comparable numbers of spam and non-spam emails. Because of privacy concerns, people avoid sharing their genuine (non-spam) emails but share spam emails freely. To reduce this imbalance, I/we undersampled the majority class by removing some spam emails so our model would not become biased.

Note: This question is trendy in ML interviews. If you have done something yourself, you must have faced difficulty and a way to resolve that. Interviewers want to know how you tackled the hurdles.

What is the Bias-variance tradeoff?

The prediction error of any Machine Learning model can be decomposed as Error = Bias² + Variance + Irreducible error. To reduce this error, we want both bias and variance to be low. But when we reduce the bias of our model, variance increases, and vice versa. This is popularly known as the Bias-Variance tradeoff.

Note: This question is asked in 9 out of 10 ML interviews. Please see the Bias-variance tradeoff blog for a more detailed answer.

What are underfitting and overfitting?

Underfitting and overfitting are two problems that can hurt a model's performance. In underfitting, the model does not perform well even on the training data; in other words, it fails to learn the complex patterns in the dataset. This problem can be solved by:

  • Increasing the complexity of the algorithm so that it starts learning more patterns.
  • Engineering more features so the model understands the true relationship between dependent and independent variables.

In overfitting, the model learns the training data too closely, including its noise, and fails to generalize to unseen test-set samples. Overfitting can be solved by:

  • Decreasing the complexity of the model as it is learning too much.
  • Increasing data samples so the model gets exposed to more unseen patterns for better generalizability.
  • Applying regularization techniques.

Note: This is one of the most critical questions in ML interviews. We recommend seeing this blog to understand the detailed mathematics of how regularization solves overfitting problems.

What is regularization, and can you name some famous regularization techniques?

Regularization is a technique used to reduce the overfitting problem of the model. Some famous regularization techniques are L1, L2, and dropout regularization.
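A quick way to see L2 regularization at work is to compare ridge-regression coefficients at weak and strong penalty strengths; this sketch uses synthetic data:

```python
# L2 (ridge) regularization shrinks fitted coefficients toward zero
# as the penalty strength alpha grows.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ true_w + rng.normal(scale=0.1, size=50)

weak = Ridge(alpha=0.01).fit(X, y)    # almost unregularized
strong = Ridge(alpha=100.0).fit(X, y)  # heavily regularized

# Stronger regularization => smaller coefficient magnitudes.
print(np.linalg.norm(weak.coef_), np.linalg.norm(strong.coef_))
```

L1 (lasso) regularization behaves similarly but can drive some coefficients exactly to zero, which is why it is also used for feature selection.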

Note: The difference between L1 and L2 regularization and the working of dropout regularization techniques are some of the most frequently asked topics in interviews. Please read the blog on regularization techniques for more details.

What is a cost function in Machine Learning?

A cost function expresses the problem statement we want to solve using Machine Learning as an optimization problem. For each input sample, the machine incurs an error between the predicted and the actual value; averaging these errors over the finite set of training samples gives the cost function that the machine tries to minimize.
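For a binary classification problem, this averaged-error idea can be written out as binary cross-entropy; a minimal sketch:

```python
# Binary cross-entropy: average the per-sample log loss over the
# finite training set.
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confident, correct predictions give a low cost; wrong ones a high cost.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # low
print(binary_cross_entropy([1, 0], [0.1, 0.9]))  # high
```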

Note: Read the blog on cost functions used in Classification and Regression problems to know the details. This question is fundamental and can justify your grip on machine learning.

What is the difference between Supervised and Unsupervised learning?

In Supervised learning, machines know the input and corresponding output and then try to map the function for input and output pairs. Some popular algorithms for Supervised learning are Linear regression, Logistic Regression, Decision Trees, etc.

In Unsupervised learning, machines only know the input and map functions to bring similar samples together. Some popular algorithms for Unsupervised learning are Principal Component Analysis, k-means clustering, etc.
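The contrast can be shown in a few lines: a supervised model fits input-label pairs, while an unsupervised one groups inputs without any labels (toy one-dimensional data for illustration):

```python
# Supervised vs. unsupervised learning on the same toy inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[0.0], [0.2], [0.1], [5.0], [5.2], [5.1]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels available: supervised setting

supervised = LogisticRegression().fit(X, y)  # learns the X -> y mapping
unsupervised = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # no y used

print(supervised.predict([[0.15], [5.05]]))  # predicted labels
print(unsupervised.labels_)                  # cluster assignments
```

Note that KMeans discovers the same two groups without ever seeing `y`, but its cluster IDs are arbitrary (cluster 0 might correspond to label 1).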

Note: Please see the Supervised vs. Unsupervised Learning blog for a more detailed description.

What is Stemming, and how is it different from lemmatization?

Stemming is a pre-processing technique used for text data. Using this technique, we reduce words to a root form by stripping common suffixes and prefixes. For example, interchanging will be converted to interchang. This technique has pros and cons. Pros: these algorithms have low complexity, so execution is fast. Cons: they do not preserve the true meaning of the word, and the output may not even be a real word.

On the other hand, Lemmatization also reduces words to their base form, but it preserves their true meaning by mapping each word to a valid dictionary form. The complexity of these algorithms is higher; hence, lemmatization is often reserved for cases where the dataset is small or exact word meaning matters.
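A toy contrast: stemming chops suffixes mechanically, while lemmatization maps a word to its dictionary form. Here the lemmatizer is faked with a tiny hand-made lookup table instead of a real resource like WordNet:

```python
# Stemming vs. lemmatization on the article's example word.
def stem(word):
    # Mechanical suffix stripping (not a real Porter stemmer).
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

# Stand-in for a real lemma dictionary such as WordNet.
LEMMA_TABLE = {"interchanging": "interchange", "better": "good", "ran": "run"}

def lemmatize(word):
    return LEMMA_TABLE.get(word, word)

print(stem("interchanging"))       # interchang  (not a real word)
print(lemmatize("interchanging"))  # interchange (a real word)
```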

Note: This question is placed here because our data is in textual format, and you can expect more such questions on data pre-processing of text data.

Explain the working of the Gradient Descent Algorithm.

Gradient Descent is an optimization technique used to minimize cost functions in Machine Learning. Rather than blindly trying many sets of parameters, GD uses calculus to speed up the search: the gradient of the cost function points in the direction of steepest increase, so we repeatedly update the parameters in the opposite direction of the gradient. Where the gradient reaches zero, the cost is at a minimum or a maximum; since we design cost functions to be minimized, these updates drive the parameters toward regions where the gradient approaches zero, i.e., toward a minimum.
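A minimal sketch of these updates, minimizing the simple quadratic cost J(w) = (w - 4)^2, whose gradient is 2(w - 4):

```python
# Gradient descent on a one-parameter quadratic cost function.
def gradient_descent(start, lr=0.1, steps=100):
    w = start
    for _ in range(steps):
        grad = 2 * (w - 4)  # derivative of (w - 4)^2 at the current w
        w -= lr * grad      # step opposite to the gradient
    return w

w_star = gradient_descent(start=0.0)
print(round(w_star, 4))  # converges toward the minimum at w = 4
```

The learning rate `lr` controls the step size: too small and convergence is slow, too large and the updates can overshoot and diverge.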

Note: Please read the blog on Gradient Descent for more details, along with the algorithm's Pseudocode.

(Figure: incorrect and correct implementation of gradient descent)

Conclusion

This article discussed how machine learning interviews progress and how to deal with the questions asked. We discussed how interviewers link project-specific questions mentioned in our resumes to the basics of machine learning. This blog is designed based on several real interview experiences. We hope you enjoyed the article.

Enjoy Learning!
