Secrets to a Successful Data Science Interview

Published in Walmart Global Tech Blog, Aug 15, 2019

So, you have a data science interview? Congratulations! But are you worried about being rejected? Are you puzzled as to what to prepare to confidently face the interview? In this article we’ll share what we’ve learned the hard way from our experience as interviewees as well as interviewers.

Being interviewed in Data Science, a field that is already wide, and rapidly expanding, is as much a challenge for candidates as it is for interviewers. Interviews try to assess your Data Science competency in a few one-hour rounds, while your learning has been many years in the making. This puts interviewers in an unenviable position, namely, how to assess someone very passionate and knowledgeable — like you?

How It Starts

It all starts with what you have written in your resume. Expect questions on anything in it that relates to Data Science. Some of that work may lie deep in your past (more than three years ago), but don’t be fazed. We believe you have it in you to give a good account of yourself and anything on your resume. Please revisit it and get reacquainted with your past. An interviewer may ask, for example, “Given that you solved something with method X in the past, do you have a better method Y if you were to solve it now?” Also, ask yourself: Am I writing something on my resume just to impress the interviewer, while hoping they don’t ask questions about it? Do not do this! If you were the interviewer, would you think it’s fair? This applies to tech blogs and GitHub links in the resume as well.

Problem Solving: Open & Close-ended

Your data science depth will be assessed based on your ability to solve problems. Such problems could be:

Given a requirement, how would you solve it? For example, your business is losing customers; find the causes and solutions.
Given a solution, could it be improved? For example, an existing solution has a “conversion rate” of 90%; how would you make it 91%?

Are you able to reuse your past approaches and make them work in a new and different context? Are you also in a position to rethink your old approaches in the context of what you have recently learned?

These are open-ended questions; they may not have complete solutions within the scope of the interview. Remember, these questions are not for getting billion-dollar startup ideas or patentable ideas from you for free. Far from it, these are purely for assessing your problem-solving skills. In fact, some of them may resonate with data science problems the interviewers are trying to solve as part of their work.

Next, here’s a sample of close-ended questions: derive backpropagation for a CNN, explain gradient boosting, etc. Open-ended questions help the interviewer assess your reaction to unplanned circumstances, while close-ended ones establish a baseline filter rooted in standard understanding. Do your best to ace both. Practice is the key.

Machine Learning

Data Science uses Machine Learning as one of the key techniques. Yes, it may also use neuroscience, behavioral economics, game theory, statistical mechanics, complexity theory, non-Euclidean geometry and myriad areas you are an expert in. However, problems you are expected to solve in the industry context require a sound knowledge of ML techniques as the primary skill. Interviewers will be delighted if you are an expert on other topics, but please be an expert in ML as well.

This is how they will assess you on your ML skills.

Topics covered in standard Machine Learning courses and books: CS229 (Stanford), CS4780 (Cornell), 6.034 (MIT), PRML (Chris Bishop), Mining of Massive Datasets. Note: PRML is a massive book. If its size causes you anxiety, rest assured that the reference is only indicative; we don’t mean you must go through the whole book. You should reinforce course content with readings from standard books whenever possible.
ML methods that are mentioned in the resume, but not as part of the courses or books.
Mathematics for ML: Linear Algebra, Probability & Statistics, Multivariable Calculus, basics of optimization (typically these are covered in standard ML courses).

Deep Learning

You are a deep learning ninja. Despite your strong feelings, why do we still think you need to have a sound understanding of “conventional” Machine Learning? The “black-box” nature of deep learning models (you may entirely disagree) makes it difficult for the interviewers to know what you did versus what the model did. ML in its conventional form, however, seems to be a good common denominator for all candidates. How does it look to be a person who has built a bi-directional LSTM but doesn’t know how SVMs work? Please go ahead, dazzle the interviewers with your DL skills, but after you’ve proven a point or two with your ML chops.

You should know why you chose one model/algorithm/approach/architecture above others. Hyperparameters play a key role in DL. Develop a sound understanding of model tuning. The difference between a good model and a not-so-good one may lie in your choice of hyperparameters.
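To make the point about hyperparameters concrete, here is a minimal sketch of a grid search that picks a regularization strength by validation error. The one-dimensional ridge model, the tiny data set, and the names `fit_ridge_1d`, `mse`, and `grid_search` are all assumptions made up for illustration; in practice you would likely rely on your library's own tuning utilities.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept: w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train, valid, lambdas):
    """Fit on `train` for each candidate lambda; return (best_lambda, validation_mse)."""
    scored = []
    for lam in lambdas:
        w = fit_ridge_1d(train[0], train[1], lam)
        scored.append((mse(w, valid[0], valid[1]), lam))
    best_mse, best_lam = min(scored)  # lowest validation error wins
    return best_lam, best_mse


# Hypothetical toy data: noisy samples of y = 2x.
train = ([1.0, 2.0, 3.0], [2.2, 3.9, 6.3])
valid = ([4.0, 5.0], [8.0, 10.0])
best_lam, best_mse = grid_search(train, valid, [0.0, 0.1, 1.0, 10.0])
print(best_lam, round(best_mse, 4))
```

Being able to explain why the selected value beats its neighbors (too little regularization fits the noise, too much shrinks the weight away from the signal) is the kind of reasoning an interviewer is looking for.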

We recommend reviewing the below material before interviewing:

Deep learning specialization (set of 5 courses) offered by deeplearning.ai at Coursera.
Geoff Hinton’s Neural Networks course from Coursera for a deeper understanding of concepts.
A comprehensive deep learning course that covers encoders, at http://www.cse.iitm.ac.in/~miteshk/CS7015.html.
For the long term, you may go through the “Deep Learning” book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.

Note: We didn’t say you must go through all of these. Acquiring familiarity with the subject by perusing high-quality content gives you confidence that is transferable across situations.

Experience with Tools

Companies give priority to candidates who not only articulate good data science solutions but can also implement them efficiently using the right tools. It will be great if you can make yourself comfortable with the tools available in the industry. You should at least be familiar with Python or R as languages, the scikit-learn library for ML algorithms, and Keras, TensorFlow, PyTorch, and Caffe for deep learning. Do not neglect query languages (for example, HQL and SQL) or distributed frameworks like Hadoop and Spark. Good organizations have training programs for all these skills, and selected candidates often go through these training courses. Still, familiarity with the above tools gives your candidature the much-needed edge. By all means, undergo training after you get selected. This will give you exposure to the problems the organization solves and the opportunity to interact with other scientists in the organization. These are very important experiences to acquire when new.

Productionization and Deployment of Models

Interviewers expect you to know how a machine learning model is used, how it is productionized and deployed, and the overall end-to-end architecture of your model. It is very difficult to hire a person who does not know how their model will be used by the client/service/customer/product. Without deployment, what you have built is a proof of concept. At the very best, it works on your laptop for demo purposes. For your ML model to be useful it needs to be part of a software pipeline. Know how to interact with engineers towards deploying models. Be prepared to roll up your sleeves and get into the act without anyone prompting you. By doing so, you are enhancing your value and standing in the team.

Life Cycle Management of Models

As a great data scientist, you should understand the full life cycle of the ML model. You should know how your model should change when the world it tries to model changes. (This happens more frequently than you may think, for the real world isn’t under your control.) You will decide how frequently your models need to be retrained for them to remain fresh. You may also want to automate the retraining process to save your precious time. Redirect the time saved on repeated model building towards improving performance. Create opportunities for yourself to explain how you’ve managed a model through its life cycle.
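One simple way to talk about "the world changing" is a drift check that triggers retraining. The sketch below is only an illustration under stated assumptions: the function name `needs_retraining`, the single numeric feature, and the threshold of 3 standard errors are all hypothetical choices, not a prescribed method.

```python
from statistics import mean, stdev


def needs_retraining(baseline, recent, threshold=3.0):
    """Flag drift when the mean of `recent` is more than `threshold`
    standard errors away from the mean of the training-time `baseline`."""
    standard_error = stdev(baseline) / len(recent) ** 0.5
    shift = abs(mean(recent) - mean(baseline))
    if standard_error == 0:
        # Constant baseline: any change in the mean counts as drift.
        return shift > 0
    return shift / standard_error > threshold


# Hypothetical feature values seen at training time and in production.
baseline = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
print(needs_retraining(baseline, [10, 9, 11, 10]))   # stable window
print(needs_retraining(baseline, [14, 15, 13, 14]))  # shifted window
```

A check like this can run on a schedule, so that retraining is triggered by evidence rather than by the calendar alone.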

Debugging ML Models

Debugging is a very important skill in the software industry. Software is highly risky if it cannot be debugged. You should understand your model in depth. You are expected to be aware of the internals of the algorithms you have used. You should know how to do root cause analysis and debug your models. Interviewers expect you to know how you will improve the results when models have high bias or high variance, what you will do to avoid exploding gradients and vanishing gradients, and how you will optimize memory during training, etc.
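Two of the debugging questions above can be sketched in a few lines. The first function is a rough bias/variance diagnosis from training and validation error; the second clips a gradient by its global norm, one common remedy for exploding gradients. The function names and the tolerance values (`acceptable_error`, `max_gap`) are assumptions chosen for illustration only.

```python
import math


def diagnose(train_error, valid_error, acceptable_error=0.05, max_gap=0.05):
    """Rough heuristic: high training error suggests high bias;
    a large validation-minus-training gap suggests high variance."""
    issues = []
    if train_error > acceptable_error:
        issues.append("high bias")
    if valid_error - train_error > max_gap:
        issues.append("high variance")
    return issues or ["ok"]


def clip_by_global_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm == 0.0 or norm <= max_norm:
        return list(grads)  # already within bounds, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]


print(diagnose(0.20, 0.22))  # underfitting: try a richer model or better features
print(diagnose(0.01, 0.15))  # overfitting: try more data or more regularization
print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))
```

Clipping preserves the direction of the gradient and only limits its length, which is why it stabilizes training without changing where the update points.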

Why is it still a good idea to reverse a linked list?

While it is true that Data Structures and Algorithms are outsourced to packages, your ability to make inferences from data is aided by your understanding of algorithms and data structures. While interviewers won’t ask you to implement skip lists or balance k-d trees, they would still expect you to understand the asymptotic complexity of algorithms, be familiar with basic data structures such as linked lists, stacks, trees, hash tables and heaps, and be comfortable with algorithms such as sorting, shortest paths, string processing and the like. In other words, expect hygiene questions from data structures and algorithms. You may think it is inappropriate to judge you through this lens, but if you neglect these basics, you are making it easy for the competition. We’re sure you are more than up to it.
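For reference, the question in the heading can be answered in a few lines. This is a minimal sketch of the standard iterative reversal in O(n) time and O(1) extra space; the `Node` class and the helper names `from_list` and `to_list` are assumptions introduced only to make the example self-contained.

```python
class Node:
    """A singly linked list node."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def reverse(head):
    """Reverse the list in place and return the new head."""
    prev = None
    node = head
    while node is not None:
        nxt = node.next   # remember the rest of the list
        node.next = prev  # point the current node backwards
        prev = node       # advance prev
        node = nxt        # advance node
    return prev


def from_list(values):
    """Build a linked list from a Python list (helper for the demo)."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def to_list(head):
    """Collect node values into a Python list (helper for the demo)."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


print(to_list(reverse(from_list([1, 2, 3]))))  # [3, 2, 1]
```

Be ready to state the complexity and to handle the edge cases (an empty list, a single node) without prompting.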

Know the Company

The very fact that you have been called for an interview is an indication that your prior experience has been considered as a potential fit by HR and by the hiring manager. Don’t stop there! It is your responsibility to research and know what the company does and how you can add value — this is a great distinguisher. Why then are we mentioning such a great distinguisher at the bottom of the list? It comes into its own after you have taken care of other aspects of preparation. What is the point in the candidate knowing a lot about the organization if it is not backed by a strong show of fundamentals in the interviews?

What to ask when asked, “Do you have any questions?”

Beware, this is no invitation to cozy up with the interviewer. Be as formal and polite as you’ve been throughout the interview.

This is a great opportunity to display your understanding of how to function as part of a data science team. It’s also an opportunity to direct the interviewer to your strengths that weren’t covered in the interview. For example, you might be good at code reviews. You could enquire how code review works. You may highlight your favorite approach, say walkthroughs. You might want to know the dev platforms used. You may ask if there are any open source contributors in the team (this is the time to re-emphasize if you are one). You could ask about the delivery cycle. If the team is distributed across time zones, check how the interactions work. Share your experience working in such teams if applicable.

Ask if it’s OK for you to know at a high level the project(s) the interviewer is part of. These are far better questions to ask than wanting to know about the work-from-home policy or the typical day for the data scientist. Phrase these questions carefully, for you aren’t the interviewer! Of course, it’s best to ask just two questions, as the conversation that ensues does not give you more openings than that. So choose them carefully and be guided by the interview context. Most importantly, be a patient listener; that’s a key to memorable conversations.

Note: Don’t try to hypnotize the interviewer by going over the top. As always, be dignified.

Some Helpful Tips

Please get clarification. If you think the question is ambiguous, ask the interviewer clarifying questions. In fact, this is an important skill for data scientists, namely, understanding requirements. The sooner you ask these questions the better, for it saves time, which in turn helps the interviewer to assess you better.
Don’t move discussions away from ML topics. Believe us, it is not a good idea to deflect the conversation, for the following reasons:

This is a bad idea, period (perhaps borderline unethical as well to deflect).
If it works and you get selected, you might be given work related to something that you weren’t prepared to go deep into during the interview.
It doesn’t work in most cases. Interviewers are as smart as you are, and they know when you are trying to avoid or deflect a conversation. You end up creating a poor impression.

It is honorable to admit that you are not very familiar with the topic. You may still want to try based on first principles if possible and also ask clarifying questions. Otherwise, no worries, the interviewer will move to another topic from your resume.

Handling Rejection

In case you do not make it this time, rest assured the interviewers didn’t do it for fun. Companies invest serious time and money in evaluating you, and if you do not make it, it simply means that you need more preparation. You are not rejected, only your application is. Also, consider the fact that, in addition to you, many other smart and capable candidates interview for the position you have applied for. It is in the very nature of the process to select only a few candidates. If it is not you, it is not the end of the world. (Frankly, do not give anyone that kind of power over you.) Note down the questions that were asked of you without any judgment. But don’t analyze yet; allow one or two days to pass for you to regain your composure. Then reflect without rancor how you could be a better candidate next time. Identify opportunities for improvement and put into motion a plan of action, for it is of no use living in the past or in a state of inaction. Companies very much expect you to apply again in the next cycle (check with the HR about the period) — many of us have applied more than once before getting selected. Your HR contact also would strive to provide feedback, but please, please, please do not start a rebuttal chain. Behave like a data scientist in such situations: If the data (interview result) is at variance with your hypothesis (your preparation), move to a better hypothesis. That is, be better prepared next time.

OK, this may sound a little philosophical. However, we think it is practical and important in making you a better candidate and a better person in general. Develop two kinds of self-awareness: internal and external. The former is about how aware you are of yourself; the latter is about how aware you are of others’ perception of you. It should be clear how these two come together in creating a fruitful interview experience for you. We recommend reading Insight by Tasha Eurich.

What is a fair interview?

You are perhaps thinking, this seems like a wordy and long list with synthetic friendliness sprinkled all over, which places a lot of demands on me in a contrived manner. That isn’t the intention, and it is not artificial friendliness either (sorry about the wordy part though). You might be wondering what exactly you would get in return as part of a fair interview process. Here’s what we offer to our candidates. You may expect a similar experience from other reputable organizations as well.

Fair assessment

No questions that map to: I have thought up a 10-digit random number, guess it correctly. No, we don’t expect you to give the answer we have in mind; as long as it is a right answer, your answer is as good as ours.
We do not project our personalities on you. For example, I find it easy, hence you ought to find it easy as well.
We do not expect answers the moment the question is completed. It is perfectly fine for us to listen to your silence or loud thinking before you answer.
No trick questions or misleading questions.
We don’t expect you to have known for a lifetime what we ourselves came to know only two days ago. Hence, despite having read a bunch of great Medium articles recently that really made us feel like geniuses, we wouldn’t be asking any questions based on them (sniff!).
A couple of bad answers will be more than compensated by a good overall performance.

Questions that have a context in your experience and knowledge

If your resume is rich enough for you to be invited for the interviews, surely it is rich enough for us to confine our questions to its content. Something not in the resume is good enough evidence that it is outside of your expertise. This excludes basic ML concepts.

Courteous treatment

Interviewers introduce themselves.
Interviewers inform you how the interview is structured.
Interviewers check with you if you need refreshments.
Interviewers pay attention to you.
Interviewers will never do anything unethical. For example, they won’t pass rude or sarcastic comments about your present or past organizations or educational institutes, or encourage you to breach your NDAs.
Interviewers will never make you uncomfortable. For example, they won’t browbeat you or make fun of your answers.
Interviewers accompany you to lunch or hand you over to someone else who would take you to lunch.

We would love to hear your feedback. Don’t forget to share your experiences with us on how these tips helped you in cracking the data science interview.

We wish you great preparation. May the Force be with you!

Contact Us

Suresh V. (Principal Data Scientist, Walmart Labs), Suresh.Venkatasubramaniyan@walmartlabs.com
Himanshu (Senior Data Scientist, Walmart Labs), himanshu6589@gmail.com

Disclaimer: The above views are personal and not to be construed as our organization’s stated position. However, we do expect any professional organization to have a fair selection process that is reflective of the points mentioned above.

Written by Himanshu Jain