Not Able to Clear Machine Learning System Design Interviews?

A step-by-step framework for cracking Machine Learning system design questions. Most candidates ignore this area and face rejections because of it. In this blog, I cover the system design component along with other commonly overlooked areas.

Smit Srivastava
Analytics Vidhya
14 min read · Jan 6, 2021


Created on Canva by the author

I have been in constant touch with aspiring data scientists, freshers (with under 1–2 years of experience), and people interested in transitioning into the data science field. Recently, many of them have reached out to me about ML interviews. I shared my experience and knowledge with a few of them, but as the requests kept coming, explaining everything one by one was no longer feasible. Hence, I decided to write a quick but reasonably detailed blog on the topic. The main focus of this blog is system design considerations for ML; however, I have tried to cover other aspects too.

Based on my discussions with various interviewers and interviewees, and on my first-hand experience in the industry, I like to categorize ML interview questions into three components.

  1. Model-related questions
  2. Deployment-related questions
  3. System-design-related questions

Please remember that use cases will be embedded in each category. I categorize questions this way because it helps with preparation. Honestly, system design is not a separate category; it runs in parallel, from the very start until the use case is resolved. Still, I have separated it out because, depending on your experience in the field, you may or may not be asked system design questions. The categories above are arranged in the order in which you are expected to know them as your experience increases.

Model-related questions: This is the basic knowledge expected from every candidate, fresher or experienced, and it is the first step. It involves questions specific to the models. These can be generic, such as: what's the difference between a loss function and a cost function? What do you mean by bias and variance? How do they relate to over-fitted and under-fitted models? Or they can be direct: how does logistic regression work? Depending on the job description and your CV, the questions can go into CNNs, NLP, etc. For profiles that are not at an advanced level, I have seen BERT and Transformers generally skipped; the questions usually go only up to LSTMs and a bit of GANs. This topic could fill a full blog on its own and still not be covered entirely. The good news is that most of the decent courses and YouTube channels on the market cover this area very well, so I will not go into the details. However, not all of them go deep into the mathematics. As far as interviews are concerned, one can clear the majority of them without going too far into mathematical depth. I suggest everyone learn at least the basic mathematics behind a model and then, based on the area you want to grow in, go in depth.

Don't waste too much time trying to learn everything about every model available, especially if you are a fresher or new to ML. Take one model, find use cases related to it, and then solve real-life problems from your surroundings; you can also take problems from Kaggle, but a real-life problem is best.

Remember why we are doing the project in the first place: users have pain points, and we are using ML to solve them. A model built in a silo won't resolve those pain points.

Deployment-related questions: Once you have gained proficiency in a few models and built a few use cases, it's time to deploy them in production. It's good to know this part even as a fresher, because competition keeps increasing. Currently, most courses are missing this topic in their curriculum. I will not name any course that does or doesn't cover it; I may write another blog reviewing the few I have gone through. But as a learner, you must ask: once you have built a model, what's next? Most courses teach on Jupyter, which is very good for learning, but do we use it in production? When I asked one interviewee what tool he/she used to write code, the answer I got was Jupyter! It's hard to believe that one could deploy a production system with various modules from Jupyter. If you are wondering why not, it means you need to work on this part.

The best way to prepare is to think end to end. Say you have to work on a recommendation system; not necessarily the Netflix recommendation system, but any recommendation system. This is not the main theme of this article, so I will not go into details, but try to think from the point where the client/product manager provides the requirements. To build a model, you need to know the client's expectations and the criteria by which your model will be evaluated and termed a success or a failure. Usually, a product manager takes care of this. He/she will communicate the criteria to you and make sure they are met before the model is delivered to the client or put into production. Most students and interviewees reply that the next step is model development and then deployment. Depending on your total experience in the IT industry, this may or may not fly as an answer, because a very important step comes first: system design, the main theme of this article. We will come back to it.

Once you have a design in place, have built a model, and have tested it, is your work complete? Many people stop at model building, but remember why we are doing the project in the first place: users have pain points, and we are using ML to solve them. A model built in a silo won't resolve those pain points. We need to deliver the solution to the end user, which means we have to deploy it somewhere. There can be various scenarios, but my point is: don't just stop at model development; think about deployment too. It is a crucial part, and I will try to write a separate article on data science in production.

There are various free videos and articles on model deployment. I started with AWS, which gives you one year of free-tier access. The free tier has limited access, but it's sufficient for PoCs (proofs of concept). Play around with it; the extra exposure to the AWS cloud adds to your CV too. Beyond AWS, there are Azure, GCP, and Heroku, among others, to focus on. But don't try to master all of them at once. You may take AWS and explore it in depth, but make sure you have a basic idea of deploying a model on the other platforms. In an interview you can say: I have mostly worked with AWS, but I have basic knowledge of GCP, and if the job demands GCP mastery, I can quickly get up to speed. Docker and Kubernetes are other important topics when we talk about deployment.

I will repeat: DON'T TRY TO MASTER EVERYTHING AT ONCE. Freshers especially should not spread themselves too thin. Gain depth in certain areas or models; that way, you can always showcase your learning capability. Interviewers know that NO ONE can know everything in depth.
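To make the Jupyter point concrete, here is a minimal sketch of what serving a model outside a notebook can look like. It assumes a scikit-learn style model saved as `model.pkl`; the file name, route, and feature format are illustrative placeholders, not a prescription.

```python
# Minimal model-serving sketch using Flask (illustrative only).
# Assumes a scikit-learn model has been saved to "model.pkl"; the
# file name and input format are placeholder assumptions.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    features = [payload["feature_values"]]  # one row of numeric features
    score = model.predict_proba(features)[0][1]  # probability of positive class
    return jsonify({"score": float(score)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

A real deployment would add input validation, logging, and containerization (this is where Docker and Kubernetes come in), but even this much shows the gap between a notebook and a service.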

Machine Learning System Design

An interviewer will generally ask you to design an ML system for a particular use case. It can be anything: a recommendation model, relevant ads, extracting information from a given corpus of documents. As discussed earlier, there are numerous resources for the first category and quite a few for the second; however, I hardly see resources on ML system design with a step-by-step process. I will try to summarize how to approach ML system design questions, and ML system design in general. As this is specific to interview preparation, I will focus more on that side. Below are the seven broad steps in which I usually prefer to handle this:

1. Understanding and Clarifying the Problem

2. Requirement Understanding for Scale and Latency

3. Metrics of Interest

4. Creating blueprint — architecture design

5. Scale Architecture

6. Model building and evaluation

7. Iterative model improvement

This framework gives a good starting point. It's not hard and fast, and you can adapt it to what suits you best. Let's take an example to go through each step. Suppose the interviewer has asked you to "Build a system to show relevant ads to users". In my next blog, I will take this use case and go through all these steps one by one in DEPTH. Here, I will cover all the steps briefly; otherwise this blog will become too long and most people will abandon it midway.

For a tech product manager or data science manager, this is one of the key skills: converting a business/user requirement into an ML problem statement.

1. Understanding and Clarifying the Problem: Not only in ML system design but in any interview, the starting question will be very broad and vague. We need to ask questions to make it more specific. This is how it works in real life too. If we are working with a client, or working internally to develop a feature, the requirement will be very broad and vague at the start. The client, PMs, architects/data engineers/scientists, etc. sit down and refine it to reach a final requirement, system design, and solution. Usually it's an iterative process, and input flows both ways between PMs and Engineering. For our example, you may start with a very basic question if you have no clue about digital advertising: which ads are we talking about here? Are these the ads displayed by FB, Insta, Twitter, etc., where we have information about the user, or are these ads shown via a search engine, where we supply a query string, so the query string and context become important? Based on the interviewer's response, you should be able to define an ML problem statement. For example: "Predict the probability of engagement of an ad for a given user and context". Now we have a very clear goal to work toward.


2. Requirement Understanding for Scale and Latency: This is not a separate activity; while designing the architecture, we need to be aware of the expected load. For example, are we expecting 1 million queries/sec, or is it just an internal search function where at most 1,000 queries/sec are expected? We also need to clarify what response time the interviewer expects. We may build a very good model, but if it returns results in, say, 5 seconds when the expected latency was 1 second, our model will be rejected however good it might be.
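A quick back-of-envelope check helps here. The numbers below are purely illustrative assumptions, not figures from any real system:

```python
# Back-of-envelope capacity and latency check (all numbers assumed).
peak_qps = 1_000_000        # expected queries per second at peak
model_latency_ms = 30       # p99 latency of one model call
latency_budget_ms = 100     # end-to-end budget agreed with the interviewer

# If one server handles ~500 requests/sec, how many servers do we need?
per_server_qps = 500
servers_needed = peak_qps / per_server_qps
print(f"~{servers_needed:.0f} servers at peak")

# Sanity check: does the model fit the budget once network and
# feature-fetch overhead (assumed ~50 ms) are added on top?
overhead_ms = 50
print("fits budget:", model_latency_ms + overhead_ms <= latency_budget_ms)
```

Bringing numbers like these up early signals to the interviewer that your design decisions will be driven by the load, not bolted on afterward.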

3. Metrics of Interest: Metrics can be broadly classified into two categories: offline and online metrics.

Offline Metrics: These are used to compare models quickly and find out which gives the best result. Which metrics are best for comparison depends on the type of use case; for binary classification, commonly used metrics are AUC, log loss, precision, recall, and F1-score. For other cases we might need more specific metrics; for example, NDCG is handy in search-ranking problems. In our case, log loss is a good candidate. People generally prefer AUC, but AUC doesn't penalize how far the predicted score is from the actual one, and it is insensitive to whether the probabilities are well calibrated.
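A small sketch (with made-up labels and scores) shows why: two models with the same ranking order get identical AUC, but log loss separates them by calibration:

```python
# AUC only looks at ranking order, so it cannot distinguish
# well-calibrated probabilities from miscalibrated ones.
from sklearn.metrics import log_loss, roc_auc_score

y_true = [0, 1, 0, 1]
calibrated    = [0.20, 0.40, 0.60, 0.80]  # modest, better-calibrated scores
overconfident = [0.02, 0.10, 0.90, 0.98]  # same ranking, extreme scores

for name, scores in [("calibrated", calibrated), ("overconfident", overconfident)]:
    print(f"{name:13s} AUC={roc_auc_score(y_true, scores):.2f} "
          f"log_loss={log_loss(y_true, scores):.3f}")
# Both models score AUC 0.75, but the overconfident one has far worse
# log loss: log loss penalizes how far predictions are from outcomes.
```

For ads, where the predicted probability feeds directly into the auction, that calibration difference matters, which is why log loss is the stronger candidate here.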

Online Metrics: Once we have selected the best-performing models offline, we use online metrics to test them in the production environment. While choosing online metrics, we may need both component-wise and end-to-end metrics. That is, we may use a component-wise metric such as log loss to measure the performance of our model, but we also need to check how the whole system performs once the model is plugged in. For that, we can use end-to-end metrics, such as how revenue and engagement rates change, before making the final decision to launch the model. This varies by use case.
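As a toy illustration, an end-to-end online check might compare a treatment bucket against a control bucket on business metrics before a full launch. The traffic numbers and the 1% minimum CTR lift threshold below are assumptions for the sketch, not recommended values:

```python
# Sketch of an end-to-end online check before launch (numbers assumed).
control   = {"impressions": 1_000_000, "clicks": 18_000, "revenue": 52_000.0}
treatment = {"impressions": 1_000_000, "clicks": 19_500, "revenue": 54_500.0}

def ctr(bucket):
    return bucket["clicks"] / bucket["impressions"]

ctr_lift = ctr(treatment) / ctr(control) - 1
revenue_lift = treatment["revenue"] / control["revenue"] - 1
print(f"CTR lift: {ctr_lift:+.1%}, revenue lift: {revenue_lift:+.1%}")

# The launch decision combines metrics: a model that wins on log loss
# offline but hurts revenue or engagement online does not ship.
launch = ctr_lift > 0.01 and revenue_lift > 0
print("launch:", launch)
```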

Online metrics are generally ignored by candidates, but they are more important from the business perspective. Product managers and clients focus more on them.

As developers, we often focus on offline metrics and somewhat ignore online metrics, yet the latter matter more from the PM/client perspective. I have first-hand experience of a CNN model that performed very well offline, but once in production the overall response time increased drastically and we had to make changes.

4. Creating blueprint (architecture design): Here we need a clear understanding of the data flow, and domain knowledge also comes in handy. If we don't know the ad-tech domain or how the complete cycle of programmatic advertising works, it will be difficult to design the system. In real life, we interact with the product manager or client to get clarity on everything. I have seen many developers, whether data scientists or general programmers, start working on a project without knowing the full end-to-end flow. As a junior dev you don't strictly need to, but if you try to learn these things from the start, it will help you grow quickly. We have all seen a few people grow fast while others keep doing the same confined coding tasks; these things make the difference. On programmatic advertising, I wrote a blog last year; if interested, you can go through it: https://medium.com/@smit.srivastava/programmatic-advertising-the-complete-life-cycle-for-the-beginners-c1b0291d01fd

I am not going into the details of this use case here, because explaining the flow would take a full blog in itself. But it can be summarized as below:

  • Advertisers create ads with targeting information, and the ads are stored in the ads index.
  • When a user queries the platform, candidate ads are selected from the index based on the user's information (e.g., demographics, interests) and run through our ads prediction system.

There will be different components and processes: Ad Selection, Ad Prediction, Auction, Pacing, and Training Data Generation. A toy sketch of how they chain together follows.
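The component names below follow the list above, but every implementation detail (the schemas, the stub scoring, the bid-times-score auction rule) is a placeholder assumption for illustration:

```python
# Toy end-to-end flow for the components above (all stubs assumed).
def select_ads(user, ads_index):
    """Ad Selection: cheap filter on targeting criteria."""
    return [ad for ad in ads_index if ad["target"] == user["segment"]]

def predict_ads(user, ads):
    """Ad Prediction: score each candidate (stubbed with a constant;
    a real system would use a calibrated ML model here)."""
    return [(ad, 0.5) for ad in ads]

def run_auction(scored_ads):
    """Auction: pick the highest bid times predicted engagement."""
    return max(scored_ads, key=lambda pair: pair[0]["bid"] * pair[1])

ads_index = [{"id": 1, "target": "sports", "bid": 2.0},
             {"id": 2, "target": "sports", "bid": 1.5}]
user = {"segment": "sports"}

winner, score = run_auction(predict_ads(user, select_ads(user, ads_index)))
print("winning ad:", winner["id"])
```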

One important point I want to highlight, which many candidates miss: how are we going to generate the training data? During the development phase, we may take a dump from the live system. But once the model is live, how will we keep training it? In our case, we need to record the actions taken on ads. The training data generation component records user actions on the ads displayed after the auction and generates positive and negative training examples for the ad prediction component.
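A minimal sketch of that component, with an assumed log schema (clicked impressions become positives, ignored ones negatives):

```python
# Sketch of the training-data-generation component. The log schema
# and field names are assumptions for illustration.
impression_log = [
    {"ad_id": 1, "user_id": "u1", "clicked": True},
    {"ad_id": 2, "user_id": "u1", "clicked": False},
    {"ad_id": 1, "user_id": "u2", "clicked": False},
]

def generate_training_examples(log):
    # Clicked impressions become positives, ignored ones negatives.
    return [{"ad_id": e["ad_id"], "user_id": e["user_id"],
             "label": 1 if e["clicked"] else 0} for e in log]

examples = generate_training_examples(impression_log)
print(sum(ex["label"] for ex in examples), "positives of", len(examples))
```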

5. Scale Architecture: Now that we have designed the architecture, is our work done? Not yet! A very important point in architecture design is to meet the scale and latency requirements we discussed earlier. If we apply a complex DL model (for example) to the full initial data set, the execution time will be huge compared to the expected response time. So in such scenarios we usually prefer to go from a large candidate set to a smaller one gradually. In our example, what can we do? We can use a simple model at the start, for ad selection, which filters on targeting criteria and predicts a rough relevance score. It ranks the ads and sends only the top ones to the ad prediction system, which uses a complex, optimized ML/DL model to predict a precisely calibrated score. That score is passed to the auction component, where the auction algorithms select the top ad to display. For any use case, think about how to make the system more scalable and reduce latency; the funnel sketch below illustrates the idea.
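In this sketch the candidate counts and the random scoring stubs are illustrative assumptions; the point is how each stage shrinks the set the next, more expensive stage has to score:

```python
# Funnel sketch: shrink the candidate set before the expensive model runs.
import random

random.seed(7)
candidates = list(range(100_000))          # all ads matching targeting

def cheap_score(ad_id):                    # fast, approximate relevance (stub)
    return random.random()

def expensive_score(ad_id):                # slow, calibrated prediction (stub)
    return random.random()

# Stage 1: lightweight model ranks everything, keeps the top 500.
shortlist = sorted(candidates, key=cheap_score, reverse=True)[:500]

# Stage 2: heavy model scores only the shortlist; the auction sees the top 10.
finalists = sorted(shortlist, key=expensive_score, reverse=True)[:10]
print(len(candidates), "->", len(shortlist), "->", len(finalists))
```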

6. Model Building and Evaluation: Model building and evaluation can be categorized as offline and online, similar to the metrics.

Offline Model Building and Evaluation: This is the step most widely known; most of us call it model creation and testing. Anyone starting their preparation usually starts from this step. There is plenty of material available on the topic; one just needs to practice it on some use cases. A few points I would like to highlight here:

Training data: Most courses hand us training data in a spreadsheet, but in reality it doesn't work like that. We need to think about where the training data will come from. In courses the data come labeled, but who does the labeling? We have to think this through: will we have humans sit down and label data, or will we record user interactions with the system? This might get costly! Do we employ someone or outsource the task? There are many questions to consider that we never face during a course. Implementation will not be straightforward either: when we display an ad and the user doesn't click it, does that always mean it was a negative example? In a search-engine context, if we display, say, 100 results and the first page shows only 20, usually the first 20 results get clicked and the results on the last page hardly ever do. Does that mean those results were completely irrelevant?
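One common heuristic for this kind of position bias (my illustration here, not a universal rule) is to treat only the skipped results above the last click as negatives and generate no labels at all for positions the user probably never examined:

```python
# Skip-above style labeling for implicit feedback (assumed heuristic).
results = ["r1", "r2", "r3", "r4", "r5"]   # ranked results shown to the user
clicked = {"r3"}                            # user clicked position 3

last_click = max(i for i, r in enumerate(results) if r in clicked)
labeled = []
for i, r in enumerate(results):
    if r in clicked:
        labeled.append((r, 1))              # positive: clicked
    elif i < last_click:
        labeled.append((r, 0))              # negative: examined but skipped
    # positions after the last click yield no label at all

print(labeled)   # [('r1', 0), ('r2', 0), ('r3', 1)]
```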

EDA, feature engineering, feature selection, etc. are mostly handled in this phase. Features are the backbone of any learning system: train on garbage data or irrelevant features, and your model is gone! In our example, what features can we think of? Listing them all is not possible, so here is the idea: think of all the players involved. One is the user, another is the advertiser, then there is the ad itself, the context (if it's search based), and sometimes the publisher too. The next step is to list the features of each of these players and then select the features for our model. Here, domain knowledge plays a very important role. There is a lot of material on these phases, so I am not reinventing the wheel, but I have highlighted the points that matter for the interview, with a small sketch below.
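As a starting point, grouping candidate features by player might look like this; the feature names are illustrative, not an exhaustive or prescribed list:

```python
# Candidate features grouped by player (names assumed for illustration).
features = {
    "user":       ["age_bucket", "location", "past_ad_clicks", "interests"],
    "advertiser": ["industry", "historical_ctr", "budget_tier"],
    "ad":         ["creative_type", "text_embedding", "age_of_ad"],
    "context":    ["query_terms", "device", "time_of_day"],
}
# Cross features often matter most in ads, e.g. user interest x ad topic.
cross_features = [("user.interests", "ad.text_embedding")]
```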

After this, go for model building and evaluation, which we briefly touched on earlier. Most aspiring data scientists think this is the core and main part, but as we have seen, it is one important part, NOT the complete work in itself.

Online Model Building and Evaluation: Once we have selected the best model, we test it in the online environment. As discussed earlier under online metrics, based on the results we either go ahead with the model or come back and rebuild it.

7. Iterative Model Improvements: Our model may perform well, yet in production we may notice that it isn't performing that well, that there is a sudden dip in performance, or that it fails in some particular scenario. Then we need to debug the model: maybe the data distribution in testing differs from production, or there has been a shift in the data distribution; for example, user preferences may change with the time of year. During this phase we usually find areas of improvement. Sometimes we identify the issue but don't yet have a solution; a new technical advance or the implementation of a new research paper helps in these situations. A minimal drift check is sketched below.
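One illustrative way to catch a distribution shift (an assumption, not the article's prescribed method) is to compare a feature's training distribution against recent production traffic with a two-sample Kolmogorov-Smirnov test:

```python
# Minimal drift check: compare training vs. production feature values.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training data
prod_feature  = rng.normal(loc=0.4, scale=1.0, size=5_000)  # shifted traffic

stat, p_value = ks_2samp(train_feature, prod_feature)
if p_value < 0.01:
    print(f"distribution shift detected (KS={stat:.3f}); debug or retrain")
```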

I have tried to cover each of these areas briefly, focusing on the ones that are commonly overlooked or for which resources are not easily available. This is in no way an exhaustive list, nor the only way to approach the problem. I am not covering the technical implementations because there are many good resources available for that. If you need help finding free resources on any specific topic, let me know and I will see how I can help. My LinkedIn: https://www.linkedin.com/in/smitsrivastava/

I hope this helps people preparing for ML interviews.
