ML Checklist — Best Practices for a Successful Model Deployment

Mudit Tiwari · Published in Analytics Vidhya · 5 min read · Jun 26, 2021


Machine learning model deployment is complex, and a project's journey often ends before the model reaches the deployment phase. VentureBeat reports that 87% of models never make it to production. I believe an ML/DS team can increase the chance of deploying their model by asking these five questions at various stages of the project.

Machine Learning Infrastructure — source: Sculley et al., NIPS 2015.

1. What business metric do you want to optimize?

Before project scoping, it is important to explore the business metrics common across the organization or the domain. These metrics should act as a bridge between machine learning metrics (ROC-AUC, RMSE, etc.) and the business proposition, and they should correlate directly with business growth. For instance, YouTube might want to maximize engagement time per user or click-through rate (CTR); similarly, a lending firm might minimize its loan-delinquency ratio.

Careful selection of metrics gives a broader understanding of your ML system. Also, note that you might choose more than one metric to track. For example, Medium might track engagement time per story as well as clap ratio per story while recommending stories to users.
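To make the metric-tracking idea concrete, here is a minimal sketch of how two such metrics could be computed from interaction logs. The record fields (`clicked`, `seconds`) are illustrative, not a real schema.

```python
def click_through_rate(impressions):
    """CTR = clicks / impressions over a list of impression records."""
    if not impressions:
        return 0.0
    clicks = sum(1 for imp in impressions if imp["clicked"])
    return clicks / len(impressions)

def engagement_time_per_story(sessions):
    """Mean seconds spent per story across reading sessions."""
    if not sessions:
        return 0.0
    return sum(s["seconds"] for s in sessions) / len(sessions)
```

In a real system these would run as scheduled aggregations over event logs, and both numbers would be tracked side by side on a dashboard.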

2. How well does a heuristic solve your objective?

A machine learning project's success depends on how well you formulate and assess the problem before the project starts. Proper and extensive EDA can give major insights into your data. We often begin ML model development without even considering heuristics, but a simple heuristic usually makes a better baseline than a complex black-box model. For example, Zillow could show house prices using a simple rule based on the number of bedrooms and the location, and Android can (and does) list apps alphabetically instead of recommending at the user level.

Note, however, that a single machine learning model will usually beat a large set of complex hand-written rules masquerading as a heuristic.
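As a sketch, a bedrooms-plus-location price heuristic could look like the following. The base price, per-bedroom increment, and location multipliers are invented for illustration; they are not Zillow's actual rules.

```python
# Invented constants for an illustrative house-price heuristic.
BASE_PRICE = 50_000
PER_BEDROOM = 40_000
LOCATION_MULTIPLIER = {"downtown": 1.8, "suburb": 1.2, "rural": 0.8}

def heuristic_price(bedrooms, location):
    """Simple rule: base price plus a per-bedroom increment,
    scaled by a location factor (1.0 for unknown locations)."""
    factor = LOCATION_MULTIPLIER.get(location, 1.0)
    return (BASE_PRICE + PER_BEDROOM * bedrooms) * factor
```

A baseline like this takes minutes to write, and any model that can't beat it clearly isn't worth deploying.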

3. How do you want to serve your ML system?

There are multiple ways to serve an ML model, and you often need to decide on the serving architecture before training it. The two most common architectures are:

Precomputed Model Prediction

This is one of the earliest and simplest architectures for serving machine learning models. It is an indirect method of serving: we precompute predictions for all possible combinations of input variables and store them in a database. This architecture is commonly used in recommendation systems — recommendations are precomputed, stored, and shown to the user at login.

Architecture for Model Serving — Precomputed Predictions.

Even though we don’t directly expose our model, this type of architecture has multiple advantages.

Pros:

  • Low Latency — Inference is a simple lookup, so serving latency is minimal.
  • Easy Productionization — Retraining the model and bringing it back to production is very simple and less time-consuming.
  • Cost-Efficient — A database is all you need, no special infra is required.

Cons:

  • Restricted to bounded data — Since we must precompute predictions for every possible combination, the independent-variable space has to stay discrete and bounded; continuous variables can't be used directly.
  • Each new variable is painful — Adding a variable multiplies the number of combinations to precompute, so storage and precomputation cost grow exponentially.
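The pattern above can be sketched in a few lines: a batch job enumerates the discrete, bounded input space, stores one prediction per combination, and the online path is a pure lookup. The segment/category names and the stand-in `model` function are hypothetical.

```python
from itertools import product

def model(user_segment, item_category):
    # Placeholder for a real trained model's scoring function.
    return round(0.1 * len(user_segment) + 0.05 * len(item_category), 2)

# The bounded, discrete input space (illustrative values).
USER_SEGMENTS = ["new", "casual", "power"]
ITEM_CATEGORIES = ["tech", "sports", "music"]

# Offline batch job: precompute a prediction for every combination.
prediction_store = {
    (u, c): model(u, c) for u, c in product(USER_SEGMENTS, ITEM_CATEGORIES)
}

def serve(user_segment, item_category):
    # Online path: a single key lookup, no model inference at all.
    return prediction_store[(user_segment, item_category)]
```

In production the dict would be a database or key-value store, but the shape of the system is the same — which also makes the exponential-growth con visible: each new variable multiplies the size of `prediction_store`.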

Microservice Based Model Serving

Here, the model is served independently of the application, and predictions are provided in real-time as per request. This type of architecture provides flexibility in terms of model training and deployment.

Microservice Based Architecture for Model Serving

Pros:

  • Real-Time Predictions — Serves predictions online in real time, which is exactly what many applications need.
  • Flexible Deployment — Being an independent service, the model can be deployed with ease: in-house, on the cloud, or at the user's end.
  • Highly Scalable — Model is an independent service and can be scaled independently.

Cons:

  • Infrastructure Cost — Cloud compute, GPUs, databases, etc. are required, so the cost can grow with the model's requirements.
  • Latency Optimization — Depending on the model's complexity, extra effort in inference optimization may be needed to keep latency low.
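A minimal sketch of this pattern, using only the standard library (in practice you would likely put FastAPI or Flask behind a production server); the linear `predict` function stands in for a real trained model.

```python
import json
from wsgiref.simple_server import make_server

def predict(features):
    # Stand-in for a real trained model's inference call.
    weights = [0.4, 0.2, 0.1]  # illustrative, not learned
    return sum(w * x for w, x in zip(weights, features))

def app(environ, start_response):
    """WSGI app: accepts {"features": [...]} and returns a prediction."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    features = json.loads(body)["features"]
    payload = json.dumps({"prediction": predict(features)}).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [payload]

if __name__ == "__main__":
    # Serve the model as its own process, independent of the application.
    make_server("", 8000, app).serve_forever()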

4. When should you retrain your model?

A machine learning model's performance degrades over time in production, so it is advisable to evaluate retraining requirements before serving the model. Based on the use case, model monitoring, and evaluation, one can decide when to retrain. One good way to decide on a retraining cadence is out-of-time analysis over different time windows.

Suppose, for example, that out-of-time analysis shows model performance degrading by 5% after two months; one could then decide to retrain every two months.

The above example is just for illustration purposes and depending on the use case the retraining time can range from seconds to years.
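One simple way to operationalize this: score the frozen model on successive monthly windows and flag the first window where the metric drops past a tolerance. The threshold and the example scores below are illustrative.

```python
def months_until_retrain(monthly_scores, tolerated_drop=0.05):
    """Return the first month offset at which the metric has dropped by
    more than `tolerated_drop` relative to the training-time score,
    or None if no significant degradation is observed."""
    baseline = monthly_scores[0]
    for month, score in enumerate(monthly_scores[1:], start=1):
        if baseline - score > tolerated_drop:
            return month
    return None

# Example: a metric (say AUC) measured 0, 1, 2, 3 months after training.
scores = [0.82, 0.81, 0.76, 0.74]
```

Here degradation crosses the 5% tolerance at month two, suggesting a two-month retraining cadence — exactly the kind of decision the out-of-time analysis is meant to support.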

5. How do you want to retrain your model?

Retraining is essential, and it helps to keep the model up to date. There are broadly two ways to retrain machine learning models — online & offline training.

Online Training

As the name suggests, the model is retrained while in production: true labels are fed back to the model at certain intervals to update/retrain it. This requires a separate architecture and is generally hard to implement.

Broad Architecture for Online Model Retraining

For example, when predicting ad-click probability we get feedback (clicked or not clicked) that can be used to update the model online.
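A toy sketch of that feedback loop, assuming a logistic model updated with one SGD step per observed click / no-click label (illustrative, not production code):

```python
import math

class OnlineClickModel:
    """Tiny logistic regression updated online from click feedback."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr

    def predict_proba(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, clicked):
        # One SGD step on the log-loss gradient for a single event.
        error = self.predict_proba(x) - (1.0 if clicked else 0.0)
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
```

Each served ad produces a label within minutes, so the model's weights drift with the feedback stream — which is both the appeal of online training and the reason it needs careful monitoring.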

Offline Training

In offline training, the model is retrained from scratch, giving us full control over the new model and the data used to train it. The new model is pushed to production via A/B testing or shadow testing.
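Shadow testing can be sketched as follows: the candidate model scores the same live traffic as the incumbent, but only the incumbent's output is served, while the candidate's predictions are logged for offline comparison. The function names here are illustrative.

```python
def serve_with_shadow(request, live_model, shadow_model, shadow_log):
    """Serve the live model's prediction; log the shadow model's
    prediction on the same request for later offline evaluation."""
    live_pred = live_model(request)                       # served to the user
    shadow_log.append((request, shadow_model(request)))   # logged only
    return live_pred
```

Once the shadow log covers enough traffic, the candidate's logged predictions can be compared against realized outcomes before it ever affects a user — a lower-risk alternative to a straight A/B split.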

Conclusion

Deploying a machine learning model is not easy, and one needs to take care of multiple things to make the deployment successful. I believe brainstorming on these five questions can help identify issues and make your machine learning system better.

Thanks for reading.

Peace :)

Reference

  1. D. Sculley et al., Hidden Technical Debt in Machine Learning Systems (NIPS 2015).
  2. Stanford MLSys Seminar Episode 5: Chip Huyen
  3. ML System Design — Stanford — Lecture 2
