Data Science: Model deployment

Sourabh Potnis
6 min read · Mar 18, 2022



In the last chapter we discussed model evaluation. In this chapter we will discuss how to deploy your machine learning model to production and what to consider while doing so. Your data science project will provide business value/impact only if it is deployed to production, users are using it, and they are providing feedback.

Model packaging and distribution

Once all the candidate models are evaluated and one model is accepted by the business/end users as well as the data scientist based on the evaluation criteria, the next step is to package the model for deployment. We need to export the finalized, accepted model into a specific format such as pkl (a serialized object), PMML, PFA, ONNX, or even simple English if-then-else rules that describe the model and will be consumed by the business application. For audit and compliance purposes, you need to maintain the versions of all datasets, models and the corresponding metadata (such as metrics, parameters and plots).
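As a minimal sketch (assuming a fitted scikit-learn-style estimator named model; the file name is illustrative), serializing the accepted model to a .pkl file and loading it back could look like this:

```
# Sketch: persist and reload a trained model with joblib.
# `model` is assumed to be a fitted scikit-learn-style estimator;
# the file name is illustrative.
import joblib

# Serialize the accepted model to a versioned file
joblib.dump(model, "churn_model_v1.pkl")

# Later, in the serving application, load it back and score new data
loaded_model = joblib.load("churn_model_v1.pkl")
predictions = loaded_model.predict(X_new)  # X_new: new feature matrix
```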

Sometimes you will have to package your model and deploy it on multiple servers, or make it available to users as a downloadable file or a pip-installable package. There are different ways to achieve this.

  • Releases feature on GitHub — After merging your code changes to the master branch on GitHub, you can create a new release for the latest codebase and then deliver project iterations in upcoming releases along with release notes. Each release will have links to binary and .zip/.tar.gz files available for download and use.
  • Python package — We can also create a Python package using setup.py and make it available to users via PyPI (pip install) by uploading the distribution archive using twine (if open source) or to an internal organizational artifact repository such as JFrog Artifactory. Once we package our Python codebase, a .whl file is created. This .whl file can be shared directly with users and installed via pip install. A minimal packaging sketch follows this list.
  • Containers — You can containerize your ML application code using Docker, distribute the image, and orchestrate it with Kubernetes.
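As a sketch of the Python-package route (the package name and metadata are hypothetical), a setup.py could look like this:

```
# setup.py -- minimal packaging sketch; package name and metadata are
# hypothetical, adjust them to your project.
from setuptools import setup, find_packages

setup(
    name="my_ml_model",
    version="0.1.0",
    description="Packaged ML model and scoring code",
    packages=find_packages(),
    install_requires=["scikit-learn", "pandas"],
    include_package_data=True,  # ship the serialized model file with the code
)

# Typical build/upload steps (run in a shell):
#   python setup.py bdist_wheel   # creates a .whl file under dist/
#   twine upload dist/*           # uploads to PyPI or an internal index
```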

Model deployment patterns

The following are widely used machine learning model deployment patterns and strategies for releasing your model to production successfully:

Shadow mode — In this deployment pattern, the new ML system shadows either a human or an already live model and runs in parallel, but the new model’s predictions are not actively used for decision making yet. The ML system is made live only after evaluation and user acceptance.
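A minimal shadow-mode sketch (the model objects and logger are placeholders): the live model's prediction is returned to the caller, while the new model's prediction is only logged for later comparison.

```
# Shadow-mode sketch: serve the live model, silently log the shadow model.
# `live_model`, `shadow_model` and the logger setup are placeholders.
import logging

logger = logging.getLogger("shadow_predictions")

def predict(features):
    live_pred = live_model.predict([features])[0]  # used for the actual decision
    try:
        shadow_pred = shadow_model.predict([features])[0]  # evaluated offline only
        logger.info("features=%s live=%s shadow=%s", features, live_pred, shadow_pred)
    except Exception:
        # A failing shadow model must never affect the live response
        logger.exception("shadow model failed")
    return live_pred
```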

A/B testing — In A/B testing, we deploy two different models, say A and B, in a controlled setup (different scopes or sets of users). We then evaluate which one performs better in terms of business KPIs and user acceptance and keep the winning model.
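One simple way to split users between the two variants is to hash a stable user ID so that each user always sees the same model; this is only a sketch, and the model objects and 50/50 split are assumptions.

```
# A/B assignment sketch: a stable hash of the user ID decides the variant,
# so the same user always hits the same model. Models and split are assumed.
import hashlib

def assign_variant(user_id: str, split: float = 0.5) -> str:
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return "A" if bucket < split * 100 else "B"

def predict(user_id: str, features):
    model = model_a if assign_variant(user_id) == "A" else model_b
    return model.predict([features])[0]
```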

Canary deployment — In the canary deployment strategy, we start by routing a small percentage of traffic (say 5%) to the new model for prediction and decision making. We validate the predictions made by the model and gradually increase the traffic if it is performing as expected.
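A canary router can be as simple as sending a configurable fraction of requests to the new model; the sketch below uses the 5% figure from the text, and the model names are placeholders.

```
# Canary routing sketch: send a small, configurable fraction of traffic to the
# new model. `current_model` and `canary_model` are placeholders.
import random

CANARY_FRACTION = 0.05  # start at 5% and increase gradually as confidence grows

def predict(features):
    model = canary_model if random.random() < CANARY_FRACTION else current_model
    return model.predict([features])[0]
```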

Blue/Green deployment — In the blue/green strategy, we have two systems running in parallel in production. The blue environment is the existing system serving production traffic, while the green environment is the system with the new features. We send the production data coming to the blue system through the green system as well and observe the performance of the new model. If it performs as expected, we make the green environment the live system by diverting traffic from blue to green, i.e. the green system becomes the new blue system, so there is no downtime and rollback is easy.

Model serving

Once the model is packaged, you have to serve it in the production environment so that end users can explore it, use it and give feedback on it. Serving can be done in either real-time or batch mode, and in various ways:

  • Exposing the model via a REST API so that any application can consume it by passing the required parameters. You can use the Flask framework for exposing your model stored as a .pkl file via an API, gunicorn as the application server, Apache mod_wsgi or nginx as the web server (if static content is to be served), connexion for the OpenAPI/Swagger specification, FlaskMonitor for monitoring your Flask API application calls, and Supervisor for process management on your server. A minimal Flask sketch follows this list.
  • Edge ML, i.e. the model embedded in or deployed on mobile/edge devices or in the browser
  • Deploying the model directly in application or database code, or in a batch data pipeline
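As a minimal sketch of the Flask route described in the first bullet (the model file name, endpoint path and JSON schema are illustrative assumptions):

```
# Minimal Flask serving sketch. File name, route and request schema are
# illustrative assumptions.
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)
model = joblib.load("churn_model_v1.pkl")  # load the packaged model at startup

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()  # e.g. {"features": [0.1, 2.3, 4.5]}
    preds = model.predict([payload["features"]])
    return jsonify({"prediction": preds.tolist()[0]})  # native type for JSON

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

In production you would typically run such an app behind gunicorn (e.g. gunicorn app:app) instead of the built-in development server.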

BentoML — BentoML is an open-source platform that simplifies ML model deployment and enables you to serve your models at production scale in minutes.

Seldon Core — Seldon Core converts your ML models into production-ready REST/gRPC microservices.

Model usage and interaction

Once the model is deployed, end users can use and interact with it. This interaction can happen via a user interface on a mobile app or website that users already use, or one newly created by a frontend developer for this particular use case. As a data scientist, you can also create your own simple UI applications in Python. For creating lightweight data and ML applications you can use the following open-source packages (a Streamlit sketch follows the list):

  • Streamlit — To build and share data apps in Python
  • Gradio — To create ML apps
  • Plotly Dash — Analytical UI apps for Python, Jupyter, R, Julia
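A minimal Streamlit sketch for interacting with the deployed model (the model file and input fields are hypothetical):

```
# streamlit_app.py -- tiny UI sketch for exploring model predictions.
# Model file name and input fields are illustrative.
import joblib
import streamlit as st

model = joblib.load("churn_model_v1.pkl")

st.title("Churn prediction demo")
tenure = st.number_input("Tenure (months)", min_value=0, value=12)
monthly_charges = st.number_input("Monthly charges", min_value=0.0, value=50.0)

if st.button("Predict"):
    prediction = model.predict([[tenure, monthly_charges]])[0]
    st.write(f"Predicted class: {prediction}")
```

You would run it with streamlit run streamlit_app.py and share the resulting URL with users.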

Model monitoring

Once the model is deployed in production and the real world, the data scientist’s job is not finished. The data scientist has to monitor the model on different aspects such as the model accuracy trend, data quality, user experience, etc. Many times, model performance/accuracy degrades in production within a few weeks or months because of drift.

To avoid model decay and meet user/stakeholder expectations, you should monitor the data, the model (predictions) and the system across the ML pipeline on the following aspects:

Data quality — The quality of the real-world data that is fed into the model is important, as it directly impacts model performance.

System/Infra/SLA monitoring — Monitor whether the expected SLAs are being met in terms of latency, number of model API hits per unit of time, batch size, throughput, uptime, disk usage, out-of-memory errors, etc.
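One simple way to capture prediction latency from inside the serving code is a timing decorator; this is a sketch with illustrative names, and in practice you would export such measurements to your monitoring stack.

```
# Latency-logging sketch: wrap the prediction function and log how long each
# call takes. Logger name is illustrative.
import logging
import time
from functools import wraps

logger = logging.getLogger("model_sla")

def log_latency(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s latency_ms=%.2f", func.__name__, elapsed_ms)
        return result
    return wrapper

@log_latency
def predict(features):
    return model.predict([features])[0]  # `model` is assumed to be loaded
```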

Model quality/performance — To monitor the quality of the model (i.e. to know whether the model is working as expected in the real world, or whether its performance is decaying or improving), we have to track the model’s performance over a period of time using evaluation metrics appropriate to the problem type. Read about model evaluation here. Based on the model quality, we may have to retune the model by performing a root cause analysis and taking corrective actions such as adding more/recent training data, further feature engineering, a new algorithm and/or hyperparameter optimization.
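A minimal sketch of tracking one metric over time once ground-truth labels arrive (the metric, the weekly window and the column names are assumptions):

```
# Model-quality tracking sketch: compute accuracy per week once true labels
# arrive. File, column names and the weekly grouping are illustrative.
import pandas as pd
from sklearn.metrics import accuracy_score

# One row per scored request: timestamp, prediction and the label that arrived later
predictions_log = pd.read_csv("predictions_log.csv", parse_dates=["timestamp"])

weekly_accuracy = (
    predictions_log
    .groupby(pd.Grouper(key="timestamp", freq="W"))
    .apply(lambda g: accuracy_score(g["label"], g["prediction"]))
)
print(weekly_accuracy)  # a downward trend signals possible decay/drift
```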

Data drift

Predictor drift — Over a period of time there might be a change in the distribution of the features/predictors P(X), causing decay in model performance. For numerical features, we can use the two-sample Kolmogorov-Smirnov test, and for categorical features, we can use the chi-squared test. If there is a large change in the distribution of the input features between the training/historical dataset and the latest real-world dataset, we will have to retrain the model. The retraining interval and the threshold for the change in distributions will differ based on the use case and the data.
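A minimal sketch of the two tests mentioned above using scipy (the column names, the reference/current datasets and the 0.05 significance level are assumptions):

```
# Predictor drift sketch: KS test for a numerical feature, chi-squared test
# for a categorical feature. Files, columns and threshold are illustrative.
import pandas as pd
from scipy.stats import chi2_contingency, ks_2samp

reference = pd.read_csv("training_data.csv")          # data the model was trained on
current = pd.read_csv("recent_production_data.csv")   # latest real-world data

# Numerical feature: two-sample Kolmogorov-Smirnov test
ks_stat, ks_p = ks_2samp(reference["monthly_charges"], current["monthly_charges"])
if ks_p < 0.05:
    print("Possible drift in monthly_charges")

# Categorical feature: chi-squared test on the category frequency table
contingency = pd.DataFrame({
    "reference": reference["plan_type"].value_counts(),
    "current": current["plan_type"].value_counts(),
}).fillna(0)
chi2, chi_p, dof, _ = chi2_contingency(contingency)
if chi_p < 0.05:
    print("Possible drift in plan_type")
```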

Target drift — Target drift indicates a change in the distribution of the label/target/output variable P(Y) between the training dataset on which the active model was trained and the prediction/real-world data. It can be detected using the same statistical tests, such as the two-sample Kolmogorov-Smirnov test and the chi-squared test.

E.g. a change in spamming behavior, selection bias, a rule update, etc.

To overcome data drift, add more recent data and retrain the model.

Concept drift

Concept drift indicates a change in the relationship between the Xs/predictors and the y/target variable, i.e. P(Y|X): P_t(X, y) ≠ P_t+1(X, y). We can calculate the Pearson correlation between the target and each individual predictor in the two datasets (training vs. recent real-world data) to detect a change in the relationship.
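A minimal sketch of comparing per-feature Pearson correlations with the target across the two datasets (the column names and the 0.2 shift threshold are assumptions):

```
# Concept-drift sketch: compare the Pearson correlation of each predictor with
# the target across two datasets. Files, columns and threshold are illustrative.
import pandas as pd

reference = pd.read_csv("training_data.csv")
current = pd.read_csv("recent_labeled_data.csv")  # needs ground-truth labels

for col in ["tenure", "monthly_charges"]:  # numerical predictors to check
    corr_ref = reference[col].corr(reference["label"])
    corr_cur = current[col].corr(current["label"])
    if abs(corr_ref - corr_cur) > 0.2:  # arbitrary illustrative threshold
        print(f"Relationship between {col} and label may have changed: "
              f"{corr_ref:.2f} -> {corr_cur:.2f}")
```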

E.g. e-commerce apps, sensor data, movie recommendations, demand forecasting.

To overcome concept drift, relabel the old data and retrain the model.

Evidently AI is an open-source package available for model monitoring.

Versioning and Logging

Maintaining versions and logs of all the historical and active (accepted/production) models, as well as the corresponding datasets and metadata (such as metrics, parameters, plots or other artifacts), is very important. This is required for data lineage, model regeneration, compliance and audit requirements.

Data Version Control (DVC) is an open-source package that can be used for maintaining versions of models and datasets.

Loguru and whylogs will help you with logging and with debugging issues in code as well as in data.
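A minimal Loguru sketch for structured prediction logging (the file name, rotation policy and logged fields are assumptions):

```
# Loguru logging sketch: write prediction events to a rotating log file.
# File name, rotation/retention and the logged fields are illustrative.
from loguru import logger

logger.add("predictions.log", rotation="50 MB", retention="30 days")

def predict_and_log(features):
    prediction = model.predict([features])[0]  # `model` is assumed to be loaded
    logger.info("features={} prediction={}", features, prediction)
    return prediction
```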

Feedback

Once the model is deployed in production, end users will start using/exploring it and providing feedback. Integrating this feedback iteratively is very important for giving the best user experience. The data scientist may have to retune the model if the feedback on the model is not up to the mark. If the issue is with the UI or workflow experience, you will have to work with the frontend developer to fix and enrich it.
