ML Ops: Real World Model Deployments

Understanding the tenets of deploying ML products to production.

Vimarsh Karbhari
Acing AI
5 min read · Apr 17, 2020


Deploying machine learning/AI/data science products is not an easy task. Most of the time, the outputs of machine-learning algorithms are compiled as artifacts that need to be incorporated into existing production workflows or services. Sometimes, the languages and techniques used to develop these models are different from those used to build the actual service.

In this article, let us focus on the model deployment part of the data science pipeline. As the data science team and its process mature, the team deals with increasingly complex scenarios, and a simple model deployment mechanism no longer scales. We will cover approaches to deploying data science models from an in-depth operational perspective.

Data science models as a deployment entity

Like code, models are deployed into the real world. Hence, we have refrained from calling this practice DevOps (developer operations) and instead call it ML Ops (Machine Learning Operations), although the two terms are sometimes used interchangeably today.

A lot of data science terminology treats a model as the artifact of a machine learning approach, but that framing is too restrictive. It arises largely because data scientists work in their silos and share only their final models as artifacts fit for deployment. However, if you delve into the workflow of a data scientist building a model, they often start simple.

What Big-O is to coding, validation and evaluation are to data science models.

When first applying machine learning to a problem, your biggest improvements will usually come from implementing any solution, not necessarily the optimal one (validation and evaluation). This is similar to code, where we first try the brute-force approach and then optimize based on performance (Big-O). In this context, I consider the model to be the combination of hyperparameters and algorithmic approach, but not the parameters (which are dependent on data, discussed below). We may want to gradually improve the accuracy or the area of application for the model. Hence, the model is ever evolving, like the code, and should be treated that way.
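As a concrete illustration, here is a minimal sketch of that workflow in Python using scikit-learn: validate a trivial baseline first, then iterate on the algorithm and hyperparameters. The synthetic dataset and the specific models are placeholders for illustration, not a prescription.

```python
# A minimal sketch of "validate a simple solution first" with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for a real tabular problem.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Start with the simplest possible solution and evaluate it.
baseline = DummyClassifier(strategy="most_frequent")
print("baseline accuracy:", cross_val_score(baseline, X, y, cv=5).mean())

# Only then iterate on the model (algorithm + hyperparameters).
model = LogisticRegression(max_iter=1000)
print("model accuracy:", cross_val_score(model, X, y, cv=5).mean())
```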

Scenarios of model deployment

Different modeling approaches have different properties and limitations. A regression model is easy to consume and interpret, but a neural network might be more complicated to understand. Therefore, when we consider which downstream requirements our system affects, our choice of model may have impacts that are independent of the data used to train it.

The choice of model (including hyperparameters) is never a singular decision.

This leads to different model deployment scenarios.


A/B Test Models

As a model is the combination of hyperparameters and algorithmic approach, the team may want to check the efficacy of different solutions over time, or at the same time. This can be done to understand which model actually performs better in production. The added complexity here comes from the infrastructure and routing rules required to ensure traffic is redirected to the right models, and from the need to gather enough data to make statistically significant decisions, which can take some time. A closely related approach is the multi-armed bandit experiment, which adaptively shifts traffic toward better-performing variants. For AWS, this is explained as real-time multivariate optimization. For Azure, this can be achieved using Azure Personalizer.
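To make the routing concrete, below is a hypothetical epsilon-greedy bandit router in Python. The variant names and the reward signal are illustrative assumptions; in production the reward would come from logged user feedback (clicks, conversions, correct predictions), not a local call.

```python
import random

class EpsilonGreedyRouter:
    """Hypothetical router over deployed model variants."""

    def __init__(self, variants, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {v: 0 for v in variants}
        self.rewards = {v: 0.0 for v in variants}

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the
        # variant with the best observed average reward so far.
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))
        return max(self.counts,
                   key=lambda v: self.rewards[v] / max(self.counts[v], 1))

    def record(self, variant, reward):
        self.counts[variant] += 1
        self.rewards[variant] += reward

router = EpsilonGreedyRouter(["model_a", "model_b"])  # hypothetical variants
variant = router.choose()           # route the incoming request
router.record(variant, reward=1.0)  # log the observed outcome
```

Epsilon-greedy is the simplest bandit policy; managed services such as Azure Personalizer implement more sophisticated contextual variants of the same idea.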

Canary Models

Netflix uses Isthmus to introduce resiliency against ELB outages. The same architecture is used within Netflix to divert traffic to new canary builds and releases. The concept is that instead of completely replacing a certain part of the product at once, a new part of the product is deployed and traffic is slowly diverted to it. The practice of canary releasing has been adopted for some time; it is sometimes referred to as a phased rollout or an incremental rollout. In the ML world, there would be an existing model and a new, updated model. We would want to slowly phase out the old model in favor of the new one instead of switching right away. This is done to gradually gather data and metrics about the new model before completely switching to it. We could use the same technologies described in the previous method; however, the reasoning and the approach change. A minimal sketch of such a router appears below.
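The sketch assumes two hypothetical model callables and a tunable traffic fraction; it is an illustration of the splitting logic, not a production router.

```python
import random

def make_router(old_model, new_model, canary_fraction):
    """Return a function that splits traffic between stable and canary."""
    def route(features):
        # Send a small, configurable slice of traffic to the canary.
        if random.random() < canary_fraction:
            return "canary", new_model(features)
        return "stable", old_model(features)
    return route

old_model = lambda x: 0  # placeholder for the existing model
new_model = lambda x: 1  # placeholder for the updated model
route = make_router(old_model, new_model, canary_fraction=0.05)
print(route({"feature": 42}))
```

In practice, the canary fraction would be raised in stages (for example 1% → 5% → 25% → 100%) while monitoring the new model's error rates and prediction quality.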

Real-time Learning Models

There are scenarios in which models use algorithms and techniques that continuously improve their performance as new data arrives in production. They are continuously learning in production. The previous two scenarios can be applied to all kinds of models, as they are ways to improve the product and delivery of the model itself. In this scenario, however, the nature of the model itself changes: it moves from offline learning to online learning. Examples would be models deployed for social experiments (each ‘like’ on Instagram can dictate a different set of posts served to you) or graph-based scenarios where models cannot yet view the entire graph and make decisions as new areas become visible. Here, it is imperative that we treat these models as code, for the same reason we treat infrastructure as code. Additionally, we will need to version not only the training data but also the production data that will impact the model’s performance.
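For instance, scikit-learn exposes incremental learning through partial_fit. The sketch below trains on a synthetic stream of mini-batches standing in for production events; the data and model choice are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# SGDClassifier supports incremental updates via partial_fit.
model = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # all classes must be declared up front

rng = np.random.default_rng(0)
for step in range(100):
    # Each iteration stands in for a batch of new production data.
    X_batch = rng.normal(size=(32, 5))
    y_batch = (X_batch.sum(axis=1) > 0).astype(int)
    model.partial_fit(X_batch, y_batch, classes=classes)

print(model.predict(rng.normal(size=(1, 5))))
```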

Composite Models (Collection of Models)

Sometimes multiple models power a single product or feature. Imagine a product that assigns a credit score or a rating. In this product, there might be an API call to get a particular score, which could be delivered as an event to the user/customer profile. Behind the scenes, multiple models might be used to produce that cumulative score. Each model can be experimented with using the different approaches above, in a test environment or in production. The greater the separation of concerns between the application and the different models, the more flexibility and ability to scale we gain.
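A hypothetical sketch of such a composite scorer follows; the sub-models, their formulas, and the profile fields are invented purely for illustration.

```python
def income_model(profile):       # placeholder sub-model
    return min(profile["income"] / 1000, 300)

def history_model(profile):      # placeholder sub-model
    return 300 - 50 * profile["missed_payments"]

def utilization_model(profile):  # placeholder sub-model
    return 250 * (1 - profile["credit_utilization"])

def credit_score(profile):
    # One API call fans out to independent sub-models and combines
    # their outputs into the cumulative score delivered to the user.
    return round(income_model(profile)
                 + history_model(profile)
                 + utilization_model(profile))

print(credit_score({"income": 85000,
                    "missed_payments": 1,
                    "credit_utilization": 0.3}))
```

Because each sub-model sits behind its own boundary, any one of them can be A/B-tested, canaried, or retrained online without touching the others.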

Recommendations

To support more complex deployment scenarios, the process and infrastructure need to be flexible. As your product/app scales, more infrastructure might be required to serve additional users and customers. Hence, supporting different model deployment scenarios directly impacts the scalability and reliability of your system. Neptune.ai explains serving ML models in production using TFX and Kubeflow; it is one way in which all the above deployment scenarios can be put into practice. A machine learning product is an amalgamation of model, data, and code. The more we support scenarios where each of these can change over time (because they will), the better our chances of making the ML product successful.

Subscribe to our Acing Data Science newsletter for more such content.

Thanks for reading! 😊 If you enjoyed it, test how many times you can hit 👏 in 5 seconds. It’s great cardio for your fingers AND will help other people see the story.
