The story of a Backend Engineer looking for the right way to deploy ML models to production

Yoni Adir
Melio’s R&D blog

--

How would you handle a task that’s related to a topic you don’t consider your “cup of tea”?
What if that task changes another team’s workflow, and can potentially have a large impact on your company’s success?
If what comes to mind is “panic”, take a deep breath and let’s break it down.

First things first — receiving the task

During a casual meeting with my manager, he mentioned that our Data Science team was developing several new ML models. Their workflow wasn’t yet well structured, and they needed someone to help them improve it. It was a unique chance to learn more about the ML world, which was really interesting to me.

The main challenge with this task was that I didn’t really know anything about their current workflow.

Phase 1: Understanding the task

In order to improve our Data Science team’s workflow, I first had to understand it. So, I contacted them for a brainstorming session. We sat together, and they started drawing the life cycle of their machine learning model development. When we reached the deployment phase, they told me that this was the part of the process they struggled with the most.

I tried to define that missing piece of the puzzle in their process. It seemed to me that they were missing a developer: someone in charge of building production systems that let end-users access their ML models. This would enable other teams at Melio to use the model-serving infrastructure to build applications and user interfaces for surfacing predictions to model users.


We continued to define some must-have features for the MVP (Minimum Viable Product), discussing and considering the following (a rough code sketch of these requirements follows the list):

A/B testing — being able to run different versions of the same ML model in parallel, and to control how much traffic goes to each version.

Shadow mode — running an ML model version without actually returning the response or prediction to consumers.

Low latency — getting an immediate response when requesting a prediction.

Monitoring — being able to track historical changes over time and monitor all of the above in a comfortable and informative way.
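
To make these requirements concrete, here is a rough sketch of what such a deployment spec could look like. It is purely illustrative; every class, field, and value below is an assumption rather than our actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ModelVersionSpec:
    """One deployable version of an ML model."""
    version: str           # e.g. "v2"
    traffic_weight: float  # share of live traffic (0.0-1.0), used for A/B testing
    shadow: bool = False   # if True, the version is called but its response is discarded

@dataclass
class ModelDeploymentSpec:
    """Everything the serving layer needs to know about one model."""
    model_name: str
    versions: list[ModelVersionSpec] = field(default_factory=list)
    latency_budget_ms: int = 100            # the "low latency" requirement made explicit
    monitoring_channel: str = "#ml-alerts"  # where monitoring alerts should go

spec = ModelDeploymentSpec(
    model_name="fraud-detector",
    versions=[
        ModelVersionSpec("v1", traffic_weight=0.9),             # current champion
        ModelVersionSpec("v2", traffic_weight=0.1),             # A/B test candidate
        ModelVersionSpec("v3", traffic_weight=0.0, shadow=True) # shadow mode only
    ],
)
```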

The way I see it, you should always strive to give your customers the best tools to satisfy their needs. Furthermore, you want them to be independent enough to use these tools without your support becoming a bottleneck in their process. Having said that, you still want to be their go-to person whenever they need your help.

Phase 2: Research

Many questions surfaced during the research process, and my goal was to answer them one by one.

To research effectively, you need to look for good sources of information: the web, colleagues, and friends from the programming community who are facing the same problems.

All roads lead to production

Before you start deep diving and breaking the process down into sub-processes, you should try to figure out how other companies deploy their ML models to production.

After investigating, I concluded that there are many ways to do it: some companies prefer to build in-house tools, while others use third-party tools for their CI/CD processes.
Fortunately, I was able to find some guidelines on what an ML model life cycle should look like.

Starting point

In this specific scenario, I started from the CD (Continuous Delivery) rather than the CI (Continuous Integration). We already had a decent CI process, so it was not the real pain point; the deployment side was. We wanted to move fast, and find the right balance between quality and speed.

Store now, use later

The input from the Data Science team was a compressed ML model file. You should consider separating model types and their versions. With this separation, you can easily control which models and versions you want to deploy. Keep your storage capacity in mind as well.
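
As an illustration, assuming the artifacts live in S3 (the bucket name, key layout, and helper function below are placeholders, not our actual setup), the separation can be as simple as a key prefix per model and version:

```python
import boto3  # assuming the artifacts live in S3; any object store would work the same way

s3 = boto3.client("s3")
ARTIFACTS_BUCKET = "my-ml-artifacts-bucket"  # placeholder bucket name

def store_model_artifact(model_name: str, version: str, local_path: str) -> str:
    """Upload a compressed model file under a <model>/<version>/ key prefix.

    Keeping models and versions separated by prefix makes it easy to list,
    deploy, or roll back a specific version later.
    """
    key = f"models/{model_name}/{version}/model.tar.gz"
    s3.upload_file(local_path, ARTIFACTS_BUCKET, key)
    return key

# e.g. store_model_artifact("fraud-detector", "v2", "./model.tar.gz")
```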

Every ML model needs a machine in order to thrive

The next step was to host those ML models and their versions on machines. Choose each machine type carefully, in order to save money and keep ML model run times reasonable.
Involve the Data Science team in this decision. They probably know how many CPUs/GPUs and how much memory each ML model and version needs. Don’t worry, these configurations can be adjusted during each deployment process.
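
A minimal sketch of what such a configuration might look like; the instance types and numbers below are made up and would come from the Data Science team in practice:

```python
# Illustrative resource profiles per model version. The actual instance types and
# numbers should come from the Data Science team and can be tuned on every deployment.
RESOURCE_PROFILES = {
    ("fraud-detector", "v1"): {"instance_type": "cpu.medium", "cpus": 2, "memory_gb": 4},
    ("fraud-detector", "v2"): {"instance_type": "gpu.small", "gpus": 1, "memory_gb": 16},
}

def resources_for(model_name: str, version: str) -> dict:
    """Look up the machine profile a given model version should be hosted on."""
    return RESOURCE_PROFILES[(model_name, version)]
```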

The secret spice

I needed to figure out how to call the ML models while also supporting A/B testing and “shadow mode”. The most common technique for this is a load balancer, which is responsible for splitting the traffic between ML model versions and for supporting “shadow mode” calls.
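
This is, very roughly, what that routing logic boils down to. The sketch below is purely illustrative: the version weights, the shadow list, and the internal model URLs are assumptions, and in practice the load balancer itself handles this for you.

```python
import random
import requests
from concurrent.futures import ThreadPoolExecutor

# Illustrative routing table: live versions with traffic weights, plus shadow versions.
LIVE_WEIGHTS = {"v1": 0.9, "v2": 0.1}  # A/B test: 90% / 10%
SHADOW_VERSIONS = ["v3"]               # called on every request, response discarded

_shadow_pool = ThreadPoolExecutor(max_workers=4)

def call_version(version: str, payload: dict) -> dict:
    """Call one hosted model version; the URL below is a placeholder."""
    resp = requests.post(f"http://internal-model-host/{version}/predict",
                         json=payload, timeout=1.0)
    resp.raise_for_status()
    return resp.json()

def predict(payload: dict) -> dict:
    # Pick a live version according to its traffic weight (A/B testing).
    version = random.choices(list(LIVE_WEIGHTS), weights=list(LIVE_WEIGHTS.values()))[0]

    # Fire shadow calls in the background; their responses never reach the consumer.
    for shadow_version in SHADOW_VERSIONS:
        _shadow_pool.submit(call_version, shadow_version, payload)

    return call_version(version, payload)
```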

Also, consider using Blue-Green deployment to reduce downtime and risk. It acts as a safety net, so a single bad deployment won’t take down your entire production environment.

One endpoint to rule them all

Eventually, I wanted an endpoint that calls the requested ML model via the load balancer and returns a prediction response. You want to give ML model consumers a simple API. Therefore, each ML model has a single endpoint that acts as the gateway for all of its consumers. Since the load balancer is in charge of splitting the traffic between the ML model’s versions, consumers are never exposed to which specific version they just called.
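
Here is a rough sketch of such a gateway endpoint, assuming FastAPI and one internal load-balancer URL per model (both the framework choice and the URLs are illustrative, not our actual stack):

```python
import requests
from fastapi import FastAPI

app = FastAPI()

# Illustrative: one internal load-balancer URL per model; consumers never see these.
LOAD_BALANCER_URLS = {"fraud-detector": "http://internal-lb/fraud-detector"}

@app.post("/models/{model_name}/predict")
def predict_endpoint(model_name: str, payload: dict) -> dict:
    """Single public endpoint per model.

    The consumer only knows the model name; the load balancer behind this call
    decides which version actually serves the prediction.
    """
    resp = requests.post(f"{LOAD_BALANCER_URLS[model_name]}/predict",
                         json=payload, timeout=1.0)
    resp.raise_for_status()
    return resp.json()
```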

The art of monitoring

After deploying an ML model to production, the next step is to monitor for problems. There are two kinds of monitoring that are crucial for the ML model life cycle:

Performance monitoring — knowing when machines are down or have stopped working, CPU usage is high, memory is running out, latency is spiking, etc.
Data monitoring — checking the ML model logs for “data drift”, getting insights for retraining your ML model for better accuracy, etc. This kind of monitoring is more for the Data Scientists than for the Backend Engineers (a naive example of such a check follows this list).
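
The performance side is usually covered by standard infrastructure monitoring. On the data side, here is a deliberately naive sketch of the kind of drift check that could run over the ML model logs; the function, threshold, and statistic are all illustrative, and real setups would use proper tests (e.g. PSI or a KS test) per feature:

```python
import statistics

def feature_has_drifted(training_values: list[float],
                        recent_values: list[float],
                        threshold: float = 0.2) -> bool:
    """Naive drift check: flag a feature whose mean shifted by more than
    `threshold` (relative) between the training data and recent traffic."""
    baseline = statistics.mean(training_values)
    current = statistics.mean(recent_values)
    if baseline == 0:
        return current != 0
    return abs(current - baseline) / abs(baseline) > threshold

# e.g. feature_has_drifted(training_amounts, last_week_amounts)
```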

Having an organized plan for how to tackle the task should help the “panic” disappear. It also builds confidence and motivation for the implementation part.

Phase 3: Implementation

To begin implementation, it’s best to start with some POCs (Proofs of Concept).

In my opinion, POCs are very important when it comes to choosing the right tools or algorithms to solve a problem. Eventually, POCs help you form a strong opinion about each step of the process and how to implement it.

To build or not to build

The question that came to mind was, “should we build an in-house tool or use third-party tools?”

To answer this question, you need to consider three main trade-offs:

Flexibility — By using a third-party tool, you are limited in your ability to make changes and add your own features. Eventually, you are dependent on the tool’s implementation, treating it as a “black box”.

Time to deliver — After calculating the amount of work and the available manpower, we came to the conclusion that implementing the deployment process in-house would probably take a few months, while adding a third-party tool to the deployment process was only a matter of a few weeks. As a young company, we leaned (heavily) towards the faster option.

Cost — At the end of the day, this is the criterion that has the most impact on the decision. I recommend calculating the cost for both the short and the long term, since using a third-party tool can save a lot of development time (and as we all know — time = money). It might even be the cheaper option, at least in the short term.

It’s a match!

The POCs also helped us choose a third-party tool that meets all of our current needs.

By using the tool’s UI, the Data Scientists can independently deploy their ML models and manage their versions. It also supports A/B testing and shadow mode, and has low latency and monitoring abilities: all of the features they listed as must-haves!

In the end, only time will tell whether we made the right choice. It could be that after a year or so, the third-party tool will crack under pressure and turn back into a pumpkin. If that proves to be the case, we’ll need to repeat the research process.

But hopefully, the tool will serve us well and we’ll use it happily ever after…

