How to Bring ML to Production: Tips and Useful Tools

Georgian
Georgian Impact Blog
12 min read · Sep 16, 2022

By: Kyryl Truskovskyi

At Georgian, our R&D department often works with our portfolio companies — who we refer to as our customers — to help them multiply ROI. One important way we add value to their products is through machine learning.

On the R&D team, my main task is to bring machine learning models to production. Using my experience doing that and teaching a course, ML in Production, at Project, I’ll go over the lifecycle of ML models and the infrastructure that supports them.

This article will be helpful if you are:

  • A software engineer who wants to expand their skill set or move into the ML domain.
  • A data scientist or junior/mid-level ML engineer who wants to learn how to build and scale ML models in production and become a full-stack data scientist.

ML Model Lifecycle

Let’s start with the lifecycle of a machine learning model: what it is and why we need it. There are many ways to present it, for example the popular versions from Google and Neptune, or my favorite from Martin Fowler in his article Continuous Delivery for Machine Learning.

These are not the only options, but today we will focus on a simple one.

ML model lifecycle

In the remainder of this blog, we will go over this lifecycle stage by stage:

Data

ML model lifecycle — data

Every ML project starts with data. This is probably the most important and most difficult stage of the lifecycle of ML models. If you don’t get it right first, your ML project will most likely fail very quickly.

This part of the lifecycle is likely to take up most of your resources. Improving any part of the model will not have as much impact as improving the data. For instance, you can spend months experimenting with architectures and gain 2–3% on your target metrics. On the other hand, cleaning the data can give you a performance gain that is several times larger.

Therefore, this is the most important stage, and you need to understand how to collect and manage data. The data needs to be stored somewhere, so the first step is to set up storage. There are several approaches:

  • Data lakes, or simply saving data in object storage, for example S3 or MinIO (a short sketch follows this list).
  • Data warehouses. This can be a simple database, relational or not.
  • Data mesh. An increasingly popular approach in which data infrastructure is split into separate, decentralized, domain-owned data products. Here is a very good lecture about the concept.
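For the object-storage route, the day-to-day interaction is usually just reading and writing files by bucket and key. Here is a minimal sketch with boto3; the bucket and file names are made up:

```python
import boto3

# Works against AWS S3 by default; point endpoint_url at a MinIO instance for
# self-hosted object storage.
s3 = boto3.client("s3")  # or boto3.client("s3", endpoint_url="http://minio:9000")

s3.upload_file("train.csv", "my-datasets", "raw/train.csv")          # local file -> bucket/key
s3.download_file("my-datasets", "raw/train.csv", "train_copy.csv")   # bucket/key -> local file
```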

In addition, you will need to understand ETL processes and how you will add data to the system and work with it in the future. The two most popular tools we use are Spark and Kafka, or their commercial counterparts.
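As a rough illustration of the transform step in ETL, here is a minimal PySpark sketch; the bucket paths and column names are placeholders, not a recommendation for how to lay out your data lake:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-events-etl").getOrCreate()

# Extract: raw event logs landed in object storage.
events = spark.read.json("s3a://my-bucket/raw/events/2022-09-01/")

# Transform: drop invalid rows and aggregate per user.
daily = (
    events.filter(F.col("user_id").isNotNull())
    .groupBy("user_id")
    .agg(F.count("*").alias("n_events"))
)

# Load: write a curated table back to the lake / warehouse layer.
daily.write.mode("overwrite").parquet("s3a://my-bucket/curated/daily_user_events/")
```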

Of course, it all depends on the scale. In my experience, a company of 50–100 people processes ~ 20–50 TB per month, and usually this is done by a separate data engineering team.

I really like this visualization of the knowledge and skills of a data engineer, where you can see which tools are the most relevant and choose useful ones for yourself. If you want to understand the theory fundamentally, one of the best books I’ve read on the subject is Designing Data-Intensive Applications.

Once you’ve mastered the storage and tools for working with data, the data also needs to be analyzed, labeled, and put under version control.

The most popular labeling tools we use:

Great selections of labeling tools:

As well as tools for managing versions of your datasets:

Another use case we sometimes come across is adding a feature store: an ML-oriented database for storing and reusing ML features.
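To make the idea concrete, here is a toy, purely illustrative sketch of the interface a feature store provides; real products add storage backends, point-in-time joins, and freshness guarantees on top of this:

```python
from dataclasses import dataclass

@dataclass
class UserFeatures:
    n_orders_30d: int
    avg_basket_value: float

class TinyFeatureStore:
    """In-memory stand-in: the same features served by key at inference time
    and read in bulk when building training sets."""

    def __init__(self) -> None:
        self._rows: dict[str, UserFeatures] = {}

    def write(self, user_id: str, features: UserFeatures) -> None:
        self._rows[user_id] = features

    def get_online(self, user_id: str) -> UserFeatures:
        # Low-latency lookup used on the serving path.
        return self._rows[user_id]

    def get_training_rows(self) -> list[tuple[str, UserFeatures]]:
        # Bulk read used to build training datasets.
        return list(self._rows.items())

store = TinyFeatureStore()
store.write("user-42", UserFeatures(n_orders_30d=3, avg_basket_value=57.0))
print(store.get_online("user-42"))
```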

Some examples of feature stores:

As your company grows into a larger organization, you need to manage access effectively and monitor data integrity. Sometimes your data and access to it need to be certified, especially for fintech companies. I don’t know of any general framework used here. Typically, companies write their own extensions or toolsets based on their storage solution.

Experiments

ML model lifecycle — experiments

Once you have the data, you can start testing hypotheses and move on to experiments. The speed of iteration is often important at this stage: how many hypotheses you can validate, and how many experiments you can run.

Although the trial-and-error nature of this stage means you will throw away 99% of the code and results, it is still a good idea to design it as a scalable and reproducible flow.

Along with data management tools, we typically use templates for experiments. A very popular option is PyTorch Lightning. This framework lets you avoid working on low-level details: how experiments run, how to use multiple GPUs, and so on. PyTorch Lightning may not suit everyone, and teams often write their own solutions or use templates, such as The Data Science Lifecycle Process framework.
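A minimal PyTorch Lightning sketch, just to show the shape of the abstraction; the toy model and random data are made up:

```python
import pytorch_lightning as pl
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Random tensors stand in for a real dataset; Lightning handles the training loop,
# device placement, checkpointing, and multi-GPU setup for you.
dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
trainer = pl.Trainer(max_epochs=1, accelerator="auto", devices=1)
trainer.fit(LitClassifier(), DataLoader(dataset, batch_size=32))
```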

One of the most important components of this stage is the system for tracking and controlling experiments. Of course, you can write the results in a spreadsheet (we sometimes do too), but if the project is long-term, you are not working alone, or you want to see more detail, I advise you to use a specialized tool for this.

A few options include:

  • TensorBoard is very popular and has many integrations with different frameworks. You can also use TensorBoard.dev to save your results in the cloud instead of sharing screenshots in Slack with colleagues. However, it is often difficult to scale and becomes hard to work with over time.
  • Then you can switch to MLflow Tracking, a tool for managing the entire lifecycle of ML models that is often used for experiment management (a minimal logging sketch follows this list). It’s an open-source tool you can maintain yourself, or you can use a managed solution from Databricks.
  • The best option, in my opinion, is to start with a managed solution, because you don’t need to host and maintain it yourself. Managed solutions often provide a much better UX and feature set than TensorBoard and MLflow Tracking, but they can be quite expensive. We mostly use Weights & Biases and Comet.ml.
  • There is also a newer open-source solution for experiment management that I like a lot: Aim.
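For reference, experiment tracking with MLflow boils down to logging parameters, metrics, and artifacts per run; here is a minimal sketch (the experiment name and values are made up), and hosted tools like Weights & Biases and Comet expose a very similar run/parameter/metric API:

```python
import mlflow

mlflow.set_experiment("demo-experiment")  # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("learning_rate", 1e-3)
    mlflow.log_param("batch_size", 32)
    # ... training loop goes here ...
    mlflow.log_metric("val_accuracy", 0.91)
    # mlflow.log_artifact("confusion_matrix.png")  # attach any file the run produces
```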

You will also need to manage the infrastructure for model development, such as a cluster with GPUs. You can start by giving everyone access to create machines in AWS EC2 or its analogs, or by giving everyone ssh access to your on-prem cluster. However, as the team and the number of experiments grow, this becomes inconvenient: different experiments may conflict over resources, people forget to delete manually created machines and money is wasted, and so on.

In this situation, you can add a centralized solution. This can reduce flexibility for the team, but the benefits far outweigh the effort and resources spent, so a centralized way of launching experiments is still worth considering. Slurm is most often used for this purpose, and it can be combined with the submitit package. Alternatively, Kubernetes is also a popular choice; there is a good case study from OpenAI, where Kubernetes is used to train huge models such as GPT-3.
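Here is a minimal sketch of submitting jobs to a Slurm cluster with submitit; the training function and resource numbers are placeholders:

```python
import submitit

def train(learning_rate: float) -> float:
    # Stand-in for a real training run that returns a validation metric.
    return learning_rate * 2

# AutoExecutor submits to Slurm when it is available and falls back to local execution.
executor = submitit.AutoExecutor(folder="submitit_logs")
executor.update_parameters(timeout_min=60, gpus_per_node=1, cpus_per_task=4)

jobs = [executor.submit(train, lr) for lr in (1e-4, 1e-3, 1e-2)]
results = [job.result() for job in jobs]  # blocks until the cluster jobs finish
print(results)
```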

Pipelines

ML model lifecycle — pipelines

So you’ve run hundreds of experiments, validated tons of hypotheses, and you think the model can be useful. It’s time to restructure your code from ad-hoc training scripts into a clean pipeline that you can easily run again, so that you can train, and later retrain, the model in production (a very common use case).

This step may not be relevant to your team, and you may simply pick the saved weights of your model manually. Often, however, your deliverable is not just a model but the whole pipeline that produces it. This can be a hard requirement in some cases, for example when the final training run that produces the model must happen in an environment you don’t have access to.

If your models go to production, it is worth following engineering best practices when writing the production code. But how do you do this? Typically, we follow two steps: dockerization and the use of a pipeline library.

The most popular pipeline libraries we use:

In general, there are many libraries and each claims that it solves your problem. If you are interested in this topic, you can go further and find out which of the pipeline tools suits you best. I highly recommend reading this article on how to write ML pipelines and which framework to choose.
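To give a feel for what these libraries look like, here is a minimal sketch of a training pipeline expressed as an Airflow DAG; Airflow is just one possible choice, and the task bodies and names are made up, but the prepare/train/evaluate structure maps onto most pipeline frameworks:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def prepare_data():
    ...  # pull and validate the training data

def train_model():
    ...  # fit the model, ideally inside a Docker image with pinned dependencies

def evaluate_model():
    ...  # compute metrics and decide whether to publish the new model

with DAG(
    dag_id="train_model_pipeline",  # hypothetical pipeline name
    start_date=datetime(2022, 9, 1),
    schedule_interval=None,         # run on demand or on a retraining trigger
    catchup=False,
) as dag:
    prepare = PythonOperator(task_id="prepare_data", python_callable=prepare_data)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    evaluate = PythonOperator(task_id="evaluate_model", python_callable=evaluate_model)

    prepare >> train >> evaluate
```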

Deployment

ML model lifecycle — deployment

Now we can move on to deployment to make your model available to someone. There are several options.

Option 1: Do not deploy the model at all :)

Jeff Atwood, co-founder of Stack Overflow: “The best code is no code at all.”

The same can be said about machine learning models. If the business is not yet sure whether it needs this model, or you are still in the discovery stage, you do not need to deploy the model and make it yet another component to support. Just wrap your model in something extremely simple and use it to validate whether further business investment is worthwhile.
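For example, a throwaway demo UI is often enough at this stage. Here is a minimal sketch using Gradio, which is just one option among many; the toy predict function stands in for your model:

```python
import gradio as gr

def predict(text: str) -> str:
    # Stand-in for the real model call.
    return "positive" if "great" in text.lower() else "negative"

demo = gr.Interface(fn=predict, inputs="text", outputs="text", title="Sentiment demo")
demo.launch()  # serves a small local web UI you can share with stakeholders
```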

We often use:

The fact that we can build and deliver an ML model does not mean that it is exactly what the business needs. Unfortunately, I have often seen projects driven more by engineering curiosity than by a real business problem.

Option 2: Wrap in a Python framework

I think this option will be most understandable to developers who often work with microservices or other similar areas. Wrap the model in your own microservice; this is a valid and good option to start.

For example, when the team you work with is just starting to discover ML and wants to launch the first model in production, we usually choose this option because of its simplicity and clarity.

We mostly use FastAPI (from Explosion, spaCy creators) due to many built-in features.
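A minimal sketch of what such a microservice looks like with FastAPI; the request/response schema and the stubbed predict function are placeholders for your own model:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    text: str

class PredictResponse(BaseModel):
    label: str
    score: float

def predict_fn(text: str) -> tuple[str, float]:
    # Stand-in for the real model; in practice, load the weights once at startup.
    return "positive", 0.99

@app.post("/predict", response_model=PredictResponse)
def predict(request: PredictRequest) -> PredictResponse:
    label, score = predict_fn(request.text)
    return PredictResponse(label=label, score=score)

# Run locally with: uvicorn main:app --reload
```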

Other popular options:

  • Flask — I think most people who learn web development in Python are familiar with it.
  • Aiohttp (from Python core contributors) is developed in Ukraine, and I was once lucky enough to work with its author. It is very useful if your application needs asynchronicity. But be careful: ML inference is usually CPU-bound, so asynchronous Python frameworks built on asyncio should be used with caution, since a long-running model call will block the event loop. You can read more about asyncio, and its pros and cons, in this tutorial.

Option 3: Use an Inference server

This is one of the best options when ML deployment is very mature in your company. If you do not want to repeat the same pattern over and over again, but want to standardize the way you deploy models, you can usually use a ready-made solution.

The most popular:

This list is far from complete. We often use Seldon and KServe. The general idea is quite simple — convert your model to a format that the inference server can understand, and then it will be able to deploy it automatically. You don’t have to write your own web server, just save the model in a certain place in a certain format. Such frameworks give you a lot of additional features: a version control system for models, explanations for the results of ML models, and support for several models out of the box.
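The "save the model in a standard format" step can be as small as the sketch below, which uses MLflow's model format as one example; the exact format and deployment manifest depend on which inference server you run, so treat this as an illustration rather than a Seldon- or KServe-specific recipe:

```python
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy model so the sketch is self-contained.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
model = LogisticRegression().fit(X, y)

# Save in MLflow's standard layout; an inference server pointed at this location
# (or at the same path in object storage) can load and serve it without any
# custom web-server code.
mlflow.sklearn.save_model(model, path="models/demo/1")
```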

It is worth noting that none of these options excludes the other. It is important to always balance and choose exactly what you need to solve business problems. If you are a startup with 10 people, then there may be things that are more important now than the inference server, and you can work with FastAPI as a starter. But if you are working in a corporation, where the ML department itself has more than 100 people, it is pertinent to give everyone a tested and reliable way to deliver models. Seldon and KServe shine in these situations!

Monitoring

ML model lifecycle — monitoring

Once the model has been deployed, we don’t want it to degrade. To prevent this, you need to monitor the model. Traditional software monitoring strategies like logging and uptime checks are not enough. Oftentimes the service works fine, all SLAs are met, and the model returns results, but if you analyze those results, they are completely wrong. Even worse, this may surface not immediately but six months after release. Such a discrepancy can have many causes.

It is possible that the statistics in the data have changed, or the distribution you worked with when developing the model is no longer valid. It could also be that the model no longer works as accurately as you expected because, for example, your product has released a mobile version, all user interaction patterns have changed, and you need to retrain the model.

I really like the visualization from this resource, which gives a rough idea of the problem. If you want to delve into the topic, a series of five blog posts (b1, b2, b3, b4, b5) is a good entry point.

In short, you need to:

  1. Monitor data quality. You can use simple tests to check whether the data schema has changed, whether null values appear in columns that should not contain them, and so on. You can also use more sophisticated drift detectors that tell you whether a dataset shift has occurred, i.e. how the distribution of production data differs from the data you trained on (see the sketch after this list). Such detectors can be simple statistical tests or separate ML models.
  2. Monitor model quality. Usually, you need to save your inference results and obtain real labels for them in order to measure the model’s real-world performance.
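As a small illustration of the "simple statistical" end of drift detection, here is a sketch that compares one feature's training and production distributions with a Kolmogorov-Smirnov test; the data is synthetic and the threshold is arbitrary:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)  # feature at training time
prod_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)   # same feature in production

statistic, p_value = ks_2samp(train_feature, prod_feature)
if p_value < 0.01:
    print(f"Possible drift: KS statistic={statistic:.3f}, p-value={p_value:.4f}")
```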

To accomplish these tasks, here are links we use:

Based on the results of the model and data monitoring, you can decide to return to one of the previous development steps and improve it. If you want to delve into the topic of monitoring, I advise you to start with this blog post and this tutorial.

Bonus: platforms

Throughout the article, I gave examples of many tools. But the speed with which new solutions are emerging is truly astounding. I advise you to review the article on this topic. Unfortunately, there is no standard yet.

It is worth noting that there are many platforms for ML that claim to cover the end-to-end machine learning cycle and support each of the steps we discussed above. The most common of these are AWS SageMaker and GCP Vertex AI.

I have not yet found a universal solution. That is, there may not be a platform that would solve everything for your particular case. Therefore, you need to combine different solutions, both open source and commercial.

There is a good table for comparing different platforms and solutions.

Summary

To conclude, the development of ML models is a process that includes several components, from data processing and experimentation to monitoring and returning to one of the previous steps. It is usually a cyclical process, and each stage requires its own approach. Transitions between stages are loosely defined, and when to move from one to another depends on the situation.

summary of ML model development process

I hope that this blog gave you a high-level structure of the MLOps process, and entry points to help you dive deeper into this topic! Reach out if you have any questions or suggestions and stay tuned to hear more about the constantly-evolving field of MLOps!

This is an adaptation of the original article: https://dou.ua/forums/topic/36499/ from a Ukrainian tech journal.
