MLflow in production at HelpShift
Deepak Ahire¹ and Shyam Shinde²
1. Software Engineer, Helpshift AI, Pune, India.
Email: ahiredeepak20@gmail.com. ORCID: 0000-0002-9174-0797
2. Engineering Manager, Helpshift AI, Pune, India.
Introduction
There are a lot of blogs out there that talk about how to get onboarded with or integrate MLflow into your machine learning pipelines. This blog, instead, shares our experience of introducing MLflow as part of our production machine learning pipeline. We kicked off by onboarding HelpShift's flagship feature, "Smart Intents" [1–5], onto MLflow.
In this blog, we have discussed —
- Why we chose MLflow.
- The challenges we faced in maintaining our legacy AI model versioning code.
- The challenges we faced during the migration of our existing ML models.
- The surprise that we got on the release day.
- Impact of using MLflow and our plans ahead.
"Machine learning is 10% machine learning and 90% engineering," says Prof. Huyen in her Stanford CS329S course [6], and it is an apt and concise statement. Perfect models cannot be built overnight; it is an iterative process! Hence, you need tools that help you manage the machine learning lifecycle of your models. This is our story, and we hope you enjoy it.
Why MLflow?
Our core requirement was to solve the problem of versioning. Before integrating MLflow, we had an in-house model versioning system that was tightly coupled with AWS S3. The S3 bucket paths used to store and access our models were tied to business entities and eventually became feature-specific as new features were implemented.
As part of our POC, our first preference was to select the open-source platforms or tools that best fit our use case. The closest contenders were DVC and MLflow. DVC is more of a command-line tool with no UI. It optimises model and metadata storage by storing diffs, just like Git, and we found it best suited for storing large models, roughly 5 GB and above. Our models at Helpshift, however, are lightweight and small.
In addition to these, we also had the following requirements —
- Experiment tracking functionalities.
- Custom tagging at various stages of models’ lifecycle.
- Instant and easy access to all versions of the models, datasets, and metadata via a REST API or an interactive UI.
- If possible, everything at a single place.
That is where MLflow came into play and we started with the POC.
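To give a flavour of how these requirements map onto MLflow, here is a minimal sketch (not our production code) that tracks an experiment, tags a run, and registers a model through a single tracking server; the experiment name, tag, metric value, and model name below are hypothetical:

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

# Assumes MLFLOW_TRACKING_URI points at your tracking server.
mlflow.set_experiment("smart-intents-poc")  # hypothetical experiment name

with mlflow.start_run():
    # Experiment tracking: log hyperparameters and (illustrative) metrics.
    mlflow.log_param("C", 1.0)
    mlflow.log_metric("f1_score", 0.91)

    # Custom tagging at any point of the model's lifecycle.
    mlflow.set_tag("model_type", "smart_intent_model")

    # Log and register the model so everything lives in one place.
    toy_model = LogisticRegression(C=1.0).fit([[0.0], [1.0]], [0, 1])
    mlflow.sklearn.log_model(toy_model, artifact_path="model",
                             registered_model_name="smart-intent-demo")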
POC
For the POC, we started with a simple project and tested whether all our requirements were satisfied; MLflow took care of everything by providing lightweight and reliable APIs. We then ran a POC on our production ML pipeline. Here is what we learned along the way:
- List down the expectations of the engineers and data scientists on your team.
- The POC will give you a fair idea of how much time you will need to complete the project.
- Once you have finalised MLflow as the tool, this is the right time to reach out to your DevOps team to set up the MLflow infrastructure.
Approval of the POC from our team members opened the doors for the final integration.
Final integration
Integrating MLflow was challenging because it follows its own incremental model versioning scheme: you cannot specify a custom version number while registering a new model (see the sketch after the takeaways below). Because our model versioning and training codebase was well designed and divided into independent modules from the start, the MLflow integration itself was not much of a pain, but testing all our existing execution flows was the hard part. Here are some key takeaways:
- Before starting the integration, make sure you are aware of all the execution flows. Documenting them or writing end-to-end functional tests will save a lot of time.
- While testing on your local machines, make sure you test the same changes on sandboxes that have the same environment/platform as production, as we noticed some differences in behaviour across environments/platforms.
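Since MLflow assigns version numbers on its own, one way to keep a link to a legacy versioning scheme (a sketch of the idea, not necessarily what we shipped) is to record the old version identifier as a tag on the newly registered version; the model name, tag key, and value here are hypothetical:

import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient
from sklearn.linear_model import LogisticRegression

client = MlflowClient()

# Log a toy model so that there is something to register.
with mlflow.start_run() as run:
    toy_model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])
    mlflow.sklearn.log_model(toy_model, artifact_path="model")

# Registering always produces the next auto-incremented version number;
# MLflow does not let you choose the version yourself.
model_version = mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",
    name="smart-intent-demo")

# One way to keep a link to a legacy scheme: store the old version
# identifier as a tag on the new model version (hypothetical key/value).
client.set_model_version_tag(
    name="smart-intent-demo",
    version=model_version.version,
    key="legacy_version",
    value="s3-v42")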
After the integration came the humongous task of migrating our models.
Migration of production AI models
(Note: you can skip this section if you are starting fresh and do not have any existing production models.)
Earlier, we stored (and versioned) our models in an AWS S3 bucket, so all of these models and their metadata had to be migrated from S3 to MLflow. When migrating data to a new platform, our topmost priority was to preserve all of it. From a data scientist's perspective, the entire history of a model is crucial: it lets them analyse and monitor trends in metrics and hyperparameters, and provide feedback to our customers/agents (who provide the training data) to seek higher-quality data.
To make sure the migration runs correctly and to get a fair idea of what your production data format looks like, 1–2 dry runs of the migrator are very useful. They also give you a fair idea of how much time the final run of your migration will take, which helps you plan your release timelines and cross-team communication. Finally, these dry runs are the ultimate test of your MLflow production infrastructure: they show whether everything is in place, how reliable the MLflow APIs are, and how much load they can handle. Key takeaways:
- Software engineering practices restrict us from using production data for development and testing, so we may not always have a complete idea of what production data looks like, especially as a feature keeps evolving. For example, some files of a particular type may be missing for earlier versions of a model. In such cases, it is beneficial to study the structure/components of the data right at the start.
- While carrying out the migration, fetch and log the performance metrics/hyperparameters by copying them from the existing sources instead of retraining; regenerate missing metrics only if they are important (see the sketch after this list).
- You can retrain a few models and check whether you get the same results as earlier, to ensure reproducibility.
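To make the copy-instead-of-retrain idea concrete, here is a rough sketch of what one migration step could look like; the legacy_record structure and its fields are hypothetical stand-ins for however your existing S3 store is organised:

import mlflow

def migrate_one_model(legacy_record: dict, registered_name: str) -> None:
    """Copy one legacy model version into MLflow without retraining.

    legacy_record is a hypothetical dict assembled from the old S3-based
    store, e.g. {"params": {...}, "metrics": {...}, "model_dir": "/tmp/m1"}.
    """
    with mlflow.start_run() as run:
        # Copy hyperparameters and performance metrics from the existing
        # sources instead of recomputing them.
        mlflow.log_params(legacy_record["params"])
        mlflow.log_metrics(legacy_record["metrics"])

        # Upload the already trained model files as run artifacts.
        mlflow.log_artifacts(legacy_record["model_dir"], artifact_path="model")

    # Register the migrated artifacts as the next version of the model
    # (assumes the uploaded directory is a valid MLflow model, or that a
    # plain artifact directory is acceptable for your use case).
    mlflow.register_model(model_uri=f"runs:/{run.info.run_id}/model",
                          name=registered_name)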
In the next section, we will have a look at how we serve customer requests.
Inference
We download the latest trained models onto our inference nodes by sending requests to the MLflow tracking server. As the name suggests, our inference service is deployed on these nodes.
Before downloading the models using MLflow's download API, we need to select only those models that are supposed to be downloaded, based on some criteria. To search for the models, we used MLflow's search API, search_registered_models(). The API returns a paginated list of the models we are interested in, and this list is then passed to the download API.
The inference service processes end-user queries using the downloaded models and returns the response.
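A simplified sketch of that flow is shown below; the tag filter mirrors the one discussed in the next section, the destination path is a hypothetical local cache directory, and the exact download call we use in production may differ:

import mlflow.artifacts
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Find the registered models this inference node should serve.
models = client.search_registered_models(
    "tags.model_type='smart_intent_model'", max_results=100)

for registered_model in models:
    # latest_versions holds the newest version per stage; we take the
    # first entry here purely for brevity.
    latest = registered_model.latest_versions[0]

    # Download the model artifacts to a (hypothetical) local cache
    # directory on the inference node.
    mlflow.artifacts.download_artifacts(
        artifact_uri=f"models:/{registered_model.name}/{latest.version}",
        dst_path=f"/var/cache/models/{registered_model.name}")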
Surprise on the release day!
No matter how careful you are, production will surprise you! Whenever you work with newly growing open-source libraries/packages, make it a practice to read the documentation thoroughly, retain what you have learned, and understand the library's codebase. As discussed earlier, we used the search_registered_models() function to search for models. One of its parameters is "max_results". The issue is that this parameter is optional and has a default value of 100, as you can see in the following image.
The issue was that, for some reason, we missed using this parameter in our code. On the release day, after the canary release, we observed that only a few models were getting downloaded (i.e., the first 100). To get the list of all the required models (more than 100 in total), we implemented the following loop:
from mlflow.tracking import MlflowClient

mlflow_client = MlflowClient()  # assumes the tracking URI is configured in the environment

# Number of models we want in a single call to MLflow.
# Tune this parameter as per your use case.
number_of_models_in_one_batch = 100

# First page of the required models to be downloaded.
list_of_all_models = mlflow_client.search_registered_models(
    "tags.model_type='smart_intent_model'",
    max_results=number_of_models_in_one_batch)
last_token = list_of_all_models.token

# Keep requesting pages until the server stops returning a page token.
while last_token:
    next_batch_of_models = mlflow_client.search_registered_models(
        "tags.model_type='smart_intent_model'",
        max_results=number_of_models_in_one_batch,
        page_token=last_token)
    for new_model in next_batch_of_models:
        list_of_all_models.append(new_model)
    last_token = next_batch_of_models.token
This list_of_all_models was then passed to the download API.
Impact
- Complete transparency into all versions of the models, their parameters, performance metrics, and the datasets of customer-centric models, within a few clicks, entirely in the browser.
- Streamlines the model versioning process across different ML models required for the different business use cases.
- Eliminates the need to build and maintain our custom model and metadata versioning system.
- Using the custom tagging functionality, we can track all the execution flows of the services (the model building pipeline) responsible for creating the models. If anything goes wrong in the production environment, the developers instantly know exactly where the issue is.
- We now also track (log) the runtimes of the different stages of model building. This creates instant visibility and an opportunity to optimise particular stages.
- Data scientists can easily navigate to and from different versions of the models and the datasets used to build them. This saves much of their time and effort and empowers them for data analysis and comparison.
- Data scientists can check at any time what is going on in the production environment, how frequently our customers use the feature, and how effectively they use it.
- MLflow acts as a unified data access layer, eliminating the need to access data directly via production AWS buckets.
- The model-level tags allow us to navigate directly to models based on their domains and names, and to view their entire history.
- The experiments dashboard is in itself a model observability tool that is instantly accessible without doing any data science.
Plans ahead
MLflow provides functionality to configure the stage of each version of a model. The available stages are "None", "Staging", "Production", and "Archived"; we mainly used "Staging" and "Production". The stage of any version of any model can be changed with a single click or a REST API call.
Our upcoming plan is to adopt this functionality, which will enable us to promote any particular version we choose to the "Production" stage based on various factors, for example, model performance.
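For reference, a stage transition via the Python client looks roughly like the following; the model name and version number are placeholders:

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Promote version 3 of a (hypothetical) model to the "Production" stage,
# archiving whichever version currently holds that stage.
client.transition_model_version_stage(
    name="smart-intent-demo",
    version="3",
    stage="Production",
    archive_existing_versions=True)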
A glimpse of our dashboard
References
[1] Providing an intent suggestion to a user in a text-based conversation
https://patents.google.com/patent/CA3164413A1/
[2] Smart Intents concepts: Overview
https://support.helpshift.com/kb/article/smart-intents-concepts-overview/
[3] Intent in Chatbot
https://www.helpshift.com/glossary/intent-in-chatbot/
[4] What is Smart Intents?
https://support.helpshift.com/kb/article/helpshift-smart-intents/
[5] Why Intent Is the Future of In-App Customer Support Automation
https://www.helpshift.com/why-intent-is-the-future-of-in-app-customer-support-automation/
[6] CS 329S: Machine Learning Systems Design
https://stanford-cs329s.github.io/
[7] Helpshift Aims AI at Customer Service
https://www.pcmag.com/news/helpshift-aims-ai-at-customer-service
[8] MLflow Python API documentation
https://www.mlflow.org/docs/latest/python_api/mlflow.client.html
Reviewers
1. Utkarsh Dighe, Software Engineer II, Helpshift AI, Pune, India.
Citation (Bibtex format)
@misc{ahire_shinde_2022,
  title={MLflow in production at HelpShift},
  url={https://medium.com/helpshift-engineering/you-will-start-writing-more-at-work-after-reading-this-post-be9aa95db78f},
  journal={Medium},
  publisher={helpshift-engineering},
  author={Ahire, Deepak and Shinde, Shyam},
  year={2022},
  month={Nov}
}