MLOps use case at OBI

Marouen Hizaoui
Machine Learning Reply DACH
6 min read · Jun 9, 2022

How OBI is taking ML products to the next level with MLOps

MLOps is a set of concepts paramount to building reliable and efficient machine learning products. In our previous article, we detailed each of those principles and how they apply to each of the three main pillars of any ML product: data, model, and code.

In this third article in our series, we'll go through a use case that we, Machine Learning Reply, developed in collaboration with the data science team of one of our partners, OBI.

Introduction: The ML product at OBI

OBI is a German multinational home improvement retail company. It operates 668 stores in Europe, 351 of which are in Germany. In addition to its physical stores, and to be closer to its customers, OBI built an app called heyOBI. Among the many features the app provides is the possibility for customers to get in touch with a fleet of customer service agents and specialist product consultants who answer their questions. Since OBI offers a wide range of products, the questions can be about virtually anything. To advise customers well, it's important that each question is assigned to the right customer service agent or product specialist.

Initially, each incoming customer question or inquiry was read by a customer service agent and classified into certain categories to determine which agent or specialist it should be assigned to. With the huge number of OBI customers, this task, important as it might be, became tedious and time-consuming, and it inflated response times, which reduced the quality of the customer experience. To address this issue, we, a team from Machine Learning Reply, collaborated with the OBI Data Science Team to build a machine learning system that classifies each inquiry into the right category. The aim was to develop an infrastructure around the machine learning models built by the OBI Data Science Team.

Figure 1

As shown in figure 1, the inquiry matching ML product is decoupled from the heyOBI app and is built as an external product.

The inquiries and classes are communicated through dedicated Kafka topics with a certain timeout, after which the app falls back on the status quo of manual classification. This serves as a first-line fail-safe: even if the ML system is out of service for any reason, the rest of the system continues to operate.
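
To make the fail-safe pattern concrete, here is a minimal sketch of how the app side could publish an inquiry and wait for a class with a timeout. It assumes the kafka-python client; the topic names, timeout value, and message format are our own illustrative stand-ins, not the actual heyOBI integration:

```python
# Minimal sketch of the timeout fallback (hypothetical topics and timeout).
import json
import uuid

from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
consumer = KafkaConsumer(
    "inquiry-classes",                       # hypothetical response topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    consumer_timeout_ms=2000,                # stop waiting after 2 s
)

def classify(inquiry_text: str) -> str:
    """Ask the ML system for a class; fall back to manual routing on timeout."""
    inquiry_id = str(uuid.uuid4())
    producer.send("inquiries", {"id": inquiry_id, "text": inquiry_text})
    producer.flush()
    for message in consumer:                 # iteration ends after the timeout
        if message.value.get("id") == inquiry_id:
            return message.value["class"]
    return "MANUAL_CLASSIFICATION"           # first-line fail-safe
```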

The ML product architecture is shown in Figure 2.

Figure 2

In a nutshell, inquiries are pushed to Kafka in real time, each inquiry flows through this pipeline, and the resulting class is sent back to Kafka. The number of different components in the architecture alone should give you an idea of how complex it is to manage.

In what follows, we will go through the application of each of the six MLOps principles in our use case.

Versioning

Starting with versioning: since the ML product consists of three pillars (data, model, and code), versioning is pillar-specific and done for each pillar.

Model versioning was managed using a model registry and an experiment tracker, both provided by MLflow. A model registry is a database for models: you can store your models there and have them versioned and documented. MLflow also offers experiment tracking, which records all the parameters of each experiment that produced a given model. MLflow is not unique in this; many other platforms, such as Amazon SageMaker, offer similar services. However, MLflow was already available and in use at OBI, so we were able to use its capabilities to cover many principles without reinventing the wheel.
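
As a minimal sketch of how tracking and registration fit together in MLflow: logging a model with a registered model name stores it in the registry and creates a new, automatically incremented version. The model name "inquiry-classifier" and the toy dataset here are illustrative assumptions:

```python
# Track an experiment's parameters and metrics, then register the model.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    mlflow.log_param("C", 1.0)                           # experiment parameter
    model = LogisticRegression(C=1.0, max_iter=200).fit(X_train, y_train)
    mlflow.log_metric("val_accuracy", model.score(X_val, y_val))
    # registered_model_name creates a new version in the model registry
    mlflow.sklearn.log_model(model, "model", registered_model_name="inquiry-classifier")
```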

For code versioning, OBI already had mature practices in place; the tool used is GitLab. Since we have at least 8 different code repositories for Lambda functions alone, establishing solid version control practices from the start was key. For every change in any component, whether a bug fix or a new application feature, a new application version is created. Dependency and package management is handled with Poetry.

Data versioning is handled in the data preparation pipeline.

Testing

When it comes to testing, the typical model-acceptance accuracy metrics were used during training and validation. Additionally, functional model input-output testing was used to ensure that the inputs and outputs of the model are exactly as expected. On top of that, we implemented unit and integration tests for the end-to-end pipeline. Data testing is managed in the data preparation pipeline.
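
A minimal sketch of what a functional input-output test could look like with pytest; the loader function, module path, and category set are hypothetical placeholders, not the actual test suite:

```python
# Functional input-output tests: every prediction must be a valid category.
import pytest

CLASSES = {"plants", "tools", "paint", "other"}          # hypothetical categories

@pytest.fixture(scope="module")
def model():
    from app.serving import load_production_model        # hypothetical loader
    return load_production_model()

def test_output_is_a_known_class(model):
    prediction = model.predict(["Which wall paint works on concrete?"])
    assert prediction[0] in CLASSES                      # must be a valid category

def test_model_handles_empty_input(model):
    prediction = model.predict([""])
    assert prediction[0] in CLASSES                      # degrade gracefully
```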

Automation

In MLOps, automation means automating the way the model is produced and provided for serving. To ensure this, we built a model training pipeline that automates the training part.
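
A self-contained sketch of such an automated training step, with a quality gate before the model is handed over for serving; the dataset, threshold, and registration step are illustrative stand-ins:

```python
# Automated training flow: fetch data, train, gate on quality, hand over.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

ACCEPTANCE_THRESHOLD = 0.7                               # hypothetical quality gate

def run_training_pipeline():
    data = fetch_20newsgroups(subset="train")            # stand-in for real inquiries
    X_train, X_val, y_train, y_val = train_test_split(
        data.data, data.target, random_state=42)
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=500))
    model.fit(X_train, y_train)
    if model.score(X_val, y_val) >= ACCEPTANCE_THRESHOLD:
        # in the real pipeline this is where the candidate would be
        # registered in MLflow so that serving can pick it up automatically
        return model
    raise RuntimeError("candidate model below acceptance threshold")
```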

As for code automation, CI/CD pipelines are crucial here: every code change triggers a pipeline that automatically applies the required changes.

For data automation, we have the data preparation pipeline. Additionally, a data extraction pipeline and a metrics generation pipeline run periodically to automatically make all the data we need available to us. This data can be the live input or the ground-truth data, and it can be used for monitoring, retraining, analysis, and many other use cases that might arise in the future.
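
For illustration, a periodically triggered extraction step might look like the following AWS Lambda handler. The bucket name, key layout, and fetch_labeled_inquiries helper are hypothetical; the schedule itself would come from a trigger such as an EventBridge rule:

```python
# Periodic extraction: pull labeled inquiries and persist them to S3.
import json
from datetime import date

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Extract the latest inquiries with ground-truth labels and store them."""
    records = fetch_labeled_inquiries()                  # hypothetical extractor
    key = f"ground-truth/{date.today().isoformat()}.json"
    s3.put_object(
        Bucket="obi-ml-data",                            # hypothetical bucket
        Key=key,
        Body=json.dumps(records).encode("utf-8"),
    )
    return {"stored": key, "count": len(records)}
```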

Reproducibility

For model reproducibility, experiment tracking is leveraged; MLflow offers this functionality as well. It makes it possible to reuse the same models and reproduce experiments. To ensure reproducibility, all models are versioned and tagged in meaningful ways.
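
Because every model is versioned, a past result can be reproduced by pinning an exact registry version. A minimal sketch (the model name, version, and input schema are hypothetical):

```python
# Load an exact, versioned model from the MLflow registry.
import mlflow.pyfunc
import pandas as pd

model = mlflow.pyfunc.load_model("models:/inquiry-classifier/3")  # pin version 3
# input format depends on the model's signature; a DataFrame is a common choice
prediction = model.predict(pd.DataFrame({"text": ["Which drill bit fits tiles?"]}))
```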

For code reproducibility, version control is used, and every single piece of the infrastructure is created programmatically using infrastructure-as-code technology, in this case Terraform. Containers and virtual environments are used throughout.

For data, in addition to the data preparation pipeline, there are data reproducibility scripts that make production data available in other environments.

Deployment

For model deployment, MLflow is used; it offers convenient serving functionality. Deployment is straightforward because the models are already registered in the MLflow model registry: once a model has passed the tests, it is automatically tagged for production and can easily be served.
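
A minimal sketch of that promotion step using the MLflow client (the model name and version are hypothetical). A promoted model can then be served, for example with `mlflow models serve -m "models:/inquiry-classifier/Production"`:

```python
# Promote a tested model version to Production in the MLflow registry.
from mlflow.tracking import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name="inquiry-classifier",
    version=3,
    stage="Production",                  # tag the version that passed the tests
)
```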

For code, CI/CD is the star of the show here: the aim is to have everything deployed through CI/CD pipelines, which are triggered by every code change pushed to any of the repositories.

Monitoring

To monitor the entire system, a set of metrics was put in place to inform us about model health; it tells us how accurate and how performant the models are.

For the code, full monitoring of the entire system was set up: resource consumption, runtimes, errors, invocations, etc.

The same was done for the data: we have statistics regarding the health of the incoming data.
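
For illustration, a custom health metric could be published to CloudWatch like this; the namespace and metric name are hypothetical:

```python
# Publish a custom model/data health metric to CloudWatch with boto3.
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_data(
    Namespace="heyOBI/InquiryClassifier",                # hypothetical namespace
    MetricData=[{
        "MetricName": "PredictionConfidence",
        "Value": 0.87,
        "Unit": "None",
    }],
)
```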

All these metrics are visualized in a single CloudWatch dashboard, which you can see in Figures 3 and 4.

Figure 3

Conclusion

To conclude, the most important thing is to cover the principles; the tools used to accomplish them are not as important.

The tools do not necessarily have to be state-of-the-art, but they should work well enough to cover the principles in the best possible way.

It is virtually impossible to implement every principle to perfection, simply because the world keeps moving, so prioritizing and improving as you go is key. The concept is there; try not to over-engineer it!

The main takeaway is to adopt the best practices. Tools are just there to help you put them in place; the focus should be on adopting the principles, even if the implementation is simple in the beginning.

This product is currently running in production and functioning exactly as expected, with every piece of the system closely monitored. Business-related improvements are already planned. Thanks to the solid MLOps principles we adopted by design, the product can support these developments with no downtime or restructuring.

The collaboration of the OBI Data Science Team and ML Reply teams is an exemplary showcase of successful ML product development.
