From ML model to ML product

Alexander Mokryak · Exness Tech Blog · Dec 23, 2022

My name is Alexander, and I’m a Machine Learning Team Lead at Exness. In October, I participated in the Linq Conference in Cyprus and talked about creating ML products from ML models.

Many people new to the field of machine learning are taught how to build efficient algorithms and state-of-the-art neural network architectures, but far fewer resources explain how to use those models in real life and turn them into products. This post, based on my talk, aims to fill that gap, and it should be relevant both to ML practitioners and to managers working with machine learning.

It’s common for a data scientist to create a model and present it to stakeholders with impressive charts and metrics, but this is only the first step in turning it into a working product that generates value for your business. The model is a piece of code that uses data to make predictions, while the product is a service that integrates those predictions into a business process and has established quality standards and service-level agreements.

One important consideration is converting machine learning metrics into business metrics that stakeholders can understand, such as revenue.
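To make this concrete, here is a minimal sketch of such a translation for a hypothetical churn-prevention model. Every number in it (customer value, contact cost, save rate) is an illustrative assumption, not a figure from a real product:

```python
def expected_campaign_value(precision: float,
                            n_contacted: int,
                            retained_value: float,
                            contact_cost: float,
                            save_rate: float) -> float:
    """Expected net value of contacting the customers a churn model flags.

    precision      -- share of contacted customers who would really have churned
    n_contacted    -- how many customers the campaign reaches
    retained_value -- revenue kept when a churner is successfully retained
    contact_cost   -- cost of one retention offer
    save_rate      -- share of true churners who accept the offer
    """
    true_churners = precision * n_contacted
    revenue_saved = true_churners * save_rate * retained_value
    campaign_cost = n_contacted * contact_cost
    return revenue_saved - campaign_cost


# Example: precision 0.4, 10,000 customers contacted, $300 kept per saved
# customer, a $5 offer, and 25% of true churners saved.
print(expected_campaign_value(0.4, 10_000, 300.0, 5.0, 0.25))  # 250000.0
```

A stakeholder may not care that precision went from 0.35 to 0.40, but they will care that the same change moves the expected campaign value.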

It’s also crucial to validate the model properly, which includes sharing the dataset and the validation code with the team. Fix the metric, the validation time period, and the user group in advance so that every candidate model is judged the same way. This helps ensure that the model is robust and can be relied upon in production.
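For instance, a shared validation script can hard-code the agreed metric, period, and segment so anyone on the team can rerun it and get the same number. The column names below (event_date, segment, target, score) are assumptions for the sake of the example:

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Agreed up front and kept fixed for every candidate model.
VALIDATION_START = "2022-07-01"   # validation time period
VALIDATION_END = "2022-09-30"
USER_GROUP = "retail"             # user segment the model is evaluated on


def validate(df: pd.DataFrame, score_col: str = "score") -> float:
    """Compute the agreed metric (ROC AUC here) on the agreed slice only."""
    mask = (
        df["event_date"].between(VALIDATION_START, VALIDATION_END)
        & (df["segment"] == USER_GROUP)
    )
    holdout = df.loc[mask]
    return roc_auc_score(holdout["target"], holdout[score_col])
```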

When it comes to technical requirements, there are a few things to consider. First, think about whether you need real-time or batch processing. With real-time processing, the model returns a prediction for each incoming request with near-instant latency; with batch processing, predictions are computed for many records at once at regular intervals. The choice between the two depends on the specific needs of your product.
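The two patterns can look roughly like this in code. The model artifact, file paths, and feature names are placeholders, and FastAPI is just one possible choice for the real-time part:

```python
import pandas as pd
from fastapi import FastAPI
from joblib import load

model = load("model.joblib")        # trained model artifact, assumed to exist
FEATURES = ["f1", "f2", "f3"]       # placeholder feature columns


def score_batch(input_path: str, output_path: str) -> None:
    """Batch: score a whole file on a schedule (e.g. a nightly Airflow/cron run)."""
    df = pd.read_parquet(input_path)
    df["prediction"] = model.predict_proba(df[FEATURES])[:, 1]
    df.to_parquet(output_path)


app = FastAPI()


@app.post("/predict")
def predict(payload: dict) -> dict:
    """Real-time: one low-latency prediction per incoming request."""
    row = pd.DataFrame([payload], columns=FEATURES)
    return {"prediction": float(model.predict_proba(row)[:, 1][0])}
```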

The ideal situation is when we can establish a service-level agreement (SLA). An SLA is a formal agreement with the other parties involved in developing your service that spells out the commitments each side makes, such as quality and availability targets. It’s not enough to just have metrics and technical requirements; it’s important to also agree on them with stakeholders. Having an SLA in place is even better, as it provides a clear understanding of expectations and responsibilities.
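What goes into such an agreement varies from service to service, but as a rough illustration, the terms might be written down somewhere explicit like this. All thresholds below are made up for the example:

```python
# Illustrative only: the kinds of terms an ML service SLA might pin down.
SLA = {
    "ml_metric": {"name": "roc_auc", "minimum": 0.75, "window": "30 days"},
    "business_metric": {"name": "uplift_vs_control", "minimum": 0.02},
    "latency_ms_p99": 200,                   # for a real-time endpoint
    "batch_delivery": "daily by 06:00 UTC",  # for a batch pipeline
    "availability": "99.5%",
    "on_call_owner": "ml-team",              # who responds when terms are breached
}
```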

Data is another important factor to consider. Imagine a situation where a data scientist trains a model using data that is not available in the production environment; this alone can prevent the model from being implemented. Poor data quality is another common problem: missing values, errors in values, or unexpected values. Don’t take data quality for granted; monitor it continuously to make sure it meets your needs.
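A lightweight quality check that runs before both training and scoring can catch many of these issues early. The column names and allowed ranges here are assumptions for illustration:

```python
import pandas as pd


def check_data_quality(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable problems; an empty list means the batch looks fine."""
    problems = []

    # Missing values in features the model actually uses.
    for col in ["age", "country", "deposit_sum"]:
        missing = df[col].isna().mean()
        if missing > 0.05:
            problems.append(f"{col}: {missing:.1%} missing values")

    # Errors in values: physically impossible ranges.
    if ((df["age"] < 18) | (df["age"] > 120)).any():
        problems.append("age outside the expected 18-120 range")

    # Unexpected values: categories the model never saw during training.
    known_countries = {"CY", "DE", "GB", "KZ"}
    unknown = set(df["country"].dropna().unique()) - known_countries
    if unknown:
        problems.append(f"unknown country codes: {sorted(unknown)}")

    return problems
```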

Additionally, it’s important to ensure that data is processed in the same way during training as it is in production. Establishing a clear pipeline can be helpful in this regard, and we’ll discuss this in more detail later.
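One common way to enforce this is to keep the preprocessing inside the model artifact itself, for example with a scikit-learn Pipeline, so production cannot process features differently from training. The feature names below are placeholders:

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]),
     ["age", "deposit_sum"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["country"]),
])

# The whole pipeline is trained, versioned and deployed as a single object,
# so training and production apply exactly the same transformations.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", GradientBoostingClassifier()),
])
# model.fit(train_df[["age", "deposit_sum", "country"]], train_df["target"])
```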

Data drift is another factor to consider. Data drift refers to changes in the distribution of the data over time. If the data used to train the model changes significantly, the model’s performance may degrade. Therefore, it’s important to monitor data drift and retrain the model as needed to ensure that it continues to perform well.
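One simple way to quantify drift for a numeric feature is the Population Stability Index (PSI) between the training sample and recent production data. The sketch below assumes a continuous feature; the 0.1 / 0.25 thresholds are common rules of thumb rather than hard limits:

```python
import numpy as np


def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between a reference sample (training) and a new sample (production)."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    # Clip so that values outside the training range land in the edge bins.
    expected_pct = np.histogram(np.clip(expected, edges[0], edges[-1]), edges)[0] / len(expected)
    actual_pct = np.histogram(np.clip(actual, edges[0], edges[-1]), edges)[0] / len(actual)
    expected_pct = np.clip(expected_pct, 1e-6, None)  # avoid log(0) and division by zero
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))


# Common rule of thumb: < 0.1 no meaningful drift, 0.1-0.25 keep an eye on it,
# > 0.25 investigate and consider retraining.
```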

Continuous integration/continuous delivery (CI/CD) is another important technical consideration. CI/CD refers to the process of automatically building, testing, and deploying the product. Automating this process can save time and reduce the risk of errors, but it also requires proper configuration and testing.
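In practice this usually means that every change triggers automated checks before a new model can be packaged and deployed. Below is the kind of test a CI job might run; the artifact paths and the quality threshold are assumptions for the example:

```python
import pandas as pd
from joblib import load
from sklearn.metrics import roc_auc_score


def test_model_loads_and_scores():
    """Smoke test: the packaged artifact loads and produces valid probabilities."""
    model = load("artifacts/model.joblib")
    sample = pd.read_parquet("tests/fixtures/sample_inputs.parquet")
    preds = model.predict_proba(sample)[:, 1]
    assert len(preds) == len(sample)
    assert ((preds >= 0) & (preds <= 1)).all()


def test_metric_not_below_agreed_threshold():
    """Quality gate: refuse to ship a model below the agreed validation metric."""
    model = load("artifacts/model.joblib")
    holdout = pd.read_parquet("tests/fixtures/holdout.parquet")
    auc = roc_auc_score(holdout["target"],
                        model.predict_proba(holdout.drop(columns="target"))[:, 1])
    assert auc >= 0.75  # threshold taken from the SLA / validation agreement
```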

The “training pipeline first” approach means that your training pipeline is integrated into your final production system.

Using this approach makes it much easier to train the model and implement new features. All you have to do is add the feature, retrain your model, experiment with different parameters, update the model version in the model registry, and use the new version of the model in your prediction pipeline.
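As a sketch of the idea, the training pipeline below registers a new model version, and the prediction pipeline loads an explicit version, with the same feature code shared by both. I’m using MLflow as one example of a model registry; exact calls may differ between versions, and the model name and features are placeholders:

```python
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "churn_model"          # illustrative registry name
FEATURES = ["age", "deposit_sum"]   # placeholder feature list


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Feature code shared by training and prediction -- one implementation only."""
    return df[FEATURES].fillna(0)


def training_pipeline(train_df: pd.DataFrame) -> None:
    """Train the model and push a new version to the registry."""
    X, y = build_features(train_df), train_df["target"]
    model = LogisticRegression(max_iter=1000).fit(X, y)
    with mlflow.start_run():
        mlflow.sklearn.log_model(model, artifact_path="model",
                                 registered_model_name=MODEL_NAME)


def prediction_pipeline(batch_df: pd.DataFrame, version: int) -> pd.Series:
    """Score a batch with an explicit, reproducible model version."""
    model = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}/{version}")
    return pd.Series(model.predict(build_features(batch_df)), index=batch_df.index)
```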

The final consideration is monitoring. For example, we need to keep an eye out for data drift, but how can we know if we have it if we don’t monitor data quality?

Think about which metrics you will include on your dashboard and how you will monitor them. It’s likely that you will want to include the business and ML metrics you identified at the beginning.
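For example, a daily job might recompute both kinds of metrics on yesterday’s labelled traffic and append one row to the dashboard’s table. The column names and the 0.5 threshold below are assumptions:

```python
import pandas as pd
from sklearn.metrics import roc_auc_score


def daily_dashboard_row(scored: pd.DataFrame, day: str) -> dict:
    """One dashboard row: volume, data quality, ML metric, business proxy."""
    labelled = scored.dropna(subset=["target"])  # rows whose outcome is already known
    return {
        "date": day,
        "n_predictions": len(scored),
        "share_missing_features": scored[["age", "deposit_sum"]].isna().mean().mean(),
        "roc_auc": (roc_auc_score(labelled["target"], labelled["score"])
                    if labelled["target"].nunique() == 2 else None),
        "precision_at_0.5": (labelled.loc[labelled["score"] > 0.5, "target"].mean()
                             if (labelled["score"] > 0.5).any() else None),
    }
```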

It’s also important to have a backup plan in place. Consider how you will be informed of problems and what your action plan will be: set up an alerting system, decide where alerts are sent, what actions should be taken in response, and who is responsible for fixing any issues that arise.
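A very minimal version of such an alerting step might simply compare the monitored values against the agreed thresholds and post to the on-call channel; in practice you would more likely lean on your existing monitoring stack. The webhook URL and thresholds here are placeholders:

```python
import requests

THRESHOLDS = {"roc_auc_min": 0.70, "psi_max": 0.25, "share_missing_max": 0.05}
ALERT_WEBHOOK = "https://hooks.example.com/ml-oncall"   # hypothetical endpoint


def check_and_alert(metrics: dict) -> None:
    """Post one message per breached threshold; silence means everything is fine."""
    breaches = []
    if metrics.get("roc_auc") is not None and metrics["roc_auc"] < THRESHOLDS["roc_auc_min"]:
        breaches.append(f"ROC AUC dropped to {metrics['roc_auc']:.3f}")
    if metrics.get("psi", 0) > THRESHOLDS["psi_max"]:
        breaches.append(f"feature drift: PSI = {metrics['psi']:.2f}")
    if metrics.get("share_missing_features", 0) > THRESHOLDS["share_missing_max"]:
        breaches.append("too many missing feature values")

    for message in breaches:
        requests.post(ALERT_WEBHOOK, json={"text": message}, timeout=10)
```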

Here’s a checklist of steps to follow to ensure that your ML model is ready for implementation:

  1. Consider metrics. What business task are you trying to solve, and how will it impact the business?
  2. Think about data. Don’t blindly trust it; monitor it, check for errors, and handle it properly.
  3. Determine how you will deploy your model. Set up a CI/CD process and treat your training pipeline as part of your production system.
  4. Monitor the health of your system. It’s crucial to monitor it and have a plan in place for addressing any problems that arise.

Additionally, you can download the checklist.

I hope you found these tips helpful in your journey to creating an ML product. Thanks for reading!
