Manage Machine Learning Lifecycle with ML Aide

Published in ML Aide · May 9, 2021

Key Challenges of Machine Learning Applications

Developing machine learning (ML) applications differs from developing traditional software applications. ML development is experiment-driven: besides versioning code, you also have to keep track of datasets, parameters, and their impact on the resulting models. While training models, you use several datasets that change over time, and you try out different algorithms and parameters to find the best model for your business problem. Some approaches will fail and some, hopefully, will be promising. You then investigate and test the promising experiments further until the models are reliable enough to be deployed to production.

Almost every data scientist has faced this problem: a few months later the model has to be retrained or improved, but it is no longer clear which parameters were used or how the existing model performed. It is even harder if a co-worker trained the existing model and you have to carry on the work. Therefore, you should track what went well and what did not. Otherwise, it is very hard to reproduce your work weeks or months later.

But how do you keep track of your experiments? Code is tracked with tools like Git, but Git is not made for tracking binary artifacts such as datasets and trained models. This is where ML Aide comes in: a new tool that makes experiment tracking and model management a joy.

With ML Aide you can manage the whole ML lifecycle, from data preparation to model deployment and monitoring. In detail, this includes the following steps:

Log used dataset

Usually, one of the first steps should be to log the dataset you used. This is the basic requirement for reproducing your experiments: if the dataset changes, or if you don’t know which dataset was used, the outcome of the experiment will differ.
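To illustrate why logging the dataset matters, here is a minimal sketch in plain Python. It does not use the ML Aide library. The function name `dataset_fingerprint` and the sample data are assumptions made up for this example. The idea is to store a content hash alongside an experiment, so that any later change to the data becomes visible.

```python
import hashlib


def dataset_fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the dataset content."""
    return hashlib.sha256(data).hexdigest()


# Two tiny, made-up CSV datasets that differ in a single value.
original = b"id,value\n1,10\n2,20\n"
changed = b"id,value\n1,10\n2,21\n"

fp_original = dataset_fingerprint(original)
fp_changed = dataset_fingerprint(changed)

# A changed dataset yields a different fingerprint, so a stored
# fingerprint tells you whether you are training on the same data.
print("same dataset:", fp_original == fp_changed)
```

Storing the fingerprint with each experiment is one simple way to check later whether a result was produced from the same data.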

Record data transformation

Often, the input data must be prepared, transformed, and cleaned for the model training. The transformation sometimes also includes scaling values. All these data transformations can be recorded with ML Aide.
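The following sketch shows one way such a record could look in plain Python. It is not the ML Aide API. `TransformationLog`, `drop_missing`, and `min_max_scale` are hypothetical names chosen for this example. Each step writes its name and the parameters it derived from the data, so the same preparation can be replayed later.

```python
from dataclasses import dataclass, field


@dataclass
class TransformationLog:
    """Hypothetical in-memory record of the applied data transformations."""
    steps: list = field(default_factory=list)

    def record(self, name, **params):
        self.steps.append({"name": name, "params": params})


def drop_missing(values, log):
    """Cleaning step: remove missing entries and record how many were dropped."""
    cleaned = [v for v in values if v is not None]
    log.record("drop_missing", dropped=len(values) - len(cleaned))
    return cleaned


def min_max_scale(values, log):
    """Scaling step: map values to [0, 1] and record the bounds that were used."""
    lo, hi = min(values), max(values)
    log.record("min_max_scale", minimum=lo, maximum=hi)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


log = TransformationLog()
raw = [10.0, None, 20.0, 30.0]  # made-up input values
cleaned = drop_missing(raw, log)
scaled = min_max_scale(cleaned, log)

for step in log.steps:
    print(step)
```

Recording the scaling bounds matters in particular: new data must be scaled with the bounds from training, not with its own minimum and maximum.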

Log parameters and metrics

Training a model involves many parameters and hyper-parameters. These range from the parameters of the train-test split to detailed hyper-parameters that fine-tune the model. Metrics of your model's performance are also necessary to compare models. All parameters and metrics are attached to the training run and can be investigated and compared later.
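As a rough illustration of attaching parameters and metrics to a training run and comparing runs afterwards, consider the sketch below. It is plain Python and not the ML Aide API. The `Run` class, the `best_run` helper, and all parameter and metric values are placeholders invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical record of one training run."""
    name: str
    parameters: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)


def best_run(runs, metric):
    """Return the run with the highest value for the given metric."""
    candidates = [r for r in runs if metric in r.metrics]
    return max(candidates, key=lambda r: r.metrics[metric])


# Placeholder values, not real results.
runs = [
    Run("baseline", {"test_size": 0.2, "max_depth": 3}, {"accuracy": 0.81}),
    Run("deeper-tree", {"test_size": 0.2, "max_depth": 8}, {"accuracy": 0.86}),
]

winner = best_run(runs, "accuracy")
print(winner.name, winner.parameters)
```

Because the parameters travel with the metrics, the winning configuration can be read directly from the record instead of being reconstructed from memory.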

Store created model

To use the model later in a serving application (e.g. a REST service), you can store it directly in ML Aide. All models are versioned automatically, and you can also keep track of the current stage of each model. The available stages are None, Staging, Production, Deprecated, and Abandoned. This enables queries like “give me the latest model that is production-ready” or “give me the model in version 3”.
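The two example queries can be sketched with a small in-memory registry. This is a plain-Python illustration under stated assumptions, not how ML Aide is implemented or how its API looks. `ModelRegistry`, `ModelVersion`, and their methods are hypothetical. Only the stage names are taken from the text above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    NONE = "None"
    STAGING = "Staging"
    PRODUCTION = "Production"
    DEPRECATED = "Deprecated"
    ABANDONED = "Abandoned"


@dataclass
class ModelVersion:
    version: int
    payload: bytes  # stands in for the serialized model
    stage: Stage = Stage.NONE


class ModelRegistry:
    """Hypothetical registry that assigns version numbers automatically."""

    def __init__(self):
        self._versions = []

    def store(self, payload: bytes) -> ModelVersion:
        entry = ModelVersion(version=len(self._versions) + 1, payload=payload)
        self._versions.append(entry)
        return entry

    def get_version(self, version: int) -> Optional[ModelVersion]:
        for entry in self._versions:
            if entry.version == version:
                return entry
        return None

    def latest_in_stage(self, stage: Stage) -> Optional[ModelVersion]:
        matches = [e for e in self._versions if e.stage is stage]
        return max(matches, key=lambda e: e.version) if matches else None


registry = ModelRegistry()
for blob in (b"model-a", b"model-b", b"model-c"):
    registry.store(blob)

registry.get_version(1).stage = Stage.DEPRECATED
registry.get_version(2).stage = Stage.PRODUCTION

# "Give me the latest model that is production-ready."
latest_production = registry.latest_in_stage(Stage.PRODUCTION)
# "Give me the model in version 3."
version_three = registry.get_version(3)

print(latest_production.version, version_three.stage.value)
```

Keeping the stage separate from the version number lets a serving application ask for "whatever is in production" without hard-coding a version.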

ML Aide components

ML Aide comes with a browser-based user interface and a Python library. Everything that is recorded in ML Aide will be stored in the ML Aide webserver. The Python library integrates with common Machine Learning libraries. The user interface enables you to investigate your experiments and models.

ML Aide is made for enterprises. Identity, security, and integrity are first-class citizens. ML Aide is open source for maximum transparency and can be operated on any cloud platform or on-premises. It runs on top of Kubernetes and therefore scales from a single user to large enterprises.

Conclusion

Altogether, ML Aide manages your machine learning lifecycle and accelerates MLOps.

In the next blog post, we will explore the ML Aide Python library and the browser-based user interface. Meanwhile, check out the ML Aide documentation.