The MLOps Handbook: The Why

Giles Wycherley
AirAsia MOVE Tech Blog
5 min read · Sep 18, 2022


by Rohit Agarwal & Giles Wycherley

The Internet is full of theoretical concepts of what MLOps is supposed to be. We at AirAsia SuperApp undertook a journey to implement MLOps more than a year ago and encountered a wide variety of challenges that no one had warned us about. This 4,000-word, three-part handbook is a handy compilation of our learnings from that journey.

Why is MLOps so important?

a. Machine Learning is heavily experiment-oriented

Photo by Julia Koblitz on Unsplash

While organizations have leveraged traditional Business Intelligence (BI) and analytics for many years, Machine Learning (ML) represents a new frontier in a competitive landscape for optimizing workflows and customer experiences.

ML solutions are pursued to solve real-world problems that are complex and multi-faceted. Experimentation has been a hallmark of the exploration undertaken by data scientists and engineers in tackling these complex problems.

As the nascent community has evolved its understanding of ML’s capabilities and requirements, so too have the governance and lifecycle of ML solutions. As with any evolution, earlier generations may not withstand current real-world rigors.

MLOps is a set of practices that gives structure to experimentation and the adaptability to handle not just the problem presented today, but also the problems of the future.

b. Increasing the productivity of the data science team

In our experience, a data scientist can actively monitor and retrain only 4–6 ML pipelines at a time.

Perpetually hiring new data scientists is not a sustainable way to increase your Data Science team’s overall volume and throughput. MLOps introduces practices that promote efficiency, reusability, and robustness, driving increased delivery velocity both per data scientist and across the team.

The three key principles that drive velocity are:

  • Event-Driven Automation;
  • Modularization; and
  • Testing.

We will expand upon these further when diving into the fundamentals of MLOps.
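To give a flavor of the latter two principles in the meantime, here is a minimal, hypothetical sketch (the function, column names, and schema are our own invention for illustration, not a prescribed API): a pipeline step written as one small, reusable function, with a unit test that pins down its behavior. An event-driven orchestrator can then trigger such steps automatically whenever new data lands.

```python
# A hypothetical, minimal pipeline step illustrating modularization and
# testing; all names and the schema are illustrative only.
import pandas as pd

def add_trip_duration(df: pd.DataFrame) -> pd.DataFrame:
    """Modular step: derive trip duration in minutes from two timestamps."""
    out = df.copy()
    out["duration_min"] = (
        out["arrival_ts"] - out["departure_ts"]
    ).dt.total_seconds() / 60
    return out

def test_add_trip_duration():
    """Unit test pinning the step's behavior, so refactors stay safe."""
    df = pd.DataFrame({
        "departure_ts": pd.to_datetime(["2022-09-18 08:00"]),
        "arrival_ts": pd.to_datetime(["2022-09-18 10:30"]),
    })
    assert add_trip_duration(df)["duration_min"].iloc[0] == 150.0
```

Because each step is a small pure function, it can be reused across pipelines and handed between team members without surprises.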

Ultimately, MLOps-conformant pipelines could see data scientists increasing their pipeline management capacity by a factor of 3 to 5.

c. Best practices for better collaboration and contingencies

It is commonly acknowledged that implementing best practices leads to robust and reproducible solutions. Equally important in the context of teams, standards and design patterns facilitate better collaboration between members. Larger ML projects with multiple data scientists working in tandem are less likely to suffer conflicting code because member-specific idiosyncrasies are eliminated.

The commonality of practices allows pipelines to be transferred between team members in the case of handover or attrition. Additionally, it can streamline the onboarding process for new data scientists as they can adopt a unified vision of solution development.

d. Creating value using ML is hard

Photo [Cropped] by JESHOOTS.COM on Unsplash

A common approach has been to let the data speak for itself. This leads to black-box solutions, where the variables and their intertwined relationships are so complex that they are not humanly interpretable. This makes the models inherently more difficult to debug when they start producing the wrong results.

Decomposing or constraining machine learning by hypothesis is desirable because it produces more explainable and interpretable models. However, developing such solutions still incurs considerable technical debt. To mitigate the cost of business investment in a potentially flawed hypothesis, ML is typically first tested as a proof of concept.

Frequently, this also involves stripping back the practices that reduce the cost of the pipeline over time, since they are relatively expensive to implement if the experiment is abandoned. Even if the experiment is successful, however, this leaves the business either building in production readiness later at additional cost or shipping a brittle solution.

Real-world performance is not a guaranteed outcome. A model’s performance can vary widely as a result of decisions made by a data scientist. When evaluating a model, numerous metrics can be measured, but it is not possible to optimize for all of them. A data scientist will typically optimize for a particular offline metric and monitor an online metric in production, but these metrics are not necessarily correlated. Finding offline and online metrics with high correlation can be a significant exercise in itself.
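One way to sanity-check a proxy is to look at how past model versions ranked on both metrics. The sketch below is purely illustrative: the numbers are hypothetical, and the metric names (validation AUC offline, click-through rate online) are assumptions for the example.

```python
# Hypothetical check of whether an offline metric tracks an online one.
# All numbers and metric names below are illustrative, not real results.
import numpy as np
from scipy.stats import spearmanr

# Offline metric (e.g., validation AUC) for five past model versions...
offline_auc = np.array([0.71, 0.74, 0.76, 0.78, 0.81])
# ...and the online metric (e.g., click-through rate) each version achieved.
online_ctr = np.array([0.031, 0.030, 0.035, 0.034, 0.038])

# Spearman rank correlation asks: does improving the offline metric rank
# model versions the same way the online metric does?
rho, p_value = spearmanr(offline_auc, online_ctr)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A weak or unstable rho suggests the offline metric is a poor proxy and a
# better-correlated offline metric should be sought.
```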

Mistakes made at the start may only become evident in the end product. Generally speaking, mistakes in traditional software development can be costly, but they rarely necessitate throwing away the entire solution. ML has one principal deliverable, the model, and a single mistake can completely invalidate it. Any work downstream of the mistake must be discarded, which could entail data preparation, feature engineering, training, and deployment: essentially an entire development cycle.

This article is Part One of a trio of articles that make up The MLOps Handbook. Stay tuned for more in Part Two!
