MLOps in Plain Words

SoftServe
Inside the Tech by SoftServe
9 min read · Jun 10, 2021

What is MLOps? Is it here to stay or just another trend to fade away? How has it changed the way ML projects are done?

SoftServe launched its MLOps practice about two years ago as a response to the very same challenges Data Scientists have been experiencing over and over while implementing different projects.

So far, so good. In this article, Volodymyr Solskyy, leader of the MLOps practice, demonstrates why this new Ops is so relevant.

To start with, let’s take a look at the recent role of Machine Learning

The growing complexity of modern-day tasks has been exposing the limitations of traditional Software Engineering for quite a while now, forcing us to search for new approaches. Machine Learning has already proven to be one of the best candidates for the job, with its ability to scale to that complexity without requiring extensive domain experience from ML engineers. This leads some people to call ML the new Software Engineering, or Software 2.0.

Sounds promising, yet it’s too early to state that we’ve got a silver bullet. Why?

To answer this question, we need to look holistically at the whole process of applying ML to meet a particular objective or to address a problem.

There is a misperception that Machine Learning is all about model training and tuning. Not really; in fact, it's just a small piece of a larger puzzle. To deliver value from Machine Learning, multiple steps must be taken, generally grouped into two phases:

  1. Model development: starts with data ingestion, followed by data analysis, feature crafting, and model creation.
  2. Model serving: needed to make your model useful. It involves ingesting new, previously unseen data that needs to be validated and run through your existing model pipelines.

While we managed to harness the first phase, the second one remains a major challenge.

According to a Gartner survey, in 2018 only 47% of all AI/ML projects made it fully to production. By 2020, this figure had grown by a mere 8%. Pretty slow progress, right? The root of the problem lies in the difficulty of integrating ML models into real environments.

McKinsey: The inability to integrate analytic solutions into workflows and achieve frontline adoption is the number one inhibitor to why data and analytics initiatives fail.

What makes bringing an AI/ML solution to production such a tall order?

Out of numerous factors we can distinguish three central ones:

  • Insufficient data quality

ML is a completely data-centric paradigm. Getting enough data seems to be a no-brainer, as we have numerous sources of information and millions of data points generated daily. The key issue is the quality of this data, from both a business and a technical perspective.

  • Rapid growth of complexity

As the industry matures, the complexity of the tasks we have to solve grows exponentially. ML by nature will scale to everything we throw at it, but we are still left to manage that complexity. So we generate more and more experiments involving more and more data, which requires more computational power: GPUs, FPGAs, etc. Consequently, we bury ourselves in the sheer sprawl of infrastructure, code, and processes required to deliver value.

  • Unclear success criteria

Traditional Software Engineering heavily relies on functional specifications and requirements, making evaluation of progress and results a relatively straightforward task.

In ML, we operate on “proxy” functions and metrics. This also gets the job done, but the way we get there is different, which quite often makes it hard to reconcile business objectives with the obtained results. Also, reaching good numbers on objective functions does not necessarily translate into success from a business standpoint (look no further than the Netflix Prize challenge, for example).

How can we do better? Before addressing this question, let's clarify what this “better” stands for.

Measuring progress in bringing ML solutions to production

To be able to quantify the progress of any effort, we need a set of objective metrics for it, and ML in production is no exception.

Over years of accumulated experience, our team has arrived at four major categories that cover most of the process, specifically KPIs related to:

ML:

  • Implementation time
  • Time to deploy
  • Model performance
  • Business impact
  • Number of model calls (i.e. relevancy or popularity)
  • Average call duration

Data dependencies:

  • Time to access
  • Data ingestion time
  • Number of data issues

Infrastructure:

  • Resource provision time
  • Auditability
  • Chargeback and showback reports

Operations:

  • Number of models in production
  • Number of new models deployed
  • Model versioning
  • Number of calls and errors
  • Hardware utilization

Over years of experimentation, the industry has worked out what can be considered an ideal ML project lifecycle.

It starts with framing your business idea or need, doing a sanity check on whether that idea is feasible at all, going through a rapid cycle of modelling iterations until a good model is created, proceeding to the production phase to have everything deployed, and finally measuring the impact to understand whether the initial objectives are met. The result of this analysis becomes an input to the next large iteration of the project.

Here is an extensive scheme of this process:

Can we expect a typical Data Science team to deliver this cycle end-to-end? Well, experience so far has shown us that the answer is no. And here's why.

The brutal reality of developing ML project without MLOps

On average, ML teams are built around the idea that the desirable outcome of their work is either a model or a nice, ad-hoc research report.

Hence, the primary focus is on stuffing those teams mostly with Data Scientists who are extremely well versed in research, mathematics, and data analysis, but usually less knowledgeable and experienced in engineering topics like scaling, automation, code optimization, etc. In theory, those parts can always be covered by Software Engineers, but this is precisely where things start to break, since both sides usually work at different velocities using different (often conflicting) approaches and different sets of tools, and, in general, speak different languages. This leads to missed deadlines, poor overall quality, and teams being completely frustrated with each other. Altogether, it causes huge losses for the business.

Now, deploying ML to production is a challenging stage, yet it is not the end of the journey. Congrats on making it this far, and welcome to the new, exciting world of operating ML-based systems in production. Your datasets will drift and your models will degrade; you will encounter biases, privacy, and compliance issues, deal with latencies, throughput, and general infrastructure instability, debug complex problems, and scratch your head over reproducibility issues.

This brutal reality can quickly and easily overwhelm unprepared teams, leading to countless hours of overtime, wasted budgets, and frustrated stakeholders, effectively spelling doom for yet another ML initiative.

However, the journey does not necessarily need to end like this…

Ingredients of a successful ML project

As we briefly touched upon already, a successful ML project requires, in a nutshell, at least the following set of skills:

  • Data Scientists doing all that stuff around model training.
  • Data engineers doing lots of heavy lifting with the data.
  • DevOps engineers supporting the environment.
  • Software Engineers implementing integration with the upstream and downstream systems.

But as we have also shown in the previous section, simply having experts to cover each of those skills doesn't do much good on its own.

This is where MLOps comes into play.

The essence of MLOps

Just like the others in the Ops family, MLOps is not a particular profession or role. Instead, it is a form of a contract, a set of practices aimed at enabling Data Scientists and Subject Matter Experts to deliver their work faster and in a more robust and scalable way.

The pillars of the MLOps contract, among others, include:

  • Shared responsibilities and trust.
  • Self-service APIs and services.
  • Reusable code templates and playbooks.
  • Coding conventions and style guides.
  • Support for ML toolkits and frameworks.
  • Unified environments and images.
  • Unified data access and operations.
  • CI/CD and SDLC guidelines.

Applied properly, this contract will:

  • Establish tight collaboration between researchers and engineers.
  • Minimize code refactoring and rewrite between research and production.
  • Establish iterative, rapid, and high-quality ML model delivery lifecycle.
  • Ensure MLOps practices are shared among the stakeholders.

What does this seamless collaboration bring? The ability to deliver ML models to clients or business users faster and more efficiently.

What changes does it introduce to our work routine?

Centralized ML data management

If there’s one single improvement that should be considered for every AI project, it’s this one — data unification for model development and serving.

Key idea: ML requires data that is properly consolidated (preferably into a single source), quality-controlled, and handled by automated pipelines.

Quite often, a project involves multiple data sources such as streams, data lakes and databases, object stores, and manual annotations. This complex landscape is the exact reason why a Data Scientist spends 80% of their time searching for relevant data and doing data munging. It is also the source of too many surprises in production while the model is live and serving users.

The solution to this problem is a Feature Store: a central vault for storing curated, well-documented, tightly controlled features that can be used for training and serving across multiple models. Essentially, a feature store is a data warehouse for ML. Having one helps Data Scientists quickly search for and reuse available features, avoid duplication, ensure that features are always up to date, and guarantee there is no discrepancy between the data used for training and serving.
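
To make the idea concrete, here is a minimal, purely illustrative sketch of the concept (not a real feature-store product; the class, feature names, and example data are all hypothetical): features are defined and documented once in a central registry, and the exact same definitions are used to build both training sets and serving requests, which removes training/serving skew by construction.

    # A toy illustration of the feature-store idea: one registry of feature
    # definitions shared by training and serving code paths.
    from dataclasses import dataclass
    from typing import Callable, Dict, List

    import pandas as pd

    @dataclass
    class Feature:
        name: str
        description: str
        compute: Callable[[pd.DataFrame], pd.Series]  # transformation from raw data to feature values

    class FeatureStore:
        """Central registry of curated, documented features."""
        def __init__(self) -> None:
            self._features: Dict[str, Feature] = {}

        def register(self, feature: Feature) -> None:
            self._features[feature.name] = feature

        def build(self, raw: pd.DataFrame, names: List[str]) -> pd.DataFrame:
            # The same code path produces training data and live serving data.
            return pd.DataFrame({n: self._features[n].compute(raw) for n in names})

    store = FeatureStore()
    store.register(Feature(
        name="avg_order_value",
        description="Average order value per customer",
        compute=lambda df: df["total_spent"] / df["order_count"].clip(lower=1),
    ))
    store.register(Feature(
        name="orders_per_month",
        description="Average number of orders per month of customer tenure",
        compute=lambda df: df["order_count"] / df["tenure_months"].clip(lower=1),
    ))

    # The same definitions serve both training and inference.
    customers = pd.DataFrame({
        "total_spent": [300.0, 120.0],
        "order_count": [3, 1],
        "tenure_months": [12, 2],
    })
    training_X = store.build(customers, ["avg_order_value", "orders_per_month"])
    print(training_X)

Production-grade feature stores (Feast, Tecton, or those built into cloud ML platforms) add what this toy version lacks, among other things: point-in-time correctness, low-latency online serving, monitoring, and access control.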

Versioning

While in traditional Software Engineering the end result primarily depends on the code written by developers, the situation is much more complex for ML-based systems. Since ML is a purely data-centric approach, even the slightest change in incoming data or in a data pipeline can and will lead to a different result. Also, as many algorithms are non-deterministic by nature, results may heavily depend on initial values, the hyperparameters used, etc.

As such, to ensure reproducibility and provenance and to ease hand-offs of models, it makes sense to put the following under version control (a minimal tracking sketch follows this list):

  • Training data: raw data snapshots.
  • Environments: images, libraries, dependencies.
  • Data Transformations: data pipelines and processing code.
  • Features: feature metadata and extraction code.
  • Models: model code, weights, hyperparameters, binaries.
  • Experiments: configuration, metrics, artifacts.
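
As one possible illustration (MLflow here is just an example tooling choice, not a prescription, and the dataset, experiment name, and tag values below are synthetic placeholders), an experiment tracker can capture most of the items above for every training run: parameters, metrics, tags pointing at data and pipeline versions, and the model binary itself.

    # Tracking one training run with MLflow: parameters, metrics, tags, and the
    # model artifact are recorded together so the run can be reproduced and audited.
    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    mlflow.set_experiment("demand-forecast")           # hypothetical experiment name

    with mlflow.start_run():
        params = {"n_estimators": 200, "max_depth": 8}
        mlflow.log_params(params)                      # hyperparameters
        mlflow.set_tag("training_data_version", "v3")  # pointer to the raw data snapshot
        mlflow.set_tag("pipeline_commit", "abc1234")   # pointer to the transformation code

        model = RandomForestRegressor(**params, random_state=42).fit(X_train, y_train)
        rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
        mlflow.log_metric("rmse", rmse)                # evaluation metric

        mlflow.sklearn.log_model(model, "model")       # model binary as a versioned artifact

Data snapshots and pipelines themselves are usually versioned with complementary tools (DVC, lakeFS, or plain object-store versioning); the tracker then only needs to store pointers to those versions, as the tags above do.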

Modularity

While it's quite tempting (and actually practiced) to put everything into a single script or a notebook, such monoliths are hard to optimize, scale, and reuse. Instead, it's way better to structure the code as a set of abstracted-out, loosely coupled components with well-defined interfaces and join them in the form of a DAG or pipeline.

Applying this approach enables us to:

  • Reuse single components in other pipelines.
  • Scale each component individually if needed.
  • Test, debug, and troubleshoot the code faster and easier.

It also allows us to utilize the full power of platforms such as Kubernetes/Kubeflow and Airflow.
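
For instance, a minimal training pipeline expressed as an Airflow DAG might look like the sketch below (the DAG name, step names, and their bodies are placeholders; a Kubeflow pipeline would express the same structure with its own SDK):

    # A sketch of a three-step pipeline as an Airflow DAG. Each step is an
    # independent, testable function with a clear interface; the orchestrator
    # only wires them together and handles scheduling and retries.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def ingest_data():
        print("pulling raw data from the source systems")         # placeholder logic

    def build_features():
        print("transforming raw data into model-ready features")  # placeholder logic

    def train_model():
        print("fitting the model on the prepared features")       # placeholder logic

    with DAG(
        dag_id="demand_forecast_training",   # hypothetical pipeline name
        start_date=datetime(2021, 6, 1),
        schedule_interval="@weekly",
        catchup=False,
    ) as dag:
        ingest = PythonOperator(task_id="ingest_data", python_callable=ingest_data)
        features = PythonOperator(task_id="build_features", python_callable=build_features)
        train = PythonOperator(task_id="train_model", python_callable=train_model)

        ingest >> features >> train   # the DAG: ingest -> features -> train

Because each task sits behind a stable interface, it can be unit-tested on its own, swapped out, or scaled independently (for example, by running the training step on a GPU node) without touching the rest of the pipeline.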

Building CI/CD pipelines for ML

No matter how many improvements we introduce, we won't be able to reap the benefits if we keep doing things manually. In fact, the level of automation of the data, model, and code pipelines determines the level of maturity of the ML process. Let's take a look at what a fully automated process might look like:

The process starts from data stored in a central location: the feature store.

From here, Data Scientists begin model development on a data sample. They go through the regular cycle of experimentation, data validation, feature crafting, model building, etc. Here we encounter the first major change: the output of the Data Scientists' work is no longer a model, but reproducible code that is committed to a source code repository.

Once the code is there, the operations team can orchestrate its deployment into the staging/production environment, where fully automated pipelines run all the steps on the entire dataset, produce a model, and store it in a model registry. All parameters and details of this process are recorded in metadata storage for later audit (if required). From the model registry, the model is deployed in a semi- or fully-automated fashion via A/B testing or a champion/challenger process. Once the model is in production, it is closely monitored by the ops team along with its data pipelines to ensure system stability, and all of the important findings are fed back into the development/training/deployment loop.
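
The heart of that semi-automated promotion step is a simple gate: the freshly trained challenger replaces the current champion only if it clearly outperforms it on held-out data. Here is a rough, self-contained sketch of such a gate (synthetic data, and a plain dictionary standing in for a real model registry):

    # A "challenger vs. champion" gate an automated pipeline could run before
    # promoting a new model. In practice the registry would be MLflow, SageMaker,
    # or a custom service rather than a dict.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    PROMOTION_MARGIN = 0.01  # challenger must beat the champion by at least this much

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, random_state=0)

    champion = LogisticRegression(max_iter=1000).fit(X_train, y_train)              # currently in production
    challenger = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)   # freshly retrained

    champion_auc = roc_auc_score(y_holdout, champion.predict_proba(X_holdout)[:, 1])
    challenger_auc = roc_auc_score(y_holdout, challenger.predict_proba(X_holdout)[:, 1])

    registry = {}  # stand-in for a real model registry
    if challenger_auc >= champion_auc + PROMOTION_MARGIN:
        registry["staging"] = challenger   # promote; A/B testing or manual approval happens next
        print(f"Promoted challenger (AUC {challenger_auc:.3f} vs {champion_auc:.3f})")
    else:
        print(f"Kept champion (AUC {champion_auc:.3f} vs {challenger_auc:.3f})")

In a real pipeline this check runs inside CI/CD, and "promotion" means registering the model and shifting traffic gradually via A/B testing rather than updating a dictionary.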

As you might have noticed already, this new process is faster, less prone to human error, and ensures proper separation of responsibilities between various roles.

What does it bring us? The most essential benefits of MLOps

Reproducibility & Auditability:

  • All artifacts can be tagged and audited.
  • Pipelines are reproducible and their results are verifiable.
  • Code drives deployments.

Validation:

  • Minimize bias and enable provenance.
  • Online/offline comparisons of model quality.
  • SWE best practices for quality control.

Automation & observability:

  • Controlled rollout capabilities.
  • Live comparison for model performance: real vs. expected.
  • Feedback loop for model improvements and drift.
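
The feedback loop in the last bullet typically starts with drift detection: comparing the distributions the model sees in production with the ones it was trained on. As a rough illustration (synthetic data and a simple two-sample Kolmogorov-Smirnov test; real monitoring stacks use richer statistics), it can be as little as:

    # A rough drift check: compare a live feature's distribution against the
    # training-time distribution with a two-sample Kolmogorov-Smirnov test.
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(42)
    training_feature = rng.normal(loc=0.0, scale=1.0, size=10_000)  # snapshot captured at training time
    live_feature = rng.normal(loc=0.4, scale=1.0, size=2_000)       # recent production traffic (shifted)

    statistic, p_value = ks_2samp(training_feature, live_feature)
    if p_value < 0.01:
        # In a real system this would raise an alert and possibly trigger retraining.
        print(f"Drift detected (KS statistic={statistic:.3f}, p={p_value:.4f})")

Checks like this, run continuously against the metadata and monitoring signals collected by the pipelines described above, are what close the loop between operations and the next round of model development.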
