What does MLOps mean to you?

How to take full advantage of MLOps and what you can expect from MLOps platforms.

Bartłomiej Poniecki-Klotz
Ubuntu AI
13 min read · Jul 12, 2023

Diagram with topics CEOs talked about in earnings calls, with big topic groups: Generative AI, Banks, Regulations, AI, Economic uncertainty, Sustainability, Layoffs, Reshoring
Topics CEOs talked about in Q2 2023, source

According to the IoT Analytics Q2/2023 Trend Report survey, the three topics with the most traction in earnings calls are:

  • AI & Generative AI
  • Economic uncertainty, Banks
  • Reshoring

The significance of Generative AI is growing rapidly.

While CEOs talk about the applications of Generative AI in their businesses, another interesting conversation is happening at technology conferences worldwide. Technology leaders know that only with a set of best practices, tools and patterns will AI transformation be fast, smooth and possible.

The term "MLOps", or Machine Learning Operations, is scorching hot. Look at KubeCon, one of the top Kubernetes conferences. It has a Data and Machine Learning track where half of the presentations were about MLOps. Additionally, MLOps in a highly regulated environment was the main topic of a keynote presentation.

What is MLOps? It is a set of principles, tools and best practices that help solve business problems using AI in a production environment. The definition is fairly simple, but does everyone on the project expect the same from MLOps?

If not, then they need different tools and follow distinct best practices. Looking closer, the same role will follow different principles at different times. A Data Scientist, while experimenting, focuses on the speed of innovation. On the other hand, when analysing the data drift of models running in production, they need to know the data and model lineage and the telemetry data.

Knowing what to expect from a modern MLOps stack is hard.

I want to make this easy for you. Check what MLOps can do for you, regardless of your position.

AI project team! Who’s there?

The radar graph for the AI team with skills: Data Skills, Soft Skills, Business understanding, ML modeling, Coding. On it there are multiple colors for Business Analyst, Data Engineer, Data Scientist, Application Developer, ML Engineer.
Skill radar for the AI project team

An AI project is not run by data scientists alone. They do magical things occasionally, but they are not a one-person army. Behind every successful project, multiple people work together. This way, they transform business ideas into viable solutions and fearlessly operate them in production.

Let’s see who you can find in such a team.

Business Analysts (BA)

The skill radar for Business Analyst with strong Soft Skills and Business understanding. Data Skills are a bit less important for them.
Business Analyst skill radar

Business Analysts are the closest IT people to the business stakeholders. They gather requirements and check if current systems and data can solve the problem.

Communication is one of their key skills because they talk with both business and technical stakeholders. Additionally, they frequently review data sources across the organisation when validating business requirements. Getting access to all the needed datasets is cumbersome. They often wait a long time for access to the data, then for a schema with descriptions and the name of its custodian. A long list of things to wait for.

Data Engineers (DE)

The skill radar for Data Engineer with strong Data Skills and Coding.
Data Engineer skill radar

Training Machine Learning models is only possible with data. The people who prepare data for experiments and models are Data Engineers. The role of a Data Engineer is less visible than a Data Scientist's but equally important. Data Engineers provide secure and convenient access to cleaned and up-to-date information.

In the first step, they acquire data from multiple sources like API endpoints, databases, documents or streams of events. At this step, the data is not ready for consumption because it still requires description, cleaning and validation to become valuable information.

Data Engineers also expose collected pieces of information. Depending on the use case, they provide information as:

  • versioned datasets for Machine Learning
  • batches exported for Advanced Analytics
  • databases or data warehouses visualised as dashboards.

They face multiple challenges, like understanding the data and ensuring its quality and timeliness. They collaborate with Business Analysts (BA) and Subject Matter Experts (SMEs). BAs and SMEs provide business context and help to understand information coming from different data sources and how to map it into a common taxonomy. This way, every Data Scientist understands the data in the same way.

DEs are also masters of automation. They create consistent and reliable pipelines that automate the process of data ingestion, cleaning, validation and exposure in the proper format.
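
For illustration, a minimal cleaning-and-validation step could look like the sketch below. It is only a sketch in Python with pandas; the dataset and column names are made up, not taken from any particular project.

import pandas as pd

def clean_orders(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean a raw extract: drop duplicates, rename columns, validate."""
    df = raw.drop_duplicates()
    # Rename columns to the organisation-wide naming convention (hypothetical mapping)
    df = df.rename(columns={"ordId": "order_id", "createdTs": "created_at"})
    # Basic validation: fail fast if mandatory fields are missing or empty
    required = ["order_id", "created_at"]
    problems = [c for c in required if c not in df.columns or df[c].isna().any()]
    if problems:
        raise ValueError(f"Validation failed for columns: {problems}")
    return df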

Data Scientist (DS)

The skill radar for Data Scientist with strong Soft Skills, ML modeling and Business understanding.
Data Scientist skill radar

Data Scientists use math, analytical methods, data science and machine learning to solve business problems. The simpler the solution, the better. To succeed in this role, speed of innovation and experimentation is important. Additionally, an experienced Data Scientist with extensive business domain knowledge is a scarce resource. They are both expensive and hard to come by.

To be fully efficient, they need the following:

  • Access to clean and fresh data
  • A flexible computing environment
  • A way to track experiments and collaborate with others.

Data Scientists use the company's lifeblood: the data. They require access to a frequently changing list of datasets from across the organisation. The time they spend waiting is time lost. On the other hand, they work with sensitive data, so security is a prerequisite for their work, like safety on a construction site.

Secure environments alone are not enough. Compute environments must also be flexible, providing infrastructure adjusted to project needs. One of the strategic resources for Machine Learning is the GPU. People use GPUs for model training, for inference in production, or even for accelerating preprocessing and data cleaning jobs. Sharing GPUs is important because they are expensive but essential components of AI projects.

Compute environments with only an operating system are not what Data Scientists favour. Investing long hours in installing your favourite tools every time you start a new project fits no definition of fun or effectiveness. They need preinstalled software and centralised tools with proper access management. They cooperate on the results and share code, metrics and lessons learnt from multiple experiments. A cooperative environment fosters innovation.

Application Developer

The skill radar for Application Developer with strong Coding and Soft skills.
Application Developer skill radar

The Application Developer is the prime IT consumer of the AI project's results.

The results are in different forms, like:

  • Batch processing results stored in the database
  • API endpoint with Machine Learning model
  • Binary Machine Learning model embedded in the mobile or web application

They wrap the ML model responses in an amazing UX to make them accessible to business users. Thanks to their cooperation with Business Analysts and Data Scientists, they turn raw, math-like ML model responses into business outcomes.

Application Developers are in multiple teams, so the artefact handover process is important for them. They need both developed artefacts and documentation.

MLOps Engineer

The skills radar for MLOps Engineer with strong Coding and equally important ML modeling and Data Skills.
MLOps engineer skill radar

Machine Learning Operations Engineers, MLOps Engineers for short, are quite new to this team. They joined because the Data Scientists' experiments, insights and results require the same love that DevOps engineers give to software code and Data Engineers give to data. In practice, MLOps Engineers bring automation and standardisation to the experiments.

A few of their tasks are:

  • building pipelines out of experiments
  • exposing models for consumption by Application Developers
  • managing models’ lifecycle
  • monitoring models’ performance in Production

MLOps Engineers work with Data Scientists to transform the experiments into a full-fledged AI-powered product. They also run the models in production.

Meanwhile, in the ideal MLOps world

Behind every role, there are real people. Today you meet the heroes of an AI project. They are effectively using the MLOps platform in their daily work.

Meet Anna, Jill, John, Kevin and Luke, our team of AI experts. They work on the same platform, but each sees it differently.

John, a Business Analyst

A man in a blue suit, showing what John, a Business Analyst, might look like
Photo by Tamarcus Brown on Unsplash

John is a Business Analyst. He has just talked with business stakeholders and has a good idea about a new business problem to solve. He works at a data-driven company, so his first step is to find the data. He opens a data catalogue to view data across the whole organisation. He searches for already prepared and versioned datasets and tables.

After finding the required dataset, he sees that he cannot access it. But he can access the table metadata — description, names of columns, data owner and link to the procedure to access actual data. Before starting the procedure, he knows if this is what he needs. Additionally, he knows which other teams are using this data and quality metrics like freshness or distribution. Solving problems is a team sport, so he talks with other groups using this dataset. It helps him avoid the same issues they had.

The screen from Amundsen — Table Detail Page with visualization of a Hive / Redshift table
Amundsen — data catalogue

John creates a new feature request document with all necessary documentation. He attaches expected business outcomes, links to data, and additional accesses required. The team implements well-documented features much faster because everyone has the essential information.

Open source tools used:

  • Amundsen — data catalogue
  • Apache Atlas — data governance and metadata framework
  • Apache Ranger — data security and monitoring

Jill, a Data Engineer

A woman in front of a computer, showing what Jill, a Data Engineer, might look like
Photo by Christina @ wocintechchat.com on Unsplash

Jill is a Data Engineer. She has just talked with John about the data needed. The Data Science team needs a full, cleaned dataset to experiment on. First, she contacts other groups already using the same data. They share their code repository, where she finds a few pieces of the puzzle already done. One of the teams created a pipeline step which loads data, removes duplicate rows and renames columns according to the organisation's policy. All code is saved in the Git repository, while CICD pipelines test it on each change.

The next step is to create a fully automated pipeline. It is built based on the project template for data processing. Using templates simplifies the process of developing new data processing pipelines. Jill changes the parts specific to the current dataset and reuses the rest. In a matter of hours, the new pipeline finishes successfully. The MLOps platform provides her with object storage integration, metadata store and a workflow engine with scheduling capabilities. Templates and integration allow for swift data pipeline development.
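
As a rough illustration only, a template-based pipeline built with the Kubeflow Pipelines SDK could look like the sketch below (assuming the v1 Python SDK; the component, bucket paths and names are hypothetical):

import kfp
from kfp import dsl
from kfp.components import create_component_from_func

def clean_dataset(source_uri: str) -> str:
    """Load the raw data, deduplicate and rename columns, return the cleaned data URI."""
    # A real step would use pandas and the object storage integration here.
    return source_uri.replace("raw", "clean")

clean_op = create_component_from_func(clean_dataset, base_image="python:3.9")

@dsl.pipeline(name="customer-data-processing",
              description="Data processing pipeline built from the project template")
def data_pipeline(source_uri: str = "s3://datalake/raw/customers.csv"):
    clean_task = clean_op(source_uri=source_uri)  # further steps plug in after this one

if __name__ == "__main__":
    # Compile the pipeline so it can be uploaded and scheduled as a recurring run
    kfp.compiler.Compiler().compile(data_pipeline, "data_pipeline.yaml")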

The Kubeflow Dashboard with a single recurring run which will be executed every minute.
The Recurring Run in the Kubeflow Dashboard

Once she has the first pipeline, she integrates and correlates additional data sources. Pipelines can run code directly or use distributed computing engines like Spark. This time there is a huge dataset waiting for her. She implements a data processing job using the Spark DSL. This way, she can use the full computing power of the MLOps platform to process the dataset quickly.
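
A minimal PySpark sketch of such a job could look like this; the storage paths and column names are hypothetical:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Sketch of a distributed cleaning job for a large dataset.
spark = SparkSession.builder.appName("clean-events-dataset").getOrCreate()

raw = spark.read.parquet("s3a://datalake/raw/events/")
cleaned = (
    raw.dropDuplicates(["event_id"])                # remove duplicate rows
       .withColumnRenamed("ts", "event_time")       # align with the common taxonomy
       .filter(F.col("event_time").isNotNull())     # drop incomplete records
)
cleaned.write.mode("overwrite").parquet("s3a://datalake/clean/events/")
spark.stop()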

Open source tools used:

  • Kubeflow Pipelines — MLOps pipelines orchestration
  • Spark — data processing engine
  • Gitlab — code repository and CICD pipelines

Anna, a Data Scientist

A woman writing on a laptop in the office space, showing what Anna, a Data Scientist, might look like
Photo by Thought Catalog on Unsplash

Anna is a Data Scientist. She met Jill yesterday during the implementation phase. Today she has the first dataset from her. Anna is amazed by Jill’s responsiveness.

She takes a dataset snapshot and uses it to start her experiments. She wants to use the same dataset version to do multiple experiments. This way, she can compare the results objectively.

A diagram showing how files are versioned using DVC, with Local Workspace, Local Cache, Remote Code Storage and Remote Data Storage.
Data and model versioning using DVC
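
As an illustration, pinning an experiment to a dataset snapshot with the DVC Python API could look roughly like this; the repository URL, file path and tag are hypothetical:

import dvc.api
import pandas as pd

# Open a specific, versioned snapshot of the dataset tracked by DVC.
with dvc.api.open(
    "data/customers.csv",
    repo="https://gitlab.example.com/ai-team/datasets.git",
    rev="v1.0.0",  # the Git tag marking the snapshot used for all experiments
) as f:
    df = pd.read_csv(f)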

First, she needs a computing environment. Yesterday she used an environment with two top-tier GPUs, but today she will need a lot more memory and CPU. On top of it, she loves using VSCode to develop experiments instead of Jupyter Notebook. She selects the VSCode environment with 4 CPUs and 64GB of memory. The MLOps platform creates one for her.

In just a few minutes, she has a clean and secure environment. The environment is integrated with a code repository, object storage and experiment-tracking software. Each experiment is visible in the UI with its parameters, metrics and model artefacts.

The screen from the MLflow UI with a table of experiments, parameters and metrics.
MLflow UI with a list of the experiments, parameters and metrics
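
For illustration, logging a run to MLflow could look like the sketch below; the tracking URI, experiment name, parameters and metric values are hypothetical:

import mlflow

mlflow.set_tracking_uri("http://mlflow.mlops.local")  # assumed platform endpoint
mlflow.set_experiment("churn-prediction")

with mlflow.start_run(run_name="baseline-gbm"):
    mlflow.log_param("learning_rate", 0.05)
    mlflow.log_param("dataset_version", "v1.0.0")  # ties the run to the dataset snapshot
    mlflow.log_metric("val_auc", 0.87)
    # Model artefacts can be attached too, e.g. with mlflow.sklearn.log_model(...)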

The VSCode environment is great for short experiments, but Anna needs to train a model for multiple hours. She uses distributed training and hyperparameter tuning tools, so she can run numerous experiments simultaneously and tune model parameters.

The VSCode integration with Kubeflow Notebooks. There is a Python file and a terminal.
VSCode integration with Kubeflow Notebooks

Anna creates multiple experiments and finds the perfect model. She knows which code, data, and configuration produce the best outcome. She shares all three with the MLOps Engineer to create a product.

Open source tools used:

  • DVC — data versioning
  • Kubeflow Notebooks — flexible computing environment
  • MLflow — model registry and experiment tracking
  • Katib — distributed training and hyperparameter tuning

Luke, an MLOps Engineer

A man in front of a laptop and monitor in the office, showing what Luke, an MLOps Engineer, might look like
Photo by Studio Republic on Unsplash

Luke is an MLOps Engineer. He puts down his morning coffee and opens the email from Anna (DS). She has finished her experiments and has some amazing models. Luke is in charge of transforming the models into products and operating them in production.

He uses the project templates similarly to Jill (DE). The template has multiple common steps prepared and a place to plug in the custom code. He uses a components library supported by internal teams or an open-source community.

Luke’s pipeline connects to multiple systems, and he wants to keep the credentials secure. He uses the Vault integration to provide secrets to the pipeline steps in a secure way. Additionally, RBAC separates all pipeline runs and their results from other users of the MLOps platform.

Anna used pre-trained models in her experiment. Luke’s pipeline downloads the pre-trained model and uses transfer learning to train on the company’s data. Then it packages the result as a Docker Image. CICD pipelines deploy the packaged model to the pre-prod environment for E2E testing.
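
The article doesn't show Luke's training code; as a rough sketch only, transfer learning on a pre-trained vision model could look like this (PyTorch, assuming a recent torchvision; the number of classes is hypothetical):

import torch
import torch.nn as nn
from torchvision import models

# Reuse a pre-trained backbone, freeze its weights and train only a new head.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                 # keep the pre-trained layers fixed
model.fc = nn.Linear(model.fc.in_features, 3)   # new head for 3 company-specific classes

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
# ...the standard training loop over the company's dataset goes here...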

The endpoint deployed by Luke is production-ready. The inference engine wraps the model and provides additional capabilities: monitoring metrics, built-in documentation and support for extensions built with Function-as-a-Service. Luke cooperates with Anna (DS) to develop drift detection for the deployed model.
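
The article doesn't name a drift-detection library; one possible sketch, using the open-source Alibi Detect library on synthetic data, could look like this:

import numpy as np
from alibi_detect.cd import KSDrift

# Fit a Kolmogorov-Smirnov drift detector on reference (training) features,
# then check a batch of production inputs against it. The data here is synthetic.
x_ref = np.random.normal(size=(1000, 11))            # reference feature matrix
detector = KSDrift(x_ref, p_val=0.05)

x_prod = np.random.normal(loc=0.3, size=(200, 11))   # incoming production batch
result = detector.predict(x_prod)
print("Drift detected:", bool(result["data"]["is_drift"]))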

Luke can easily deploy the model on any Kubernetes cluster.

kubectl apply -f - << END
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: object-detection
spec:
  name: object-detection
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - name: classifier
          image: bponieckiklotz/jellyfish.object-detection:dev@sha256:b19fb9d48e43e1c0a9afe6480553ae506b198d48e347be3f13f9d794f0b5e270
          resources:
            limits:
              nvidia.com/gpu: 1
    graph:
      name: classifier
    name: default
    replicas: 1
END

When the pipeline is ready, he automates the build steps, adds tests and runs the MLOps pipeline in the pre-prod environment. Luke has prepared the Machine Learning model for Application Developers.

Open source tools used:

  • Kubeflow Pipelines — MLOps pipelines orchestration
  • MLflow — model registry and experiment tracking
  • Seldon Core, KServe — model serving
  • Prometheus, Grafana — observability stack
  • KNative — function as a service

Kevin, an Application Developer

A man in front of a laptop in the conference room, showing what Kevin, an Application Developer, might look like
Photo by Desola Lanre-Ologun on Unsplash

Kevin is an Application Developer in a different department than Anna, Jill, John and Luke. He heard rumours about the AI project team working swiftly together.

Kevin asks for access to the Machine Learning model. The response comes quickly. The email contains details of access to the API, the model and the API documentation with the contract. He deploys the model to the pre-prod environment. The deployed model endpoint supports both HTTP and gRPC requests.

curl  -s http://models.kubeflow.local/api/v0.1/predictions \
-H "Content-Type: application/json" \
-d '{"data":{"ndarray":[[5.6, 0.31, 0.37, 1.4, 0.074, 12.0, 96.0, 0.9954, 3.32, 0.58, 9.2]]}}'

{"data":{"names":[],"ndarray":[5.247960704489777]},"meta":{"requestPath":{"classifier":"seldonio/mlflowserver:1.16"}}}

Grafana Dashboard monitoring number of requests, success rate and latency for ML model deployed with Seldon Core.
Grafana Dashboard with Seldon Core monitoring.

In the response, he finds an array of numbers. Thanks to the model documentation, he knows what they mean. He also gets the model metadata to trace each request, response and model version. Every call to the model endpoint is visible in the observability stack and the logs.

Open source tools used:

  • MLflow — model registry
  • Seldon Core, KServe — model serving, API contract
  • Prometheus, Grafana — observability stack

Summary

This short story shows how modern MLOps platforms help you in your work. Everyone uses the same platform but sees it from a different perspective. Everyone's benefits also vary.

What if you wear multiple hats?

If you perform tasks from multiple described roles, that’s even better. You benefit from all of the things described above.

Let’s recap what the MLOps Platform can do for you:

  • helps with data access and exploration
  • provides a scalable environment for experimentation with cutting-edge tools
  • standardises the pipeline approach for faster pipeline creation
  • enables the use of project templates and shareable components
  • standardises API endpoint contracts and documentation
  • provides out-of-the-box data, model and platform monitoring
  • ensures security by design

What is your role in the project, and how can MLOps help you? Write in the comments below.

For more MLOps hands-on guides, tutorials and code examples, follow me on Medium and contact me via social media.
