MLOps: Machine Learning Pipelines Using Kubeflow
Effective MLOps on Kubernetes using Kubeflow
Machine Learning !! Machine Learning !! Machine Learning !!
Yes, this is the current scenario: everywhere I go, I see machine learning. AI and ML have become core components of many applications. An e-commerce website needs an ML model; platforms like Netflix use ML models to recommend content to their customers. There are many such examples where ML plays a very important role in accelerating the business. But how easy is the lifecycle of building an ML model? That too in a containerized ecosystem? Phew, it's a nightmare. For a typical ML model, the basic steps would be data preprocessing, data transformation, model training, model evaluation, and model serving.
Assume that all of the aforementioned steps are isolated workflows. Gathering them into a single pipeline is extremely hard. Don't you worry, Kubeflow comes to our rescue!! The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable. Kubeflow is the ML toolkit for Kubernetes.
One such component that we’d be highlighting in this article is Kubeflow Pipelines.
What is the entire story all about? (TLDR)
- Understanding Kubeflow Pipelines.
- Developing a Sample Model.
Prerequisites:
- A Kubernetes cluster (EKS, AKS, Kind, etc.).
- Kubeflow Pipelines installed.
- GitHub Link: https://github.com/pavan-kumar-99/medium-manifests
- GitHub Branch: kubeflow
You can select the most relevant way of installing Kubeflow based on your Kubernetes distribution here. I have a local on-prem cluster, and I have installed only Kubeflow Pipelines using the documentation here.
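As a sketch, a standalone Kubeflow Pipelines installation from the official kustomize manifests looks roughly like this. The version number and the overlay name are only examples; check the Kubeflow Pipelines releases and deployment docs for the values matching your cluster:

```shell
# Pick a Kubeflow Pipelines release (example version; check the releases page).
export PIPELINE_VERSION=1.8.5

# Install the cluster-scoped resources (CRDs) first and wait for them.
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io

# Install the Kubeflow Pipelines deployment itself (overlay name may differ
# between versions, e.g. env/platform-agnostic-pns or env/dev).
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=$PIPELINE_VERSION"

# Port-forward the UI so it is reachable at http://localhost:8080
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
```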
Once Kubeflow pipelines are installed, the UI would look something like this
Before you run your first Kubeflow pipeline, there are some important terms you should be familiar with. Let us now understand them.
A Kubeflow pipeline is a description of an ML workflow, including all of the components in the workflow and how they combine in the form of a graph. The pipeline includes the definition of the inputs (parameters) required to run the pipeline and the inputs and outputs of each component.
A pipeline component is a self-contained set of code that performs one step in the ML workflow (pipeline), such as data preprocessing, data transformation, model training, and so on. A component is analogous to a function, in that it has a name, parameters, return values, and a body.
A graph is a pictorial representation in the Kubeflow Pipelines UI of the runtime execution of a pipeline. The graph shows the steps that a pipeline run has executed or is executing, with arrows indicating the parent/child relationships between the pipeline components represented by each step.
A run is a single execution of a pipeline. Runs comprise an immutable log of all experiments that you attempt and are designed to be self-contained to allow for reproducibility. You can track the progress of a run by looking at its details page on the Kubeflow Pipelines UI, where you can see the runtime graph, output artifacts, and logs for each step in the run.
Alright, we now have a basic understanding of the terms in Kubeflow. Let us now run our first Kubeflow pipeline.
Running the Kubeflow Pipeline
All the ML Kubeflow Pipelines are built using Kubeflow Pipelines SDK. The Kubeflow Pipelines SDK provides a set of Python packages that you can use to specify and run your machine learning (ML) workflows. A pipeline is a description of an ML workflow, including all of the components that make up the steps in the workflow and how the components interact with each other.
Alright, let us now understand the code. We are creating two components as a part of this code, namely process_data_op and train_op (which only print some statements for now). As mentioned above, a pipeline component is a self-contained set of code that performs one step in the ML workflow (pipeline), such as data preprocessing, data transformation, model training, and so on.
After that, we are creating a Kubeflow pipeline using the sequential_pipeline function. In that function, we define the steps needed for our pipeline by calling the components created earlier, and we define the dependency between them using the after() method.
$ git clone https://github.com/pavan-kumar-99/medium-manifests.git -b kubeflow
$ cd medium-manifests
$ python3 medium-pipeline.py
That’s it. Running the script compiles the pipeline and generates the pipeline YAML.
Let us now upload this Pipeline YAML.
Once the pipeline is uploaded, you can run it from the Kubeflow Pipelines UI.
This is how our pipeline looks in its graphical representation. All the events, logs, and visualizations (if created) can now be seen in the console as well.
Well, this was a very simple Kubeflow pipeline, but it should give you sufficient background to get started with Kubeflow Pipelines. In my next article in the Kubeflow series, I will take a real-world use case and explain how we can use Kubeflow to design an end-to-end (E2E) ML solution, including model serving with TFServing.
Until next time…..