Building End-to-End MLOps Pipelines for Sentiment Analysis on Azure with Terraform, Kubeflow v2, MLflow, and Seldon: Part 4

Rachit Ahuja
Jun 28, 2023

Part 1: Introduction and Architecture planning

Part 2: Developing workflows for infrastructure deployment via CI-CD

Part 3: Setting up MLflow on AKS

Part 4: Setting up Kubeflow and Seldon on AKS

Part 5: End-to-End Training Pipeline and Inference Deployment using Seldon.

In Part 4 of our series, our objective is to install Kubeflow and Seldon resources within the AKS cluster that was provisioned in Part 2. To achieve this, we will use kustomize, a Kubernetes-native configuration management tool, to install Kubeflow onto our AKS cluster.

Kubeflow Installation

Kubeflow is an open-source platform for machine learning (ML) workflow orchestration on Kubernetes. It provides a range of tools for data scientists and ML engineers to manage, deploy, and monitor ML workflows in a scalable and efficient manner. In this blog post, we will walk through the steps to install Kubeflow on Azure Kubernetes Service (AKS).

Prerequisites:

  • Kubernetes Cluster (Created in Part 2)
  • Kustomize (5.0.0)
  • kubectl

To initiate the installation of Kubeflow, the first step is to clone the repository at https://github.com/kubeflow/manifests. This repository contains Kubeflow manifest files that adhere to the best practices for installing Kubeflow on various Kubernetes engines, not limited to AKS. Once the repository is cloned, we can proceed with the installation of Multi-User Kubeflow Pipelines.

Kubeflow pipelines offer multi-user isolation as part of their multi-tenancy capabilities. Resources within Kubeflow Pipelines are isolated using Kubernetes namespaces, which are managed by Kubeflow’s profile resource. This ensures that each user operates within their designated namespace, preventing unauthorised access to resources. The Kubeflow Pipelines API server enforces permissions, rejecting requests to namespaces for which the current user lacks appropriate authorization.

Experiments are associated directly with namespaces, while runs and recurring runs belong to the parent experiment’s namespace. Additionally, pipelines execute within a user namespace, which gives the user the advantages of Kubernetes namespace isolation. This allows for customised configurations, such as assigning different secrets to various services within different namespaces.

Once the repository is cloned, we can follow the subsequent steps outlined below to install Kubeflow within our AKS cluster:

Step 1: Get the kubeconfig of the AKS cluster. To fetch the kubeconfig of the cluster, run the following command:

az aks get-credentials --resource-group blogpost-terraform --name terraform-aks

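Step 2: Clone the git repository where the kustomization manifests are kept and change into it, since the kustomize paths in the following steps are relative to the repository root:

git clone https://github.com/kubeflow/manifests
cd manifests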

Step 3: Install cert-manager. Cert-manager is used by numerous Kubeflow components to provide certificates for admission webhooks.

Kustomize command to install Certificate Manager

Step 4: Install Istio Services. Several Kubeflow components leverage Istio to ensure the security of their traffic, enforce network authorization, and implement routing policies.

Installation of Istio Services

Step 5: Install an OpenID Connect (OIDC) identity provider. For this implementation we will install Dex, an OIDC identity provider that supports multiple authentication backends. In its default setup, it comes with a static user account with the email “user@example.com” and the default password “12341234”. For any Kubeflow deployment meant for production, it is crucial to change the default password. So for this implementation we will first alter the default account and then apply the manifest. To change the default email and password, follow these steps:

  • Step 5a: Run: vi common/dex/base/config-map.yaml. This config-map contains the default email id and password.
  • Step 5b: To change the password, we need to hash our password using bcrypt. The following command prompts for the password and prints its hash:
python3 -c 'from passlib.hash import bcrypt; import getpass; print(bcrypt.using(rounds=12, ident="2y").hash(getpass.getpass()))'

The above command generates the hashed password, which we will supply to Dex so that it can authenticate the user login.
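Before pasting the hash into the config-map, it can help to confirm that it was copied in full. The following is a minimal sketch, not part of the original setup: it only checks the textual layout of a bcrypt hash (identifier, two-digit cost, 53 characters of salt and digest) with the standard library and does not verify any password. The function name `bcrypt_cost` is made up for this illustration.

```python
import re

# Layout of a bcrypt hash: $<ident>$<cost>$<22-char salt><31-char digest>
BCRYPT_PATTERN = re.compile(r"\$2[abxy]\$(\d{2})\$[./A-Za-z0-9]{53}")


def bcrypt_cost(candidate: str) -> int:
    """Return the cost factor of a well-formed bcrypt hash, or raise ValueError."""
    match = BCRYPT_PATTERN.fullmatch(candidate.strip())
    if match is None:
        raise ValueError("not a well-formed bcrypt hash")
    cost = int(match.group(1))
    if not 4 <= cost <= 31:
        raise ValueError(f"bcrypt cost {cost} is outside the valid range 4-31")
    return cost
```

A hash produced with rounds=12 should report a cost of 12, while a plain-text password or a truncated hash raises ValueError.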

  • Step 5c: Now we will make changes in the config-map.yaml.
apiVersion: v1
kind: ConfigMap
metadata:
  name: dex
data:
  config.yaml: |
    issuer: http://dex.auth.svc.cluster.local:5556/dex
    storage:
      type: kubernetes
      config:
        inCluster: true
    web:
      http: 0.0.0.0:5556
    logger:
      level: "debug"
      format: text
    oauth2:
      skipApprovalScreen: true
    enablePasswordDB: true
    staticPasswords:
    - email: ahuja.rachit@hotmail.com
      hash: $2y$12$WT7HTjhudIwMRI0MN.Ejuuuq.moyD4.05/a1.pFalsj2yET/C5n0K
      # https://github.com/dexidp/dex/pull/1601/commits
      # FIXME: Use hashFromEnv instead
      username: user
      userID: "15841185641784"
    staticClients:
    # https://github.com/dexidp/dex/pull/1664
    - idEnv: OIDC_CLIENT_ID
      redirectURIs: ["/authservice/oidc/callback"]
      name: 'Dex Login Application'
      secretEnv: OIDC_CLIENT_SECRET

Once we apply the manifest, we will log in with our custom credentials instead of the default email and password, i.e. “user@example.com” and “12341234”.
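If you would rather script the edit from Step 5c than make it by hand in vi, the substitution can be sketched as below. This is an illustration under assumptions: the helper name `set_static_user` is hypothetical, and it assumes the config-map contains exactly one static user whose entry starts with `- email:` and has a `hash:` line, as in the listing above.

```python
import re


def set_static_user(config_text: str, email: str, password_hash: str) -> str:
    """Replace the first staticPasswords email and hash in a Dex config-map."""
    # Lambdas keep "$" in the bcrypt hash literal (no group-reference expansion).
    text, n_email = re.subn(
        r"^([ \t]*-[ \t]*email:[ \t]*).*$",
        lambda m: m.group(1) + email,
        config_text,
        count=1,
        flags=re.MULTILINE,
    )
    text, n_hash = re.subn(
        r"^([ \t]*hash:[ \t]*).*$",
        lambda m: m.group(1) + password_hash,
        text,
        count=1,
        flags=re.MULTILINE,
    )
    if n_email != 1 or n_hash != 1:
        raise ValueError("expected one 'email:' and one 'hash:' entry")
    return text
```

To use it, read common/dex/base/config-map.yaml into a string, pass it through `set_static_user` together with your email and the hash from Step 5b, and write the result back before applying the manifest.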

  • Step 5d: Install Dex inside our AKS cluster.
Applying Manifest for installing Dex
  • Step 5e: Install OIDC AuthService. The AuthService for OIDC expands the capabilities of your Istio Ingress-Gateway, allowing it to operate as an OIDC client.

This applies the OIDC configuration with the email id as “ahuja.rachit@hotmail.com” and our custom password.

Step 6: Install Knative, which is used by the Kubeflow components. Knative is an extension to Kubernetes that makes it easier to run serverless workloads on Kubernetes clusters. It offers tools that simplify building, deploying, and managing containerised applications within Kubernetes. To install Knative, run the following commands:

kustomize build common/knative/knative-serving/overlays/gateways | kubectl apply -f -
kustomize build common/istio-1-16/cluster-local-gateway/base | kubectl apply -f -

Step 7: Create the Kubeflow namespace. This is where all the Kubeflow components will live. To create the namespace, run the following command:

kustomize build common/kubeflow-namespace/base | kubectl apply -f -

Step 8: Create kubeflow cluster roles.

Creation of Cluster roles for Kubeflow

Step 9: Generate the necessary Istio resources for Kubeflow. The current kustomization produces an Istio Gateway, referred to as kubeflow-gateway, within the kubeflow namespace. If you intend to use your own Istio, you must also include this kustomization. In order to install the Istio resources run the following command:

kustomize build common/istio-1-16/kubeflow-istio-resources/base | kubectl apply -f -
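Applying a kustomization can fail on the first attempt, for example when a custom resource definition it depends on has not been registered yet, and simply re-running the command usually resolves it. The following is a minimal sketch of a retry wrapper around the same pipeline, not something the original setup uses. The directories listed are the ones from Steps 6, 7 and 9; the function names, attempt count and delay are assumptions chosen for illustration.

```python
import shlex
import subprocess
import time

# Kustomization directories from Steps 6, 7 and 9, relative to the manifests checkout.
KUSTOMIZE_PATHS = [
    "common/knative/knative-serving/overlays/gateways",
    "common/istio-1-16/cluster-local-gateway/base",
    "common/kubeflow-namespace/base",
    "common/istio-1-16/kubeflow-istio-resources/base",
]


def apply_command(path: str) -> str:
    """Build the shell pipeline used throughout this post for one directory."""
    return f"kustomize build {shlex.quote(path)} | kubectl apply -f -"


def apply_with_retry(path, attempts=5, delay_seconds=10.0, runner=subprocess.run):
    """Run the pipeline until it exits with 0 or the attempts are used up."""
    for attempt in range(1, attempts + 1):
        # With shell=True the exit status is that of the last command (kubectl).
        result = runner(apply_command(path), shell=True)
        if result.returncode == 0:
            return True
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False
```

Run from the root of the cloned repository, a loop such as `for p in KUSTOMIZE_PATHS: apply_with_retry(p)` applies the directories in order. The `runner` parameter exists only so the function can be exercised without a cluster.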

Step 10: Install Kubeflow Pipelines. For this demo we will install multi-user Kubeflow Pipelines. The manifest installs Argo with the emissary executor, which runs as non-root. However, the installer is still responsible for evaluating the security implications of running containers with root access. It is therefore highly recommended that the main containers of Kubeflow Pipelines run with runAsNonRoot and without any special capabilities, to reduce security risks. We can apply the multi-user manifest with kustomize, in the same way as in the previous steps.

Multi-user Installation for Kubeflow pipeline

Step 11: Finally, we will create the user namespace. This is the most important part of the implementation: separating resources at the user level lets each user create personalised experiments on the shared infrastructure while keeping their environment isolated. Users who lack authorization cannot access resources in your Profile/Namespace, because the Kubeflow Pipelines API server declines requests for namespaces that the current user is not authorized to access.

By default the manifest creates a user namespace named “kubeflow-user-example-com”, but we will change the kustomize parameters to create a namespace for an actual user. To change the default user, follow these steps:

  • Step 11a: Edit the manifests/common/user-namespace/base/params.env file and update the username.
  • Step 11b: Apply the manifest using the following command:
kustomize build common/user-namespace/base | kubectl apply -f -

This will create the user-specific namespace.
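The edit in Step 11a can also be scripted. The sketch below rewrites KEY=value lines in the text of a params.env file. It assumes the file holds entries named `user` and `profile-name`, which is how the default file is usually laid out but should be checked against your checkout; the helper name `update_params` and the example values are hypothetical.

```python
def update_params(env_text: str, updates: dict) -> str:
    """Return env_text with the given KEY=value pairs replaced; other lines are kept."""
    remaining = dict(updates)
    lines = []
    for line in env_text.splitlines():
        key, sep, _ = line.partition("=")
        if sep and key.strip() in remaining:
            # Rewrite the line with the new value and mark the key as handled.
            line = f"{key.strip()}={remaining.pop(key.strip())}"
        lines.append(line)
    if remaining:
        raise KeyError(f"keys not found: {sorted(remaining)}")
    return "\n".join(lines) + "\n"
```

For example, calling `update_params(text, {"user": "jane@example.org", "profile-name": "jane"})` on the file contents returns the updated text, which can be written back before running Step 11b. The function raises KeyError if a requested key is missing, so a mismatch with the assumed layout is noticed rather than silently ignored.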

With this, we have successfully installed Kubeflow in our AKS cluster. To access Kubeflow, the recommended method is port-forwarding, which allows for quick setup without any specific environment prerequisites. You can execute the following command to port-forward Istio’s Ingress-Gateway to local port 8080:

kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
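The port-forward can take a moment to become ready. If you script this step, a small check like the one below can wait until the local port accepts connections before anything tries to reach it. This is a generic sketch with a hypothetical helper name, `wait_for_port`; it only tests that a TCP connection succeeds and says nothing about whether Kubeflow itself is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_seconds: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)
```

With the port-forward running in another terminal, `wait_for_port("localhost", 8080)` should return True within a few seconds.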

Upon executing the command, the Kubeflow Central Dashboard can be accessed by following these steps:

  1. Open your web browser and go to http://localhost:8080. This should direct you to the Dex login page.
  2. Log in with the email address and password configured for Dex in Step 5.
Dex Log In page for Kubeflow Dashboard

Once you enter the login credentials, the user-specific dashboard should be visible.

User-Specific Kubeflow Dashboard

Seldon Installation

Seldon Core is installed by default with the Kubeflow installation we just did. For a detailed overview of the Seldon installation, you can refer to the Seldon Core documentation. To test a Seldon deployment, we need to perform the following steps:

  • Step 1: Create the namespace seldon and label it for inference. Kubernetes namespace names must be lowercase.
kubectl create ns seldon
kubectl label namespace seldon serving.kubeflow.org/inferenceservice=enabled
  • Step 2: Create a Seldon Deployment with a dummy model using the deployment manifest shown below:
cat <<EOF | kubectl create -n seldon -f -
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  # Named seldon-model so that it matches the service name and URL used below
  name: seldon-model
  namespace: seldon
spec:
  name: seldon-model-example-0-classifier
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier_rest:1.3
          name: classifier
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    name: example
    replicas: 1
EOF

This should create a deployment named seldon-model in the seldon namespace.

Seldon Resources for Dummy model
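Indentation mistakes in a hand-written manifest are easy to make. One alternative, sketched here as an illustration rather than as the method used in this series, is to build the same SeldonDeployment as a Python dictionary and print it as JSON, which kubectl also accepts. The function name `seldon_deployment` and its parameters are hypothetical; the field values mirror the manifest from Step 2.

```python
import json


def seldon_deployment(name, namespace, image, predictor="example", replicas=1):
    """Build a single-model SeldonDeployment manifest as a plain dict."""
    return {
        "apiVersion": "machinelearning.seldon.io/v1",
        "kind": "SeldonDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "name": name,
            "predictors": [
                {
                    "name": predictor,
                    "replicas": replicas,
                    "componentSpecs": [
                        {"spec": {"containers": [{"name": "classifier", "image": image}]}}
                    ],
                    # The graph node name must match the container name above.
                    "graph": {
                        "name": "classifier",
                        "type": "MODEL",
                        "endpoint": {"type": "REST"},
                        "children": [],
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = seldon_deployment("seldon-model", "seldon", "seldonio/mock_classifier_rest:1.3")
    print(json.dumps(manifest, indent=2))
```

Saved under a file name of your choice, the script's output can be piped into `kubectl create -f -` in place of the heredoc.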

To test our dummy deployment, we will port-forward the service to a local port and curl the results. To port-forward the model service, run the following:

kubectl port-forward svc/seldon-model-example -n seldon 8000:8000 

And, to test the model run the following command:

curl -s -d '{"data": {"ndarray":[[1.0, 2.0, 5.0]]}}' -X POST http://localhost:8000/seldon/seldon/seldon-model/api/v1.0/predictions    -H "Content-Type: application/json"

You should expect the following result: {"data":{"names":["proba"],"ndarray":[[0.43782349911420193]]},"meta":{}}.
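The same request can be sent from Python, which is convenient once predictions need to be checked from a script. The sketch below uses only the standard library; the URL is copied from the curl command above, and the function names are made up for this example. It assumes the response has the `data.ndarray` shape shown in the expected result.

```python
import json
import urllib.request

# URL copied from the curl command above; adjust it if your service path differs.
PREDICT_URL = "http://localhost:8000/seldon/seldon/seldon-model/api/v1.0/predictions"


def build_request(rows, url=PREDICT_URL):
    """Wrap feature rows in the {"data": {"ndarray": ...}} payload as a POST request."""
    body = json.dumps({"data": {"ndarray": rows}}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def parse_probabilities(response_body):
    """Return the first output column of a prediction response as floats."""
    payload = json.loads(response_body)
    return [float(row[0]) for row in payload["data"]["ndarray"]]


def predict(rows, url=PREDICT_URL, timeout_seconds=10.0):
    """Send the request; needs the port-forward from the previous step to be running."""
    with urllib.request.urlopen(build_request(rows, url), timeout=timeout_seconds) as resp:
        return parse_probabilities(resp.read())
```

With the port-forward active, `predict([[1.0, 2.0, 5.0]])` sends the same payload as the curl command and returns the probabilities as a list.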

We have now configured Kubeflow and Seldon inside our AKS cluster. The only thing left is to actually start training the ML model.


Rachit Ahuja

Machine learning and Data Engineer at Data Reply GmbH