Use Kubeflow, Rook and Istio to build an AI integrated development and delivery platform

Hu Song
6 min read · Feb 9, 2020


This article may be helpful for AI engineers, AI scientists, cloud-native developers and operations engineers.

Photo by Franck V. on Unsplash

Artificial Intelligence (AI) is a new technological science that combines the theories, methods, technologies and application systems used to simulate, extend and expand human intelligence.

The concept of AI was proposed as early as the 1950s. Over decades of development, theoretical research made steady progress, but it was limited by the computer hardware of the time, so the practical results were modest. In recent years, breakthroughs in GPU chips and the widespread adoption of cloud computing, the Internet of Things and big data have given AI an unprecedented opportunity.

This trend accelerated after the concept of Cloud Native was proposed in 2013. Cloud Native refers to a set of cloud computing technologies and enterprise management practices, with three main characteristics: containerization, orchestration, and microservices. AI has also started moving toward containers: the components that make up the platform run in containers, and the platform delivers AI computing power in containerized form.

This article describes how to build an end-to-end AI integrated development and delivery platform in a private environment. The overall approach is to deploy it on Kubernetes using Kubeflow, Rook, Istio and other open-source projects.

I. Prerequisites

1. What is Kubeflow

The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Its goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.

In this AI integrated development and delivery platform, Kubeflow provides users with containerized AI computing power, a Jupyter IDE development environment, multi-tenant resource isolation, and automated machine learning workflows.

2. What is Rook


Rook is an open source cloud-native storage orchestrator, providing the platform, framework, and support for a diverse set of storage solutions to natively integrate with cloud-native environments.

Rook turns storage software into self-managing, self-scaling, and self-healing storage services. It does this by automating deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management. Rook uses the facilities provided by the underlying cloud-native container management, scheduling and orchestration platform to perform its duties.

We use Rook as the storage orchestrator. It mainly provides the following three types of storage service:

(1) Storage for Kubeflow's own components;

(2) Object storage for the data used in AI training and serving;

(3) Storage for trained AI models.

In short, Rook provides the basic and critical storage infrastructure for the AI integrated platform.

3. What is Istio


Developers must use microservices to architect for portability, while operators manage extremely large hybrid and multi-cloud deployments. Istio lets you connect, secure, control, and observe services. It makes it easy to create a network of deployed services with load balancing, service-to-service authentication, monitoring, and more, with few or no changes to the service code.

The AI integrated platform uses Istio to provide multi-tenant authentication and resource isolation.

II. Deployment

1. Install the CUDA driver and the Docker runtime environment on every server node that has GPUs (you can refer here):

Docker on GPU nodes

(1) Install the Docker environment;

(2) Install the nvidia-docker2 package;

(3) Modify the Docker daemon configuration file to make the nvidia runtime the default runtime.
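For step (3), the nvidia-docker2 setup described in the NVIDIA k8s-device-plugin README sets `default-runtime` to `nvidia` in `/etc/docker/daemon.json`. The sketch below merges that setting into an existing daemon config while keeping any other keys. It assumes the file is plain JSON and that `nvidia-container-runtime` is on the `PATH`; the function name `with_nvidia_default_runtime` is hypothetical.

```python
import json


def with_nvidia_default_runtime(existing: dict) -> dict:
    """Return a copy of a Docker daemon config with nvidia as the default runtime.

    Keys already present in `existing` (for example log settings) are kept,
    and `existing` itself is not modified.
    """
    config = json.loads(json.dumps(existing))  # deep copy via a JSON round-trip
    runtimes = config.setdefault("runtimes", {})
    runtimes["nvidia"] = {
        "path": "nvidia-container-runtime",  # binary installed with nvidia-docker2
        "runtimeArgs": [],
    }
    config["default-runtime"] = "nvidia"
    return config


if __name__ == "__main__":
    current = {"log-driver": "json-file"}  # example of a pre-existing setting
    print(json.dumps(with_nvidia_default_runtime(current), indent=4))
```

Write the printed result to `/etc/docker/daemon.json` and restart Docker (for example with `systemctl restart docker`) so the new default runtime takes effect.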

2. Deploy the NVIDIA device plugin in the Kubernetes cluster:

#kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml
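Once the device plugin is running, each GPU node advertises an extended resource named `nvidia.com/gpu`, and a pod requests GPUs through its resource limits. The sketch below builds such a pod manifest as a Python dict; the pod name and the `nvidia/cuda:10.0-base` image tag are placeholder assumptions, and `gpu_pod` is a hypothetical helper. Because JSON is valid YAML, the printed output can be piped to `kubectl create -f -`.

```python
import json


def gpu_pod(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest that asks the NVIDIA device plugin for `gpus` GPUs."""
    if gpus < 1:
        raise ValueError("request at least one GPU")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": "cuda",
                    "image": image,
                    "command": ["nvidia-smi"],  # lists the GPUs visible in the container
                    # GPUs are specified under limits; Kubernetes does not overcommit them.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image tag; pick a CUDA image matching your driver.
    print(json.dumps(gpu_pod("gpu-smoke-test", "nvidia/cuda:10.0-base"), indent=2))
```

If the pod stays Pending with an "Insufficient nvidia.com/gpu" event, the device plugin has not registered the node's GPUs yet.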

3. Deploy Rook

Among distributed storage systems, Ceph is one of the most widely used and mature open-source options, so we choose Ceph as the storage system that Rook orchestrates. Rook runs Ceph on Kubernetes using Kubernetes primitives. The following image illustrates how Rook and Ceph integrate with Kubernetes:

Ceph operator in Rook k8s-cluster

(1) Download the Rook installation files to /opt, then change to the Ceph example directory:

#cd /opt/rook/cluster/examples/kubernetes/ceph

(2) Create the resources in the following three YAML files, in this order:

#kubectl create -f common.yaml

#kubectl create -f operator.yaml

#kubectl create -f cluster.yaml

(3) Create the StorageClass that provides RBD (RADOS Block Device) volumes to pods:

#kubectl create -f storageclass.yaml

(4) Make rook-ceph-block the default storage class:

#kubectl patch storageclass rook-ceph-block -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
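The `-p` argument of `kubectl patch` must be plain JSON with straight ASCII quotes; typographic quotes copied from a web page make the command fail. One way to avoid quoting mistakes is to generate the patch, along with a PersistentVolumeClaim that uses the `rook-ceph-block` class, programmatically. This is a sketch only; the helper names, the claim name and the 10Gi size are hypothetical.

```python
import json

DEFAULT_CLASS_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def default_class_patch(is_default: bool = True) -> str:
    """Patch body that marks (or unmarks) a StorageClass as the cluster default."""
    value = "true" if is_default else "false"  # annotation values must be strings
    return json.dumps({"metadata": {"annotations": {DEFAULT_CLASS_ANNOTATION: value}}})


def rbd_claim(name: str, size: str, storage_class: str = "rook-ceph-block") -> dict:
    """PersistentVolumeClaim for a block volume provisioned by Rook/Ceph."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            # RBD-backed filesystem volumes are typically mounted by one node at a time.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    patch = default_class_patch()
    print(f"kubectl patch storageclass rook-ceph-block -p '{patch}'")
    print(json.dumps(rbd_claim("training-data", "10Gi"), indent=2))
```

Once rook-ceph-block is the default class, claims that omit `storageClassName` (such as those created by Kubeflow components) are also provisioned from Ceph.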

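4. Install MetalLB, which provides LoadBalancer-type services on bare-metal clusters so that the Istio ingress gateway can get an external IP address. The package is here; all required YAML files are in its manifests directory. For details, see here.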

(1) Install the MetalLB service first, so that the metallb-system namespace exists before the ConfigMap is created:

#kubectl create -f metallb.yaml

(2) Create the ConfigMap file for the address pool (you can refer to example-config.yaml):

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.240-192.168.1.250  # your load-balancing IP address range

#kubectl create -f example-config.yaml
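By default MetalLB gives each LoadBalancer service its own address from the pool, so the size of the `addresses` range bounds how many services (such as the Istio ingress gateway) can get an external IP. The stdlib sketch below validates a range written in `start-end` form with an ASCII hyphen and expands it into individual addresses. The function name is hypothetical, and MetalLB's CIDR notation (for example `192.168.1.0/24`) is not handled here.

```python
import ipaddress


def pool_addresses(spec: str) -> list:
    """Expand a MetalLB-style 'start-end' address range into individual IPs.

    Raises ValueError if either end is not an IP address, the ends use
    different IP versions, or the range runs backwards.
    """
    start_s, sep, end_s = spec.partition("-")
    if not sep:
        raise ValueError(f"expected 'start-end', got {spec!r}")
    start = ipaddress.ip_address(start_s.strip())
    end = ipaddress.ip_address(end_s.strip())
    if start.version != end.version:
        raise ValueError("range mixes IPv4 and IPv6")
    if int(end) < int(start):
        raise ValueError("range end is before range start")
    addr_type = type(start)  # IPv4Address or IPv6Address
    return [addr_type(i) for i in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    ips = pool_addresses("192.168.1.240-192.168.1.250")
    print(len(ips), ips[0], ips[-1])  # 11 192.168.1.240 192.168.1.250
```

Make sure the range lies in the nodes' subnet and is excluded from your DHCP server, since layer2 mode answers ARP requests for these addresses.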

5. Deploy Kubeflow. Download the Kubeflow installation package from the official website; version 0.6.2 is used here. If you need multi-tenant resource isolation, you also need to download and use kfctl_existing_arrikto.0.6.2.yaml. For details, see here.

(1) Set the environment variables and initialize the installation directory:

#export KUBEFLOW_USER_EMAIL=admin@kubeflow.org

#export KUBEFLOW_PASSWORD="12341234"

#kfctl init kubeflow-6-2 --config=/root/kfctl_existing_arrikto.0.6.2.yaml

(2) Enter the installation directory:

#cd kubeflow-6-2

(3) Install Kubeflow:

#kfctl generate all -v

#kfctl apply all -v

(4) Log in to the Kubeflow platform

Get the EXTERNAL-IP address of the istio-ingressgateway service with #kubectl get svc -n istio-system, then open https://EXTERNAL-IP in your browser and log in with:

username: admin@kubeflow.org

password: 12341234
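To script the login step, the external IP can be read from the JSON form of the gateway service instead of the table output. The sketch below assumes you have the output of `kubectl get svc istio-ingressgateway -n istio-system -o json` as a string and reads `status.loadBalancer.ingress`, which MetalLB fills in once it has assigned an address. The function name is hypothetical.

```python
import json


def external_address(service_json: str) -> str:
    """Return the first load-balancer IP (or hostname) from `kubectl get svc -o json` output."""
    svc = json.loads(service_json)
    ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        addr = entry.get("ip") or entry.get("hostname")
        if addr:
            return addr
    raise LookupError("no external address yet; check that MetalLB assigned one")


if __name__ == "__main__":
    # Trimmed example of the JSON shape; the real IP comes from the MetalLB pool.
    sample = {"status": {"loadBalancer": {"ingress": [{"ip": "192.168.1.240"}]}}}
    print("https://" + external_address(json.dumps(sample)))
```

A `LookupError` here usually means EXTERNAL-IP is still shown as `<pending>`, i.e. MetalLB is not running or its address pool is exhausted.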

III. Using the Kubeflow platform

1. Notebooks is the Kubeflow component for creating AI development and test environments. You can create your own Jupyter notebook servers, edit notebooks in them, and generate reports.

Kubeflow dashboard
apply notebook server resource
the kubeflow tenant’s notebook servers list
Jupyter notebook IDE

2. Pipelines is the Kubeflow component for running ML workflows. After installation, Kubeflow ships with several sample pipelines. You can submit a new pipeline to Kubeflow from the dashboard or with the KFP SDK.
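A pipeline is a graph of containerized steps in which each step starts only after the steps it depends on have finished (in this Kubeflow version, Pipelines executes the graph as an Argo workflow). The stdlib sketch below only illustrates that dependency ordering and which steps could run in parallel; it is not the KFP SDK API, and the step names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_order(deps: dict) -> list:
    """Group pipeline steps into stages; steps in the same stage can run in parallel.

    `deps` maps each step to the set of steps whose outputs it needs.
    Raises graphlib.CycleError (a ValueError subclass) if the graph has a cycle.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    # Hypothetical steps of a small training pipeline.
    pipeline = {
        "preprocess": {"download"},
        "train": {"preprocess"},
        "baseline": {"preprocess"},
        "evaluate": {"train", "baseline"},
        "deploy": {"evaluate"},
    }
    for i, stage in enumerate(run_order(pipeline), 1):
        print(i, stage)
```

Here `train` and `baseline` land in the same stage because neither depends on the other, which is the kind of parallelism a pipeline engine can exploit.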

pipeline sample in kubeflow

3. Multi-tenant configuration

Add static users for basic auth:

To add users to basic auth, you just have to edit the Dex ConfigMap under the key staticPasswords.

# Download the dex config
kubectl get configmap dex -n auth -o jsonpath='{.data.config\.yaml}' > dex-config.yaml
# Edit the dex config with extra users.
# The password must be hashed with bcrypt using a cost (difficulty) factor of at least 10.
# You can use an online tool like: https://passwordhashing.com/BCrypt
# After editing the config, update the ConfigMap
kubectl create configmap dex --from-file=config.yaml=dex-config.yaml -n auth --dry-run -oyaml | kubectl apply -f -
# Restart Dex to pick up the changes in the ConfigMap
kubectl rollout restart deployment dex -n auth
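Each new user is one entry under `staticPasswords` in `dex-config.yaml`, with `email`, `hash`, `username` and `userID` fields. The sketch below works on the Dex config after it has been parsed into Python data (for example with a YAML library, which is assumed and not shown), appends a user, and enforces the bcrypt cost of at least 10 by reading it from the hash prefix. It does not compute bcrypt hashes itself, and the helper names are hypothetical.

```python
import re
import uuid

# bcrypt hashes look like $2y$12$<53 chars>; the two digits after the
# version tag are the cost factor (log2 of the number of rounds).
_BCRYPT = re.compile(r"^\$2[aby]\$(\d{2})\$[./A-Za-z0-9]{53}$")


def bcrypt_cost(hash_: str) -> int:
    """Return the cost factor encoded in a bcrypt hash string."""
    m = _BCRYPT.match(hash_)
    if not m:
        raise ValueError("not a bcrypt hash")
    return int(m.group(1))


def add_static_user(config: dict, email: str, username: str,
                    bcrypt_hash: str, min_cost: int = 10) -> dict:
    """Append a Dex static user to a parsed Dex config (modified in place)."""
    if bcrypt_cost(bcrypt_hash) < min_cost:
        raise ValueError(f"bcrypt cost must be at least {min_cost}")
    users = config.setdefault("staticPasswords", [])
    if any(u.get("email") == email for u in users):
        raise ValueError(f"user {email} already exists")
    users.append({
        "email": email,
        "hash": bcrypt_hash,
        "username": username,
        "userID": str(uuid.uuid4()),  # any unique string; a UUID is a common choice
    })
    return config


if __name__ == "__main__":
    cfg = {"staticPasswords": []}
    placeholder = "$2y$12$" + "A" * 53  # placeholder, not a real password hash
    add_static_user(cfg, "alice@example.com", "alice", placeholder)
    print(cfg["staticPasswords"][0]["email"])
```

After serializing the updated config back to `dex-config.yaml`, apply it and restart Dex with the two kubectl commands above.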

References:

1. Kubeflow: https://www.kubeflow.org/

2. Rook: https://rook.io/

3. NVIDIA k8s-device-plugin: https://github.com/NVIDIA/k8s-device-plugin


Hu Song

Consultant in Cloud Native | Big Data | AI | Digital Currency