Kubeflow and Local Deployment of Kubeflow

What is Kubeflow, and how do you install it on your local machine or an on-prem server?

Kevin Karobia
6 min read, Feb 14, 2022

What is Kubeflow?

The model development life cycle (MDLC) is a term commonly used to describe the flow between model training and inference. Training is the process of creating a machine learning model, and inference is the process of using a trained model to make predictions.

The stages of the MDLC can be broken down into: data exploration -> feature preparation -> model training and model tuning -> model serving -> model testing -> model versioning.

Figure 1: Model development life cycle.

Kubeflow is a collection of cloud-native tools (usually referred to as components) for developing and maintaining all the stages of the MDLC. Kubeflow is built on the container orchestration tool Kubernetes. Configuring and coordinating each stage of the MDLC directly in a Kubernetes cluster can be a complex and time-consuming process for machine learning engineers. Kubeflow therefore provides a platform with the tools to configure, develop, automate and deploy each stage of the MDLC in a Kubernetes cluster, which makes applying changes to the cluster easier and faster. You can read more about Kubeflow in its documentation here.

Kubeflow Local Deployment

To deploy Kubeflow, we need a Kubernetes cluster running on our machine. Since a full Kubernetes cluster requires a large amount of resources, we use software that emulates one locally, such as Minikube, Kind, or K3s. We’ll use Minikube in this case.

Disclaimer: I’m using macOS, so the commands in this blog post are tailored to that operating system. For Linux users, only the Minikube installation will be different; the rest of the commands should work as expected. Windows users can find the equivalent commands on each component’s documentation page.

Minikube Installation

For macOS, we’ll use Homebrew to install Minikube:

$ brew install minikube

Minikube Windows and Linux installation guides can be found here.

Minikube requires a driver to work, and in our case we’ll use Docker. We’ll also use Homebrew to install Docker:

$ brew install docker

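Docker Windows and Linux installation guides can be found here. The Homebrew docker formula installs only the command-line client, so I’d also recommend installing Docker Desktop for macOS and Windows, which provides the Docker daemon that Minikube needs (at the time of writing, Docker Desktop is not available for Linux).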
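Before going further, it can help to confirm that the tools are actually on your PATH. The following is a minimal sketch, not part of the original setup: missing_tools is a hypothetical helper name, and kubectl is included only because later steps call it.

```shell
# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report on the tools this walkthrough relies on.
missing=$(missing_tools minikube docker kubectl)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All required tools are installed."
fi
```

If anything is reported as missing, install it before starting the cluster.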

To start our local Kubernetes cluster, we run:

$ minikube start --profile kubeflow --kubernetes-version=v1.20.1

We pin the Kubernetes version because an API version used by this Kubeflow release (apiextensions.k8s.io/v1beta1) is no longer served in Kubernetes 1.22.0 and later. We also name the cluster with --profile in case we want to run several clusters at the same time; this also lets us start and stop the named cluster at will.

To verify that our cluster is up and running, we run:

$ kubectl cluster-info

You should have something similar to:

Figure 2: Expected output from kubectl cluster-info
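The reasoning behind the version pin can be written down as a small check. This is only a sketch: serves_v1beta1_crds is a hypothetical helper name, and it assumes a version string of the form v1.X.Y or 1.X.

```shell
# Succeeds when the given Kubernetes version (e.g. v1.20.1) is older than 1.22,
# i.e. when apiextensions.k8s.io/v1beta1 is still served.
serves_v1beta1_crds() {
    version=${1#v}            # strip a leading "v"
    major=${version%%.*}      # text before the first dot
    rest=${version#*.}        # text after the first dot
    minor=${rest%%.*}         # text before the next dot, if any
    [ "$major" -eq 1 ] && [ "$minor" -lt 22 ]
}

if serves_v1beta1_crds v1.20.1; then
    echo "v1.20.1 still serves v1beta1 CRDs"
fi
```

Any version that fails this check would need a newer Kubeflow release than the one installed in this post.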

Kubeflow Installation

To install Kubeflow, we first have to install the kfctl program. kfctl is the control plane for deploying and managing Kubeflow.

Note: kfctl is currently available for Linux and macOS users only. If you use Windows, you can install kfctl on Windows Subsystem for Linux (WSL).

First, we’ll define some environment variables:

$ PLATFORM=$(uname)
$ export PLATFORM
$ mkdir -p ~/Kubeflow/bin
$ export KUBEFLOW_TAG=1.2.0
$ KUBEFLOW_BASE="https://api.github.com/repos/kubeflow/kfctl/releases"
$ KFCTL_URL=$(curl -s ${KUBEFLOW_BASE} | grep http | grep "${KUBEFLOW_TAG}" | grep -i "${PLATFORM}" | cut -d : -f 2,3 | tr -d '\" ' )

The block above defines four variables: PLATFORM, the platform we’re running the commands on (Linux or Darwin); KUBEFLOW_TAG, the version of kfctl we want; KUBEFLOW_BASE, the URL that lists all the kfctl releases; and KFCTL_URL, the download URL for the kfctl version we specified. We also create the directory ~/Kubeflow/bin to store the kfctl binary.
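To see what the grep, cut and tr pipeline does without calling the GitHub API, you can feed it a couple of sample lines. This is an illustrative sketch: extract_asset_url is a hypothetical function name, and the example.invalid URLs are placeholders rather than real release assets.

```shell
# Read release JSON on stdin and print the download URL whose line mentions
# both the tag and the platform (the same pipeline as above, as a function).
extract_asset_url() {
    tag=$1
    platform=$2
    grep http | grep -F "$tag" | grep -i "$platform" | cut -d : -f 2,3 | tr -d '" '
}

# Hypothetical sample input; example.invalid is a placeholder host.
sample='  "browser_download_url": "https://example.invalid/download/v1.2.0/kfctl_v1.2.0_linux.tar.gz"
  "browser_download_url": "https://example.invalid/download/v1.2.0/kfctl_v1.2.0_darwin.tar.gz"'

printf '%s\n' "$sample" | extract_asset_url 1.2.0 Darwin
```

The cut step keeps everything after the first colon (the JSON key), and tr strips the quotes and spaces, leaving only the URL for the matching platform.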

$ wget "${KFCTL_URL}"
$ KFCTL_FILE=${KFCTL_URL##*/}
$ tar -xvf "${KFCTL_FILE}"
$ mv ./kfctl ~/Kubeflow/bin

We use wget to download the compressed kfctl tar file. The file name is taken from the end of KFCTL_URL (everything after the last slash) and saved in the KFCTL_FILE variable, and the archive is then extracted with tar. Finally, we move the extracted kfctl program to the directory we created earlier.
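The ${KFCTL_URL##*/} expansion may look cryptic if you haven’t met it before. Here is a small sketch of what it does, using a made-up URL purely for illustration:

```shell
# Hypothetical URL, used only to show the expansion.
url="https://example.invalid/download/v1.2.0/kfctl_example.tar.gz"

# "##*/" removes the longest prefix that ends in "/", leaving the file name.
file=${url##*/}
echo "$file"
```

This is plain shell parameter expansion, so it works without calling external tools such as basename.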

We add the kfctl directory (~/Kubeflow/bin) to the PATH variable so as to use it in the current terminal shell.

$ export PATH=$PATH:~/Kubeflow/bin

Note: this makes kfctl available in the current terminal session only; once you close the terminal, you will have to run the command again. To make the change permanent, add the command to ~/.zshrc or ~/.bashrc, depending on the shell you’re using.
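If you add the export line to your shell startup file by script, it is worth making sure it isn’t added twice. The sketch below shows one way to do that; append_once is a hypothetical helper, and it is demonstrated on a temporary file rather than your real ~/.zshrc or ~/.bashrc.

```shell
# Append LINE to FILE unless an identical line is already present.
append_once() {
    file=$1
    line=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrate on a temporary file rather than a real shell startup file.
rc_file=$(mktemp)
append_once "$rc_file" 'export PATH="$PATH:$HOME/Kubeflow/bin"'
append_once "$rc_file" 'export PATH="$PATH:$HOME/Kubeflow/bin"'   # no-op the second time
cat "$rc_file"
rm -f "$rc_file"
```

To use it for real, pass the path of your own startup file as the first argument.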

To verify that kfctl is installed, we run the command below. You should see the same version as shown here:

$ kfctl version
kfctl_v1.2.0-0-gbc038f9

Creating a Kubeflow Project

Docker and Kubernetes use YAML files for configuration, so it shouldn’t come as a surprise that Kubeflow also uses YAML files, which in this context are called manifests. The manifest files define all the Kubeflow services (components) to be deployed in the cluster. As we did before, we’ll define some environment variables.

$ MANIFEST_BRANCH=${MANIFEST_BRANCH:-v1.2-branch}
$ export MANIFEST_BRANCH
$ MANIFEST_VERSION=${MANIFEST_VERSION:-v1.2.0}
$ export MANIFEST_VERSION

The MANIFEST_BRANCH variable specifies which branch of the kubeflow/manifests GitHub repository to use, and the MANIFEST_VERSION variable specifies the version of the manifest files on that branch.

Note: you can visit the kubeflow/manifests repo to see which versions are available. For example, to see all the versions available in the v1.2-branch, you can head to https://github.com/kubeflow/manifests/tree/v1.2-branch/kfdef
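The ${VAR:-default} form used for these variables keeps any value you have already set and falls back to the default otherwise. A brief sketch of that behaviour follows; the value my-branch and the variables first and second are made up for illustration.

```shell
# With the variable unset, the default after ":-" is used.
unset MANIFEST_BRANCH
first=${MANIFEST_BRANCH:-v1.2-branch}
echo "$first"

# With the variable already set, the existing value wins.
# "my-branch" is a made-up value for illustration.
MANIFEST_BRANCH=my-branch
second=${MANIFEST_BRANCH:-v1.2-branch}
echo "$second"
```

This is why you can override the branch or version by exporting the variable before running the commands, without editing them.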

$ KF_PROJECT_NAME=${KF_PROJECT_NAME:-hello-kf-${PLATFORM}}
$ export KF_PROJECT_NAME
$ mkdir -p "${KF_PROJECT_NAME}"
$ cd "${KF_PROJECT_NAME}"

We then define the Kubeflow project name; in this case it will be hello-kf-{platform you’re on}. We create a directory with the same name as the project and change into it, so that the manifest files are stored there.

$ manifest_root=https://raw.githubusercontent.com/kubeflow/manifests
$ FILE_NAME=kfctl_k8s_istio.${MANIFEST_VERSION}.yaml
$ KFDEF=${manifest_root}/${MANIFEST_BRANCH}/kfdef/${FILE_NAME}

We define three variables: manifest_root, the root URL that points to the raw files in the kubeflow/manifests repository; FILE_NAME, the name of the manifest file we want; and KFDEF, which combines the previously defined variables into the URL of the exact manifest file we require.
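It’s easy to lose a slash when joining URL parts, so it can be worth printing the final URL before using it. A minimal sketch, where kfdef_url is a hypothetical helper that takes the branch and version as arguments:

```shell
# Build the raw-file URL for a KfDef manifest from a branch and a version.
kfdef_url() {
    branch=$1
    version=$2
    root=https://raw.githubusercontent.com/kubeflow/manifests
    printf '%s/%s/kfdef/kfctl_k8s_istio.%s.yaml\n' "$root" "$branch" "$version"
}

kfdef_url v1.2-branch v1.2.0
```

The printed URL should have a slash between manifests and the branch name; if it doesn’t, kfctl will not be able to download the file.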

$ kfctl apply -f "${KFDEF}" -V

To download and apply the manifest file, we run the above command. You will see a lot of status messages, some warning messages and sometimes even error messages. However, if the command below, run immediately afterwards, prints 0, you should be good to go.

$ echo $?

Note: This deployment can take up to 30 minutes to complete.
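Because $? only holds the status of the most recent command, it is easy to overwrite by accident. One alternative is to test the command directly, as in this sketch; run_step is a hypothetical helper, and true and false stand in for a command that succeeds or fails.

```shell
# Run a command and report whether it exited with status 0.
run_step() {
    if "$@"; then
        echo "ok: $*"
    else
        status=$?
        echo "failed with status $status: $*"
        return "$status"
    fi
}

run_step true            # stands in for a command that succeeds
run_step false || true   # stands in for a command that fails
```

In practice you would pass the real command and its arguments to run_step in place of true.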

To verify that the installation was successful, we run the command below. If all the pods have a status of Running or Completed, the deployment was successful.

$ kubectl get pods --all-namespaces -w

You should have something similar to:

Figure 3: Expected output from kubectl get pods --all-namespaces -w

Note: if you have a pod in any state other than Running or Completed, like the activator-6c87fcbbb6-gw5mv pod in Figure 3, you can run the following command to find out more about what’s wrong: kubectl describe pod activator-6c87fcbbb6-gw5mv --namespace knative-serving
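With many pods in the list, spotting the unhealthy ones by eye gets tedious. The sketch below filters them out with awk; unhealthy_pods is a hypothetical helper, it assumes the column layout of kubectl get pods --all-namespaces --no-headers (status in the fourth column), and the sample rows are made up.

```shell
# Read `kubectl get pods --all-namespaces --no-headers` output on stdin and
# print "namespace/name status" for pods that are neither Running nor Completed.
unhealthy_pods() {
    awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Made-up sample rows; the statuses and counters are illustrative only.
printf '%s\n' \
    'knative-serving activator-6c87fcbbb6-gw5mv 0/1 Pending 0 10m' \
    'kubeflow example-pod 1/1 Running 0 10m' |
    unhealthy_pods
```

Against a live cluster, you would pipe the kubectl command’s output into unhealthy_pods instead of the sample rows.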

Accessing The Kubeflow UI

Since our components are running locally, all we have to do to access the Kubeflow UI is a simple port-forward. We achieve this by running:

$ kubectl port-forward svc/istio-ingressgateway -n istio-system 7777:80

We use kubectl to forward port 7777 on localhost to port 80 of the istio-ingressgateway service in the istio-system namespace. Finally, to access the UI, we enter http://localhost:7777 in our browser. You should have something similar to:

Figure 4: Kubeflow Web UI
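The order of the two ports in the port-forward command is easy to mix up: the local port comes first and the service port second. A tiny sketch that splits the mapping makes this explicit (the variable names are just for illustration):

```shell
# Split a LOCAL:REMOTE mapping as used by `kubectl port-forward`.
mapping=7777:80
local_port=${mapping%%:*}    # part before the colon: port on your machine
remote_port=${mapping##*:}   # part after the colon: port on the service
echo "http://localhost:${local_port} -> service port ${remote_port}"
```

If port 7777 is already taken on your machine, change only the first number and use that port in the browser URL.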

Conclusion

In this blog post we learnt how to download, configure and install Kubeflow on a local machine. This is the first step in interacting with Kubeflow. In future blog-tutorials we will explore deployment of Kubeflow on the cloud, Kubeflow components as well as generation of Kubeflow pipelines. Stay tuned!
