Starting your cloud native ML/AI experience on Kubernetes with Kubeflow and Windows via WSL2

This Kubeflow tutorial is split into three parts and shows how to build a cloud native Machine Learning pipeline with Kubeflow on Kubernetes.

Part 1: Start on your machine with Windows and WSL2

Part 2: Process your mobile data, do feature engineering and develop your Machine Learning model with an automated Machine Learning pipeline in Kubeflow (upcoming)

Part 3: Serve the same code base and Machine Learning model via Kubeflow on Kubernetes from any setup: local, on-premise or in the cloud with GCP, Azure or AWS (upcoming)


So you want a flexible Machine Learning pipeline to clean and process data and build Machine Learning models on top of it? Use Kubeflow on Kubernetes to stay fully flexible with open-source technology and without vendor lock-in to any cloud provider. Kubeflow is ready to handle your massive data, coming for example from mobile devices, process it in pipelines and afterwards serve it with a Machine Learning model. Your ML predictions scale horizontally in any cluster, whether local, on-premise or at your cloud vendor of choice. Let's start locally, so beginners can learn Kubeflow on Kubernetes.

As a software engineer who loves having a MacBook Pro, I really appreciate the latest WSL2 support in Windows, which gives you an almost native Linux experience locally.

Kubeflow

Kubeflow is Machine Learning on Kubernetes. A detailed explanation is not part of this article; check the official Kubeflow page.

[Image: Some parts of Kubeflow, a great toolchain for ML on cloud native Kubernetes with open source]

WSL2

Windows Subsystem for Linux version 2 brings major improvements for a full Linux experience, with faster file system performance and full system call compatibility, because it runs a real Linux kernel in a lightweight VM on the Windows hypervisor. More about WSL2 at this link: https://docs.microsoft.com/en-us/windows/wsl/wsl2-about

OK, let's start from the beginning. First we need to enable WSL2 on the latest version of Windows and then just install a Linux distribution like Ubuntu from the Microsoft Store. For your info, these distributions are themselves just containers running on the new Linux kernel! With the latest version of Docker Desktop for Windows we have full WSL2 support, great! Remember the high performance of WSL2: it uses the hypervisor from Windows. You can see this if you run uname -a in your local Linux distribution console; you get the WSL2 Linux kernel, independent of which distribution you installed from the Store:

Linux raubtier 4.4.0-19041-Microsoft #1-Microsoft Fri Dec 06 14:06:00 PST 2019 x86_64 x86_64 x86_64 GNU/Linux
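
If WSL2 is not enabled on your machine yet, the usual way on a recent Windows 10 build looks roughly like this (run from an elevated PowerShell and reboot afterwards; the distribution name in the last command is just an example):

# enable the required Windows features for WSL2
dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart
dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart
# make WSL2 the default for new distributions and, if needed, convert an existing one
wsl --set-default-version 2
wsl --set-version Ubuntu-18.04 2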

Then check which distribution you are using with cat /etc/os-release; here you see the Ubuntu 18.04 user space on our WSL2 Linux kernel:

[Image: Output of cat /etc/os-release showing the Ubuntu 18.04 user space]

You can see the current state of all your WSL2 Linux installations with wsl -l -v.

[Image: Check which Linux distributions are active in our WSL2]
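
The output will look something like the listing below; the exact names depend on what you have installed, and Docker Desktop adds its own WSL2 distributions once it is set up:

  NAME                   STATE           VERSION
* Ubuntu-18.04           Running         2
  docker-desktop         Running         2
  docker-desktop-data    Running         2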

Tip: Use the new Windows Terminal to easily open different Linux shells; each installed distribution gets its own menu entry to start a native shell.
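
You can also jump into a specific distribution straight from the command line; the profile and distribution names below are just examples and depend on your setup:

# open a Windows Terminal tab with a specific profile
wt.exe -p "Ubuntu-18.04"
# or start a shell in a specific WSL distribution directly
wsl.exe -d Ubuntu-18.04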

Docker and Kubernetes

Next, install Docker Desktop, which lately comes with full support for WSL2! In Docker Desktop you can easily enable full Kubernetes support, running in your local WSL2 system. Now we are ready for Kubeflow :-)

[Image: Enabling Kubernetes support in Docker Desktop]
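
Once Kubernetes shows up as running, a quick sanity check from a Windows shell (Docker Desktop ships its own kubectl there) should show the docker-desktop context and a single node:

kubectl config current-context
kubectl get nodes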

The road to Kubeflow

First we need to install kubectl in our WSL2 environment, in a Linux distribution of your choice, and make it executable:

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl

Next we copy the local Windows Kubernetes config into our WSL2 environment:

mkdir ~/.kube
cp /mnt/c/Users/yourname/.kube/config ~/.kube
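
A quick check that kubectl inside WSL2 now talks to the Docker Desktop cluster (docker-desktop is the default context name created by Docker Desktop):

kubectl config use-context docker-desktop
kubectl get nodes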

Install Kubeflow

An easy way to install Kubeflow is its kfctl command, as described at https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/. It will create a vanilla installation of Kubeflow with no dependencies.

Run this in your local WSL2 distribution to make kfctl globally available:

curl -LO https://github.com/kubeflow/kfctl/releases/download/v1.0.1/kfctl_v1.0.1-0-gf3edb9b_linux.tar.gz
tar -xvf kfctl_v1.0.1-0-gf3edb9b_linux.tar.gz
chmod +x kfctl
sudo mv ./kfctl /usr/local/bin/kfctl
rm kfctl*.tar.gz

Next we have to specify some environment variables that define our Kubeflow setup, like the Kubeflow name and the location, and then apply the YAML for the installation.

export KF_NAME=wsl2-kubeflow
export BASE_DIR="${HOME}/kubeflow/"
export KF_DIR=${BASE_DIR}/${KF_NAME}
export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.1.yaml"

mkdir -p ${KF_DIR}
cd ${KF_DIR}
kfctl apply -V -f ${CONFIG_URI}
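
While kfctl is applying the manifests, you can watch the pods come up in a second WSL2 shell:

# watch the Kubeflow components being created
kubectl get pods -n kubeflow -w
# the Istio components live in their own namespace
kubectl get pods -n istio-system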

This takes a while, depending on your machine and setup. Best is a machine with 32 GB of memory; the basic setup already consumes over 12 GB of RAM after Kubeflow has bootstrapped. Let's see where our Kubeflow dashboard is:

echo "http://localhost:$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')"

We will get http://localhost:31380. Let's try it out; WSL2 shares the IP with our local system:

[Image: Yippie, we have Kubeflow on Kubernetes]

The first time you open the dashboard you will be asked for a default namespace; create one or just accept the anonymous one.

Check the pods in the kubeflow namespace:

kubectl get pods -n kubeflow
[Image: All pods from our local Kubeflow installation]

Remove local Kubeflow

If you want to remove the Kubeflow installation from your local Kubernetes cluster, just run the delete with the same configuration. Here ${CONFIG_FILE} is the local copy of the config YAML that kfctl stored in ${KF_DIR}, for the setup above kfctl_k8s_istio.v1.0.1.yaml:

cd ${KF_DIR}
kfctl delete -f ${CONFIG_FILE}

First steps with Kubeflow

Now that we have Kubeflow up and running, let's do a short smoke test and build and train our first Machine Learning model.

  1. Create a Jupyter Notebook server

In the dashboard go to Notebook Servers and create a first server instance, using the TensorFlow 2.1 Docker image.

[Image: Create a new Jupyter server]

After the Jupyter server is created, just connect.

[Image: Our Jupyter server is ready]

It's easy to get to the Jupyter Lab: http://localhost:31380/notebook/anonymous/test-notebook/lab

[Image: Jupyter Lab in our local Kubeflow installation]

Open a terminal and download the simple TFEstimator titanic example ML model:

[Image: Download the TFEstimator Titanic example]

Tip: Install required dependencies with pip using the --user option and restart the kernel.
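
A minimal sketch of that tip from the notebook terminal; the package names here are just examples of what the Titanic notebook might need:

pip install --user pandas matplotlib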

Start your Machine Learning experience with a smile

[Image: Explore the Titanic survivor data]
[Image: Make a histogram over the age of the Titanic passengers]

As an addition, if you miss the Kubernetes Dashboard, you first have to install it in your local cluster and make the ClusterIP available to your local browser:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.3/aio/deploy/recommended.yaml
kubectl proxy

and set up the authentication method of your choice: https://kubernetes.io/blog/2020/05/21/wsl-docker-kubernetes-on-the-windows-desktop/
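
A minimal sketch of a token-based login, assuming a dedicated service account with cluster-admin rights is acceptable for local testing (the account name dashboard-admin is just an example):

# create a service account for the dashboard and grant it cluster-admin (local testing only)
kubectl create serviceaccount -n kubernetes-dashboard dashboard-admin
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:dashboard-admin
# print the bearer token to paste into the dashboard login screen
kubectl -n kubernetes-dashboard get secret $(kubectl -n kubernetes-dashboard get sa dashboard-admin -o jsonpath='{.secrets[0].name}') -o jsonpath='{.data.token}' | base64 --decode

With kubectl proxy running, the dashboard is then reachable at http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/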

Next

[Image: Part 2 will cover handling and processing of data and building an ML model]

In part 2 we will build our own model with custom mobile data: consuming our movement data from a mobile Flutter app, processing the data via Kubeflow in a cloud native setup and building our first Machine Learning model with scikit-learn and TensorFlow/Keras. This is all done in Kubeflow with an automated Machine Learning pipeline.
