Setting up a Local MLOps dev environment — Part 2
In this two-part series, I share my experience of setting up a local MLOps dev environment on bare-metal Ubuntu workstations from scratch. This part covers installing Kubeflow on the Kubernetes cluster we already set up on the Ubuntu workstations, as explained in Part 1.
Setup default storage class
As a prerequisite, Kubeflow needs a Kubernetes cluster with a default StorageClass. A storage class dynamically provisions Persistent Volumes whenever a PersistentVolumeClaim requests one. In simple terms, it abstracts the underlying storage and gives administrators a way to describe the classes of storage they offer. Refer to StorageClass to know more about them.
For our purpose, we will provision Rook with Ceph as the storage backend, using CSI. Rook provides open-source cloud-native storage orchestration for Kubernetes, while Ceph is a distributed, scalable open-source storage solution for block, object, and shared file system storage. To know more about Rook and Ceph, refer to this link.
Prerequisites
- A valid Kubernetes cluster, which we have already provisioned.
- A kubectl installation on a machine that has access to your Kubernetes cluster, with that cluster configured as the current context.
- Ability to clone the Rook GitHub repository on the same machine that has kubectl installed
- For a Test Deployment — a single-node cluster, or a cluster with a single master and a single worker. Each node in the cluster requires a mounted, unformatted volume and LVM installed; workloads on masters must be allowed, and the CNI is Calico.
At least one of these local storage options is required:
- Raw devices (no partitions or formatted filesystems)
- Raw partitions (no formatted filesystem)
- PVs available from a storage class in block mode
Confirm whether there are partitions or devices that can be used for configuring Ceph:
lsblk
lsblk -f
If the FSTYPE field is not empty, there is a filesystem on top of the corresponding device. In my case, sdb had an empty FSTYPE and could be used for Ceph, while sda and its partitions could not.
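The manual check above can be scripted. The sketch below is my own helper, not from the Rook docs: it parses captured `lsblk -ln -o NAME,FSTYPE` output and treats a device as a Ceph candidate only if it has no filesystem and no partitions. On a real node you would feed it a live `lsblk` call instead of the sample.

```shell
# Captured sample of: lsblk -ln -o NAME,FSTYPE
# (sda carries formatted partitions; sdb is a raw, unformatted disk)
sample='sda
sda1 ext4
sda2 ext4
sdb'

# A device qualifies for Ceph only if its FSTYPE is empty AND no other
# entry (i.e. a partition) begins with its name.
candidates=$(echo "$sample" | awk '
  { name[NR] = $1; fs[NR] = $2 }
  END {
    for (i = 1; i <= NR; i++) {
      ok = (fs[i] == "")
      for (j = 1; j <= NR; j++)
        if (j != i && index(name[j], name[i]) == 1) ok = 0
      if (ok) print name[i]
    }
  }')
echo "$candidates"   # → sdb
```

Note that sda itself also shows an empty FSTYPE, which is why the script additionally rejects any device that has partitions.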
Step 1 — Install LVM
To avoid issues when setting up Ceph on raw devices, install LVM by running the command below on all the servers/workstations.
sudo apt-get install -y lvm2
Step 2 — Clone the Rook GitHub Repo
Clone the Rook GitHub repo into an empty directory as follows.
mkdir rook-single-node
cd rook-single-node
git clone --single-branch --branch release-1.5 https://github.com/rook/rook.git
Step 3 — Install Rook
First, install the CRDs, the common resources, and the Rook operator.
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
Second, create the cluster.
kubectl create -f cluster-test.yaml
Validate the installation.
kubectl -n rook-ceph get pod
A number of pods get deployed as part of this process, and it takes time, so wait until all the pods are in the Running state.
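The waiting can be partially automated with `kubectl -n rook-ceph wait --for=condition=Ready pod --all --timeout=600s`, though Completed job pods (such as the OSD prepare jobs) never become Ready, so a quick scripted check is often handier. The sketch below is my own helper, not from the Rook docs: it applies the same check one does by eye to a captured sample of `kubectl -n rook-ceph get pod --no-headers` output; the pod names in the sample are made up.

```shell
# Captured sample of: kubectl -n rook-ceph get pod --no-headers
# (pod names are illustrative)
sample='csi-rbdplugin-x2kcf                  3/3   Running     0   4m
rook-ceph-mon-a-7c9f6                1/1   Running     0   5m
rook-ceph-osd-prepare-node1-b8slx    0/1   Completed   0   2m
rook-ceph-osd-0-5d4f8                0/1   Pending     0   30s'

# Report pods that are neither Running nor Completed, i.e. still settling.
# Completed is fine: it marks one-shot jobs like OSD prepare.
not_ready=$(echo "$sample" | awk '$3 != "Running" && $3 != "Completed" {print $1}')
echo "$not_ready"   # → rook-ceph-osd-0-5d4f8
```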
Step 4 — Provision Storage Class
Provision the storage class for a minimal installation using the provided storageclass-test.yaml from the Rook GitHub repo (rook/cluster/examples/kubernetes/ceph/csi/rbd).
kubectl apply -f storageclass-test.yaml
Step 5 — Make the Storage Class the Default
Finally, mark the storage class as the default.
kubectl patch sc rook-ceph-block -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
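The patch simply sets a well-known annotation on the StorageClass object. For reference, the equivalent fragment of the StorageClass manifest would look like the sketch below; the provisioner name is my assumption based on the Rook example manifests, and the rest of the spec is elided.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rook-ceph.rbd.csi.ceph.com
```

You can confirm the result with kubectl get sc; the default class is marked (default) next to its name.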
Step 6 — Install Kubeflow
Refer to the following prerequisites for deploying Kubeflow via the manifests method.
Prerequisites
- Kubernetes (up to 1.21) with a default StorageClass. Kubeflow 1.5.0 is not compatible with version 1.22 and onwards.
- kustomize (version 3.2.0) (download link). Kubeflow 1.5.0 is not compatible with the latest versions of kustomize 4.x.
- kubectl
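Before proceeding, it is worth confirming the version constraint against your cluster. The sketch below is my own helper, not from the Kubeflow docs: it compares a Kubernetes minor version against the 1.21 ceiling for Kubeflow 1.5.0; how you obtain the minor version (e.g. from kubectl version output) is an assumption.

```shell
# Minor version of the target cluster, e.g. read off the output of:
#   kubectl version
minor=21

# Kubeflow 1.5.0 supports Kubernetes up to 1.21 only.
if [ "$minor" -le 21 ]; then
  verdict="compatible"
else
  verdict="incompatible"
fi
echo "Kubernetes 1.$minor is $verdict with Kubeflow 1.5.0"
```

Set minor to 22 or higher and the same check reports incompatible, matching the constraint above.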
Installing Kubeflow with a single command
Refer to the installation methods suggested in Kubeflow's manifests repo. Basically, there are two methods:
- Single-command installation of all components under apps and common
- Multi-command, individual installation of components under apps and common
Option 1 targets ease of deployment for end-users.
Option 2 targets customization and the ability to pick and choose individual components.
Since, in my case, I am going to install all the components of Kubeflow, I opted for Option 1 and proceeded as follows.
Download kustomize (version 3.2.0) and clone the manifests repo from https://github.com/kubeflow/manifests.
Execute the following command from within the manifests directory.
while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
This will take some time to provision all the needed K8s resources. Meanwhile, keep a watch on the pods, PVs, and PVCs, as it takes a considerable amount of time for all of them to reach a healthy, running state.
Once everything is installed successfully, you can access the Kubeflow Central Dashboard by logging in to your cluster.
If you want to see all this in action with all logs & output, refer to this video.
Conclusion
In these two articles, we saw how a local MLOps dev environment can be set up on bare-metal Ubuntu workstations. In future articles, I will cover other aspects of MLOps, such as feature stores and data pipelines, and walk through some end-to-end examples. Please do provide your feedback/comments and let me know your views.