Setting up a Local MLOps dev environment — Part 1

Vishal Garg
5 min read · Apr 17, 2022


In this two-part series, I share my experience of setting up a local MLOps dev environment on bare-metal Ubuntu workstations from scratch. This part is all about installing a Kubernetes cluster on the workstations; in the next part, we shall see how to install Kubeflow on that cluster.

Photo by Shane Rounce on Unsplash

Install Kubernetes Cluster on bare metal (Ubuntu 20.04)

Environment Info

  • Ubuntu Server 20.04.4 LTS
  • Three HP Z620 Tower Workstations, each with a single 4-core/8-vCPU Intel® Xeon® E5-2643 processor, 32 GB RAM, and a 1 TB HDD

The three servers/workstations shall be assigned hostnames and IPs as follows:

  • master — 192.168.1.207
  • worker1 — 192.168.1.199
  • worker2 — 192.168.1.208

What is this all about?

Kubernetes is a tool for orchestrating and managing containerized applications at scale on on-premise servers or across hybrid cloud environments. For any MLOps ecosystem, Kubernetes is the preferred way to host applications so that ML applications can scale and be hosted in diverse environments. Kubeadm is a tool provided with Kubernetes to help users install a production-ready Kubernetes cluster with implicit validations. This tutorial will demonstrate how a Kubernetes Cluster can be installed on a set of bare metal Ubuntu Server 20.04 workstations with Kubeadm.

To know more about Kubernetes, check the documentation at the official Kubernetes site.

This lab contains three servers: one control-plane (master) node and two worker nodes. These can be easily identified by their hostnames.

Step 0 Prepare the workstations with Ubuntu OS

Set up the workstations with Ubuntu Server 20.04.4 LTS. Please refer to the link, option 2 i.e. manual server installation, to install the OS on your workstations. This step can be skipped if the OS is already installed.

Update the /etc/hosts file of the three servers with relevant aliases and hostnames.

sudo vi /etc/hosts (make necessary entries for the three workstations)
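For reference, here is a minimal sketch of what those entries could look like, based on the hostnames and IPs listed above. The control-k8s alias pointing at the master is my assumption; it must resolve on all nodes, since it is used later as the --control-plane-endpoint in kubeadm init.

# /etc/hosts (same entries on all three nodes)
192.168.1.207 master control-k8s
192.168.1.199 worker1
192.168.1.208 worker2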

Enable password-less login among nodes. [Optional]

ssh-keygen -t rsa
ssh-copy-id user@<hostname>

Once the servers are ready, update them.

sudo apt update
sudo apt -y full-upgrade
[ -f /var/run/reboot-required ] && sudo reboot -f

Step 1 Install kubelet, kubeadm and kubectl

Once rebooted, add the Kubernetes repository for Ubuntu 20.04 to all the nodes.

sudo apt -y install curl apt-transport-https
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

Then install the required packages.

Note: For the MLOps software, I will be installing Kubeflow 1.5.0, which is not compatible with Kubernetes 1.22 and onwards, hence I will install version 1.21.10 of the Kubernetes CLIs.

sudo apt update
sudo apt -y install vim git curl wget kubelet=1.21.10-00 kubectl=1.21.10-00 kubeadm=1.21.10-00
sudo apt-mark hold kubelet kubeadm kubectl

Confirm installation by checking the version of kubectl.

kubectl version --client && kubeadm version

Step 2 Disable swap

Turn off swap

sudo swapon --show
sudo vi /etc/fstab (comment the swap entry)
sudo swapoff -a
sudo swapon --show

If you are wondering, why swap needs to be disabled while installing Kubernetes, refer to this issue.

Step 3 Enable kernel modules and add configuration to sysctl

As a requirement for your Linux Node’s iptables to correctly see bridged traffic, you should ensure net.bridge.bridge-nf-call-iptables is set to 1 in your sysctl config, e.g.

# Enable kernel modules
sudo modprobe overlay
sudo modprobe br_netfilter
# Add settings to sysctl
sudo tee /etc/sysctl.d/kubernetes.conf<<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
# Reload sysctl
sudo sysctl --system
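Note that modprobe only loads the modules for the current boot. As an extra step not in the original flow, you can persist them across reboots via /etc/modules-load.d:

# Load overlay and br_netfilter on every boot
sudo tee /etc/modules-load.d/k8s.conf <<EOF
overlay
br_netfilter
EOF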

Step 4 Install container runtime

Kubernetes supports multiple container runtimes to run containers in Pods. However, in this article, I will only talk about Docker.

# Add repo and Install packages
sudo apt update
sudo apt install -y curl gnupg2 software-properties-common apt-transport-https ca-certificates
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt update
sudo apt install -y containerd.io docker-ce docker-ce-cli
# Create required directories
sudo mkdir -p /etc/systemd/system/docker.service.d
# Create daemon json config file
sudo tee /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF
# Start and enable Services
sudo systemctl daemon-reload
sudo systemctl restart docker
sudo systemctl enable docker
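As a quick sanity check (my addition), confirm that Docker is running and picked up the systemd cgroup driver configured above:

sudo systemctl status docker --no-pager
sudo docker info | grep -i "cgroup driver"   # should print: Cgroup Driver: systemd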

Step 5 Initialize the master node

On the master node, make sure that the br_netfilter module is loaded.

lsmod | grep br_netfilter

Enable kubelet service

sudo systemctl enable kubelet

Next, pull the container images

sudo kubeadm config images pull

Bootstrap the cluster with kubeadm. Refer to https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/ to know more about the valid args for kubeadm init.

sudo kubeadm init --apiserver-advertise-address=192.168.1.207 --upload-certs --pod-network-cidr=192.168.0.0/16 --control-plane-endpoint=control-k8s
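At the end of its output, kubeadm init prints a kubeadm join command for the worker nodes; save it for Step 7. If you misplace it, a fresh one can be generated on the master:

# Re-print a worker join command (run on the master)
sudo kubeadm token create --print-join-command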

Next, set up cluster access for a regular user

mkdir -p $HOME/.kube
sudo cp -f /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Next, check the cluster-info using kubectl

kubectl cluster-info

Check which state the master node is in.

kubectl get nodes -o wide

As can be seen, the master node is in a 'NotReady' state.

Step 6 Install network plugin on the master node

Next, install a Pod network add-on. In this exercise I will be using Calico; however, refer to the list of supported network plugins to see other add-ons.

kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml 
kubectl create -f https://docs.projectcalico.org/manifests/custom-resources.yaml
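With this operator-based install, the Tigera operator brings the Calico pods up in the calico-system namespace; you can watch them start with something like:

watch kubectl get pods -n calico-system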

Check whether all pods are in a Running state.

kubectl get pods -A

Now check the state of the master node again; it should have moved to 'Ready'.

kubectl get nodes -o wide

Step 7 Add worker nodes

Now that the control plane (master node) is all set, more nodes can be added in the "worker" role; Kubernetes will primarily use these to schedule the workloads.

To join the cluster, use the join command which was generated in Step 5; its general form is sketched below.
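Run it on each worker node. The token and hash below are placeholders; use the actual values from your kubeadm init output or from the token create command shown in Step 5.

# On worker1 and worker2 (replace <token> and <hash> with your own values)
sudo kubeadm join control-k8s:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>
# Back on the master, verify both workers register and become Ready
kubectl get nodes -o wide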

Hurray! We are now all set with our own local, on-prem Kubernetes cluster with the Docker runtime.

Conclusion

We now have a local Kubernetes cluster and are all set for the next part. Even though this looks like a straightforward and simple exercise, you might face certain issues, as I did. Please feel free to share those in your comments/feedback; there might be something common we can talk about :). In general too, please leave your feedback/comments and let me know if you liked this article or see any scope for further improvement. Thanks!

Part-2

If you want to see all this in action with all logs & output, refer to this video.
