Deploy a Dask Cluster on Kubernetes using Minikube on WSL — Part 1

A quick & easy guide to setting up a Dask Kubernetes cluster on your local machine using Minikube

Nirajkanth Ravichandran
ADL AI & Analytics Corner
5 min read · Sep 15, 2023


In the realm of data-intensive applications and large-scale data processing, the need for efficient and scalable computing resources is paramount. Kubernetes, an open-source container orchestration platform, has emerged as a powerful tool for managing and scaling containerized applications. When combined with Dask, a flexible parallel computing library in Python, you have the potential to create a highly resilient and dynamic data processing environment. In this article, we’ll explore the process of deploying a Dask cluster on top of a Kubernetes cluster, utilizing Minikube within the Windows Subsystem for Linux (WSL) environment.

Setting Up the Kubernetes Cluster

Before diving into deploying a Dask cluster, we need a Kubernetes cluster up and running on our local machine. Minikube is a popular choice, and running it within the WSL environment gives us a convenient Linux-based development setup on Windows.

Image from: https://codecrux.com/blog/minikube-tips.html

Selecting Minikube for Local Kubernetes in WSL

Minikube is an ideal choice for local Kubernetes development due to its simplicity and rapid setup. When combined with WSL, it enables a seamless Linux environment within a Windows operating system.

To begin, we need to meet the minimum requirements set by Minikube:

  • At least 2 CPU cores
  • 2 GB of available memory
  • 20 GB of available disk space
  • An active internet connection
  • A container or virtual machine manager (For our guide, we’ll use Docker.)
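A quick way to confirm your machine meets these requirements is a short script. This is just an illustrative sketch whose thresholds mirror the list above, not an official Minikube check; the memory lookup reads /proc/meminfo, so it assumes a Linux environment such as WSL.

```python
import os
import shutil

# CPU cores and free disk space against Minikube's stated minimums.
cores = os.cpu_count()
free_gb = shutil.disk_usage("/").free / 1e9

print(f"CPU cores: {cores} (need >= 2)")
print(f"Free disk: {free_gb:.1f} GB (need >= 20)")

# Total memory: /proc/meminfo is Linux-specific, so this works inside WSL.
with open("/proc/meminfo") as f:
    meminfo = dict(line.split(":") for line in f if ":" in line)
mem_gb = int(meminfo["MemTotal"].split()[0]) / 1e6  # kB -> GB

print(f"Total memory: {mem_gb:.1f} GB (need >= 2)")
```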

Installing and Configuring Minikube

To install Minikube within the WSL environment, follow these steps:

  1. Enable WSL: Ensure that you have Windows Subsystem for Linux (WSL) enabled on your Windows machine. You can follow the official Microsoft documentation to set this up.
  2. Install Docker Desktop: Install Docker Desktop for Windows, which will be used to manage containers within the WSL environment. Then, in Settings, enable ‘Use the WSL 2 based engine’ and your WSL distribution, as shown in the following figures (in my case it is Ubuntu).

3. Install kubectl: kubectl is the command-line tool used to interact with Kubernetes clusters. Install it within your WSL environment according to the instructions for Linux.

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl

chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl

4. Install Minikube: Open a WSL terminal and install Minikube using a package manager or by downloading the binary from the official GitHub repository. You can simply use the following commands.

curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube

5. Verify the Minikube installation by checking its version, then start a Minikube cluster as follows:

minikube version
minikube start --driver=docker

6. Verify Cluster Status: After a successful startup, you can verify the status of your cluster as follows:

minikube status

Deploying a Dask Cluster on Kubernetes within WSL

With our Minikube Kubernetes cluster up and running within the WSL environment, we can proceed to deploy a Dask cluster onto this infrastructure.

Understanding Dask

Dask is a Python library that allows parallel and distributed computing. It enables us to scale our computations from a single machine to a cluster of machines, making it suitable for processing large datasets and complex computations.
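To make this concrete, here is a minimal sketch of Dask’s lazy, parallel execution model using dask.delayed. It runs on a single machine, but the same task graph could equally be executed on the cluster we are about to deploy (the function names here are just illustrative):

```python
import dask

@dask.delayed
def inc(x):
    # Each call becomes a node in a task graph instead of running immediately.
    return x + 1

@dask.delayed
def total(values):
    return sum(values)

# Build the graph lazily, then execute its tasks in parallel.
incremented = [inc(i) for i in range(4)]
result = total(incremented).compute()
print(result)  # 10
```

Nothing is computed until .compute() is called, which is what lets Dask schedule the work across many workers.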

Deploying Dask on Kubernetes within WSL

There are two primary methods for running a Dask cluster on Kubernetes: the classic approach and the operator method. However, the classic approach is being deprecated, so for the purposes of this guide I recommend the operator-based approach.

To employ the Dask operator, the first step is to install its custom resource definitions (CRDs). This can be accomplished with Helm, the Kubernetes package manager, following these steps:

  1. Install Helm: Helm is a package manager for Kubernetes applications. We’ll use it to deploy Dask using pre-configured Helm charts. Install Helm within your WSL environment according to the official documentation for Linux.
  2. Install the Dask Operator: Use Helm to install the operator chart directly from the Dask Helm repository within the WSL terminal:

helm install --repo https://helm.dask.org --create-namespace -n dask-operator --generate-name dask-kubernetes-operator

Verifying the Kubernetes Cluster

Before proceeding further, let’s perform a quick check to ensure that our Kubernetes cluster and the Dask operator are set up successfully.

Verify that the dask-operator pod is running:

kubectl get pods -A

Then execute the following code in a Jupyter environment where dask and dask-kubernetes are installed (for example, via pip install dask dask-kubernetes).

from dask_kubernetes.operator import KubeCluster

# Create a Dask cluster as Kubernetes pods: one scheduler and two workers.
cluster = KubeCluster(
    name="daskmlcluster",
    image="ghcr.io/dask/dask:latest",
    n_workers=2,
    resources={"requests": {"memory": "0.5Gi"}, "limits": {"memory": "1.5Gi"}},
    env={"FOO": "bar"},
)
cluster

Let’s check the pod details again. As shown in this figure, we can observe daskmlcluster-scheduler and two workers under the default namespace, which confirms that the Dask Kubernetes cluster is running successfully.
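With the cluster running, you can attach a dask.distributed Client and submit work to it. The sketch below uses a local in-process client so it runs anywhere; against the KubeCluster created above you would instead write Client(cluster), and the work would run in the Kubernetes worker pods.

```python
from dask.distributed import Client

# Locally we start an in-process client for illustration; with the KubeCluster
# from the previous snippet you would write: client = Client(cluster)
client = Client(processes=False)

# Submit a function call to the cluster and wait for its result.
future = client.submit(sum, range(10))
result = future.result()
print(result)  # 45

client.close()
```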

Great! We have now completed deploying a Dask cluster on Kubernetes using Minikube on WSL.

Conclusion

In this article, we’ve explored the process of deploying a Dask cluster on top of a Kubernetes cluster using the operator method, utilizing Minikube within the Windows Subsystem for Linux (WSL) environment. By combining the strengths of Kubernetes’ container orchestration capabilities with Dask’s parallel computing functionalities, we’ve created a powerful setup for processing large datasets and complex computations. This deployment allows us to harness the benefits of distributed computing within a seamless Linux environment on a Windows machine, without the complexities of managing the underlying infrastructure. As you delve deeper into the world of Kubernetes and Dask, you’ll find endless possibilities for optimizing your data processing workflows.

Visit Axiata Digital Labs to find out more about our products and services.

https://www.axiatadigitallabs.com/

References

  1. https://kubernetes.dask.org/
  2. https://docs.dask.org/en/stable/install.html
  3. https://minikube.sigs.k8s.io/docs/start/
  4. https://docs.docker.com/desktop/install/windows-install/
  5. https://helm.sh/docs/intro/install/
  6. https://learn.microsoft.com/en-us/windows/wsl/install

Disclaimer: ADL is not responsible for any damage caused by any of the articles to any internal or external parties.

Published in ADL AI & Analytics Corner

Meet the AI & Analytics team at Axiata Digital Labs as they bring a series of blogs to keep you updated with cutting-edge trends, breakthroughs, and advancements in the ever-evolving fields of artificial intelligence and analytics.


Written by Nirajkanth Ravichandran

Data Engineer at Axiata Digital Labs
