How to Deploy an Apache Pulsar Cluster in Kubernetes
Preface
Apache Pulsar is an open-source distributed pub-sub messaging system originally created at Yahoo and now part of the Apache Software Foundation.
Pulsar is a multi-tenant, high-performance solution for server-to-server messaging. It’s composed of a set of brokers and bookies, along with a built-in Apache ZooKeeper for configuration and management. The bookies come from Apache BookKeeper and provide storage for messages until they are consumed.
In this article, we will deploy an Apache Pulsar cluster in Kubernetes. I will use Minikube, but the process is the same for a production cluster. In the cluster, we’ll have:
- Multiple brokers to handle incoming messages from producers and dispatch them to consumers
- Apache BookKeeper to support message persistence
- Apache ZooKeeper to store the cluster configuration
- Apache Pulsar Dashboard
Additionally, we will deploy Prometheus and Grafana for monitoring purposes.
Step 1. ZooKeeper
You must deploy ZooKeeper as the first Pulsar component, as it is a dependency for the others:
zookeeper_micro.yaml
For testing purposes, I used a single ZooKeeper replica, but in production you should increase it to 3 or 5:
replicas: 1
Feel free to increase the CPU and memory limits for ZooKeeper, then apply the manifest:
$ kubectl apply -f zookeeper_micro.yaml
Wait until all ZooKeeper server pods are up and have the Running status. You can check the status of the ZooKeeper pods at any time:
$ kubectl get pods -l app=zk
This step may take several minutes, as Kubernetes needs to download the Docker image on the VMs.
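Instead of re-running the status check by hand, you can let kubectl block until the pods are ready. Here is a minimal sketch of a small shell helper built on kubectl wait. The function name wait_for_pods and the 300s default timeout are assumptions made for illustration; app=zk is the label used in the command above.

```shell
# Block until every pod matching a label selector reports the Ready condition.
#   $1: label selector, e.g. app=zk
#   $2: optional timeout (defaults to 300s, an assumed value; tune as needed)
wait_for_pods() {
    kubectl wait --for=condition=Ready pod -l "$1" --timeout="${2:-300s}"
}
```

Call it as wait_for_pods app=zk once the apply command has created the pods. kubectl wait can fail right away if no matching pods exist yet, so give Kubernetes a moment after applying the manifest.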
Step 2. Initialize cluster metadata
Once ZooKeeper is running, you need to initialize the metadata for the Pulsar cluster in ZooKeeper. This includes system metadata for BookKeeper and Pulsar more broadly. The cluster-metadata.yaml file defines a Kubernetes Job that does this, and you only need to run it once:
$ kubectl apply -f cluster-metadata.yaml
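Since the bookies and brokers depend on this metadata, it can help to confirm that the Job has finished before moving on. The sketch below wraps kubectl wait for that purpose. The helper name wait_for_job and the 120s default are assumptions, and the Job name is not given in this article, so look it up with kubectl get jobs first.

```shell
# Block until a Kubernetes Job reports the "complete" condition.
#   $1: Job name (not specified in this article; find it with `kubectl get jobs`)
#   $2: optional timeout (defaults to 120s, an assumed value)
wait_for_job() {
    kubectl wait --for=condition=complete "job/$1" --timeout="${2:-120s}"
}
```

For example, wait_for_job my-metadata-job, where my-metadata-job is a placeholder for whatever name kubectl get jobs reports.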
Step 3. Deploy the Bookies
Apache BookKeeper is a scalable, low-latency persistent log storage service that Pulsar uses to store data. We will deploy the bookies as a Kubernetes DaemonSet:
$ kubectl apply -f bookie.yaml
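To check that the DaemonSet has finished rolling out on the nodes, kubectl rollout status is one option. The helper below is a rough sketch: the name wait_for_rollout and the 300s default timeout are assumptions, and the resource name must match the one defined in bookie.yaml, which is not shown here.

```shell
# Block until a workload rollout (DaemonSet, Deployment, or StatefulSet) finishes.
#   $1: resource in kind/name form, e.g. daemonset/<name from bookie.yaml>
#   $2: optional timeout (defaults to 300s, an assumed value)
wait_for_rollout() {
    kubectl rollout status "$1" --timeout="${2:-300s}"
}
```

A call would look like wait_for_rollout daemonset/bookie, where bookie is a hypothetical name; use kubectl get daemonsets to see the real one.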
Step 4. Brokers Deployment
The brokers are configured in broker.yaml. For testing purposes, I used 1 Pulsar broker; feel free to increase the number in production.
$ kubectl apply -f broker.yaml
Step 5. Pulsar Dashboard Deployment
The Pulsar dashboard is a web application that enables users to monitor current stats for all topics in tabular form. The dashboard is a data collector that polls stats from all the brokers in a Pulsar instance (across multiple clusters) and stores all the information in a PostgreSQL database. A Django web app is used to render the collected data.
The dashboard is configured in pulsar-dashboard.yaml. Let’s deploy it:
$ kubectl apply -f pulsar-dashboard.yaml
Wait until the Pulsar dashboard is deployed, then open it at http://minikube_ip:30005, where minikube_ip is the IP address of your Minikube VM.
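If you don't know the Minikube address offhand, the minikube ip command prints it. The following sketch builds the dashboard URL from it; the function name dashboard_url is an assumption for illustration, and 30005 is the port used above.

```shell
# Print the dashboard URL for the local Minikube cluster.
#   $1: optional node port (defaults to 30005, the port used in this article)
dashboard_url() {
    echo "http://$(minikube ip):${1:-30005}"
}
```

Running dashboard_url prints the address to paste into your browser.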
You can find all necessary files in my tutorial repo.
Thank you for reading!