Pulsar on Kubernetes: Using Operator Lifecycle Manager to Install Pulsar Operators

Sherlock Xu
10 min read · Mar 8, 2023


Apache Pulsar is a cloud-native streaming and messaging system with an architecture that decouples computing from storage. This flexible design allows users to run Pulsar smoothly on Kubernetes, especially when it comes to scaling. Currently, the Pulsar community provides the Pulsar Helm chart to install Pulsar and manage the deployment on Kubernetes. Helm is a very helpful tool for installing applications on Kubernetes, as it uses a template mechanism to standardize configurations for the different components of an application. However, to manage Pulsar in a more flexible and automated way, we may want to consider a Kubernetes-native alternative: operators.

StreamNative Pulsar Operators allow users to manage the full lifecycle of Pulsar on Kubernetes. In this blog post, I will demonstrate how to install Pulsar Operators using Operator Lifecycle Manager (OLM) and then use the Pulsar Operators to install Pulsar.

I assume you have a basic knowledge of Kubernetes, but I would like to briefly cover operators first, as the underlying concepts will help you better understand how they work.

Kubernetes operators

Operators are described as follows in the Kubernetes documentation:

Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators follow Kubernetes principles, notably the control loop.

To understand this description, you need to know four concepts: resources, custom resources, controllers, and custom controllers.

Resources are the building blocks of applications running on Kubernetes. Each resource represents an endpoint in the Kubernetes API that stores a collection of API objects of a certain kind. Kubernetes provides users with many built-in resources, such as Pods, Deployments, StatefulSets, and Jobs, to perform different tasks.

These built-in resources have their respective Kubernetes controllers, which monitor their status and make changes to the cluster if necessary. Each controller tracks at least one Kubernetes resource type and makes sure each object matches the desired state declared in its manifest.

In addition to built-in resources, Kubernetes also allows you to create Custom Resources (CRs), which provide an API extension mechanism. With Custom Resources, you can customize the behavior of your applications running on Kubernetes.

Before you create Custom Resources, you need to tell Kubernetes about their schema using Custom Resource Definitions (CRDs). Like other Kubernetes objects, a Custom Resource Definition has its own API version, metadata, and specifications in its manifest. In addition, it contains all the configurations and fields that users can define for the Custom Resource it is associated with.
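As a concrete illustration, here is a minimal CRD manifest for a hypothetical CronTab resource (the group, kind, and fields are illustrative only, following the pattern in the Kubernetes documentation):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # The name must match <plural>.<group>
  name: crontabs.stable.example.com
spec:
  group: stable.example.com
  scope: Namespaced
  names:
    plural: crontabs
    singular: crontab
    kind: CronTab
    shortNames:
      - ct
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        # The fields users can set on the Custom Resource
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                cronSpec:
                  type: string
                replicas:
                  type: integer
```

Once this CRD is applied, the API server exposes a new endpoint for CronTab objects, just as it does for built-in kinds like Deployments.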

With Custom Resource Definitions, you can create Custom Resources of any kind in a Kubernetes cluster. However, this does not mean they are automatically managed for operations like scaling or updating. In other words, you can create these resources, but nothing will actually happen. You need custom controllers to monitor the status of these resources and take action according to their configurations. These custom controllers are known as “operators”.
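For example, suppose a CRD defines a hypothetical CronTab kind under the API group stable.example.com. Creating an instance is then just applying another YAML object; until a custom controller watches for CronTab objects, this manifest is stored by the API server and nothing else happens:

```yaml
apiVersion: stable.example.com/v1
kind: CronTab
metadata:
  name: my-crontab
spec:
  # Fields defined by the CRD's schema; no built-in controller acts on them
  cronSpec: "*/5 * * * *"
  replicas: 3
```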

I know many people use “operators” and “controllers” interchangeably. If you are new to Kubernetes, I don’t think you need to dwell on the distinction. Generally, deploying an operator for an application means you create the controller and CRD(s) on Kubernetes.

Pulsar Operators

We know that Pulsar contains three key components: Pulsar brokers, BookKeeper, and ZooKeeper. Accordingly, the Pulsar Operators are designed as three individual Operators.

  • Pulsar Operator. A custom controller that manages the lifecycle of Pulsar brokers and proxies. The related CRDs are pulsarbrokers.pulsar.streamnative.io and pulsarproxies.pulsar.streamnative.io. Proxies are optional in a Pulsar deployment but they are widely used as a traffic gateway for Pulsar instances running on Kubernetes. This is because it is not always easy to expose the IP addresses of brokers to clients. The downside is that the proxy layer may lead to extra traffic within the Kubernetes cluster. Depending on your use case, you can decide whether you need to deploy Pulsar proxies.
  • BookKeeper Operator. A custom controller that manages the lifecycle of the BookKeeper cluster. The related CRD is bookkeeperclusters.bookkeeper.streamnative.io.
  • ZooKeeper Operator. A custom controller that manages the lifecycle of the ZooKeeper cluster. The related CRD is zookeeperclusters.zookeeper.streamnative.io.

With these concepts in mind, we can start to prepare the environment for installation.

Before you begin

You need to create a Kubernetes cluster (v1.16 <= Kubernetes version < v1.26) with kubectl installed. To provide persistent storage for BookKeeper and ZooKeeper, you must configure a default storage class. The following is my Amazon EKS environment for your reference.

kubectl get nodes -o wide

NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-192-168-31-53.ap-southeast-1.compute.internal Ready <none> 137m v1.22.17-eks-48e63af 192.168.31.53 xx.xxx.xxx.xx Amazon Linux 2 5.4.228-132.418.amzn2.x86_64 docker://20.10.17
ip-192-168-32-243.ap-southeast-1.compute.internal Ready <none> 137m v1.22.17-eks-48e63af 192.168.32.243 xx.xxx.xxx.xx Amazon Linux 2 5.4.228-132.418.amzn2.x86_64 docker://20.10.17
ip-192-168-89-241.ap-southeast-1.compute.internal Ready <none> 137m v1.22.17-eks-48e63af 192.168.89.241 xx.xxx.xxx.xx Amazon Linux 2 5.4.228-132.418.amzn2.x86_64 docker://20.10.17
kubectl get pods -A

NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system aws-node-4q84r 1/1 Running 0 130m
kube-system aws-node-7gd2v 1/1 Running 0 130m
kube-system aws-node-hf8pl 1/1 Running 0 130m
kube-system coredns-cfcfc4887-rqjpn 1/1 Running 0 140m
kube-system coredns-cfcfc4887-xdkzd 1/1 Running 0 140m
kube-system kube-proxy-98x56 1/1 Running 0 130m
kube-system kube-proxy-m4wmp 1/1 Running 0 130m
kube-system kube-proxy-xltmx 1/1 Running 0 130m
kubectl get sc

NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
gp2 (default) kubernetes.io/aws-ebs Delete WaitForFirstConsumer false 141m

Installing Operator Lifecycle Manager and Pulsar Operators

Operator Lifecycle Manager (OLM) is a popular and powerful tool for managing operators. With OLM, your CRDs can be automatically updated when you upgrade operators. Simply put, OLM is the “operator” for Kubernetes operators. We need to install it first and then use it to install the Pulsar Operators (I don’t want to complicate things, but there are just too many concepts to explain when it comes to Kubernetes 😂). If you don’t want to use OLM, you can use Helm to install the Pulsar Operators instead. For more information, see the article Pulsar Operators Tutorial Part 1: Create an Apache Pulsar Cluster on Kubernetes by Yuwei Sung.

Now let’s get started.

1. Run the following command to create the Kubernetes resources OLM needs, such as CRDs, ClusterRoles, and Deployments. The script also creates two namespaces, olm and operators. The former hosts OLM’s own workloads, while the latter is where resources related to the Pulsar Operators can be deployed.

curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.23.1/install.sh | bash -s v0.23.1

2. View the workloads created:

kubectl get pods -n olm

NAME READY STATUS RESTARTS AGE
catalog-operator-6fb7776fff-z4nxs 1/1 Running 0 10m
olm-operator-94c998864-frq7c 1/1 Running 0 10m
operatorhubio-catalog-nf97r 1/1 Running 0 9m49s
packageserver-cddccd9fc-sj5nf 1/1 Running 0 9m49s
packageserver-cddccd9fc-sspn6 1/1 Running 0 9m49s

3. Install the CRDs and custom controllers for Pulsar components (brokers, proxies, BookKeeper, and ZooKeeper). The controllers are deployed in the operators namespace by default.

kubectl create -f https://raw.githubusercontent.com/streamnative/charts/master/examples/pulsar-operators/olm-subscription.yaml
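If you want to see what such a manifest contains before applying it, an OLM Subscription for a single operator generally follows the shape below. The channel and package names here are assumptions for illustration; the linked olm-subscription.yaml file is the authoritative source:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: pulsar-operator
  namespace: operators
spec:
  # Catalog that serves the operator package (OperatorHub.io catalog installed with OLM)
  source: operatorhubio-catalog
  sourceNamespace: olm
  name: pulsar-operator        # package name in the catalog
  channel: alpha               # update channel (assumption)
  installPlanApproval: Automatic
```

OLM resolves the Subscription against the catalog, creates an InstallPlan, and then deploys the operator and its CRDs.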

4. Verify that the Pulsar Operators are installed successfully. In the output below, you can see that the controllers run as Deployments, with a Service created for each of them for internal communication. Note that you may need to wait some time before the controller Pods are up and running, as OLM first runs Jobs to do some preparatory work before deploying the Operators.

kubectl get all -n operators

NAME READY STATUS RESTARTS AGE
pod/bookkeeper-operator-controller-manager-587d574b57-mlsjj 2/2 Running 0 28m
pod/pulsar-operator-controller-manager-69c6ffbbc4-jbg6n 2/2 Running 0 28m
pod/zookeeper-operator-controller-manager-849bf45499-2nhk2 2/2 Running 0 28m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/bookkeeper-operator-controller-manager-metrics-service ClusterIP 10.100.115.243 <none> 8443/TCP 28m
service/pulsar-operator-controller-manager-metrics-service ClusterIP 10.100.69.43 <none> 8443/TCP 28m
service/zookeeper-operator-controller-manager-metrics-service ClusterIP 10.100.162.192 <none> 8443/TCP 28m

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/bookkeeper-operator-controller-manager 1/1 1 1 28m
deployment.apps/pulsar-operator-controller-manager 1/1 1 1 28m
deployment.apps/zookeeper-operator-controller-manager 1/1 1 1 28m

NAME DESIRED CURRENT READY AGE
replicaset.apps/bookkeeper-operator-controller-manager-587d574b57 1 1 1 28m
replicaset.apps/pulsar-operator-controller-manager-69c6ffbbc4 1 1 1 28m
replicaset.apps/zookeeper-operator-controller-manager-849bf45499 1 1 1 28m

5. Check the API resources and CRDs added to Kubernetes. I will use them to create a Pulsar cluster later.

kubectl api-resources | grep streamnative

bookkeeperclusters bk bookkeeper.streamnative.io/v1alpha1 true BookKeeperCluster
pulsarbrokers pb,broker pulsar.streamnative.io/v1alpha1 true PulsarBroker
pulsarproxies pp,proxy pulsar.streamnative.io/v1alpha1 true PulsarProxy
zookeeperclusters zk zookeeper.streamnative.io/v1alpha1 true ZooKeeperCluster
kubectl get crds | grep streamnative

bookkeeperclusters.bookkeeper.streamnative.io 2023-03-07T08:40:47Z
pulsarbrokers.pulsar.streamnative.io 2023-03-07T08:40:45Z
pulsarproxies.pulsar.streamnative.io 2023-03-07T08:40:45Z
zookeeperclusters.zookeeper.streamnative.io 2023-03-07T08:40:40Z

6. Check the ClusterServiceVersion (CSV) of Pulsar Operators.

kubectl get csv -n operators

NAME DISPLAY VERSION REPLACES PHASE
bookkeeper-operator.v0.14.1 BookKeeper Operator 0.14.1 bookkeeper-operator.v0.12.4 Succeeded
pulsar-operator.v0.14.1 Pulsar Operator 0.14.1 pulsar-operator.v0.12.4 Succeeded
zookeeper-operator.v0.14.1 ZooKeeper Operator 0.14.1 zookeeper-operator.v0.12.4 Succeeded

Installing Pulsar with Pulsar Operators

StreamNative provides a quickstart YAML file that contains the manifests of Pulsar brokers, proxies, BookKeeper, and ZooKeeper. You can use it to quickly deploy Pulsar’s custom resources.

1. Create a namespace called pulsar where Pulsar workloads will be deployed later. This is the default name in the example YAML file. By default, Pulsar Operators watch all Kubernetes namespaces to see if the corresponding Pulsar resources are created, so the namespace you create in this step can be different from the one where Pulsar Operators are deployed.

kubectl create ns pulsar

2. Deploy Pulsar with proxies. Alternatively, you can download the YAML file, customize some parameters, and then apply it.

kubectl apply -f https://raw.githubusercontent.com/streamnative/charts/master/examples/pulsar-operators/proxy.yaml
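To give you a sense of what the quickstart file contains, each component is declared as one of the Custom Resources installed earlier. A ZooKeeperCluster object, for instance, follows roughly this shape (the spec fields below are assumptions for illustration; the downloaded YAML is the authoritative source):

```yaml
apiVersion: zookeeper.streamnative.io/v1alpha1
kind: ZooKeeperCluster
metadata:
  name: zookeepers
  namespace: pulsar
spec:
  # Assumed fields: replica count and container image
  replicas: 3
  image: <zookeeper-image>
```

The ZooKeeper Operator watches for objects of this kind and reconciles the StatefulSet, Services, and PVCs needed to reach the declared state.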

3. Verify that Pulsar is deployed successfully.

kubectl get pods -n pulsar

NAME READY STATUS RESTARTS AGE
bookies-bk-0 1/1 Running 0 119s
bookies-bk-1 1/1 Running 0 119s
bookies-bk-2 1/1 Running 0 119s
bookies-bk-auto-recovery-0 1/1 Running 0 71s
brokers-broker-0 1/1 Running 0 2m2s
brokers-broker-1 1/1 Running 0 2m2s
proxys-proxy-0 1/1 Running 0 2m56s
proxys-proxy-1 1/1 Running 0 2m56s
zookeepers-zk-0 1/1 Running 0 2m57s
zookeepers-zk-1 1/1 Running 0 2m56s
zookeepers-zk-2 1/1 Running 0 2m56s

A potential problem in this step is that your ZooKeeper/bookie Pods remain in the Pending state (you can use kubectl describe to check the Pods). This may result from storage misconfiguration (PVCs remain unbound). As I mentioned above, you need a default storage class in your cluster to provide persistence for them. If you are using a managed Kubernetes service like GKE, EKS, or AKS, you usually don’t need to worry about this, as these vendors provision volumes dynamically. If you want to use your own storage class, you can set it in this file (without proxies) and then apply it in step 2.
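If you would rather pin an explicit storage class than rely on the cluster default, the idea is to set a storageClassName in the persistence-related sections of the BookKeeper and ZooKeeper resources before applying the file. The field path below is a hypothetical sketch; check the CRD schema (for example, with kubectl explain bookkeeperclusters.spec) for the real field names:

```yaml
spec:
  storage:
    # Hypothetical field path; verify against the CRD schema
    journal:
      storageClassName: my-storage-class
```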

The following are PVCs and PVs in my environment for your reference:

kubectl get pvc -n pulsar

NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-log-zookeepers-zk-0 Bound pvc-4273b306-ae46-4a98-8375-8a65d5901df8 2Gi RWO gp2 3m39s
data-log-zookeepers-zk-1 Bound pvc-e88a1f3a-ab12-4117-8083-a636e636ae67 2Gi RWO gp2 3m39s
data-log-zookeepers-zk-2 Bound pvc-2914872d-d205-4329-ad66-42ba2f73a8bf 2Gi RWO gp2 3m38s
data-zookeepers-zk-0 Bound pvc-29404789-27e5-4d04-ba03-78a2476be4b6 8Gi RWO gp2 3m39s
data-zookeepers-zk-1 Bound pvc-ed30dee2-7530-4dbf-b80a-1a41a1be2e30 8Gi RWO gp2 3m39s
data-zookeepers-zk-2 Bound pvc-ec744e16-0bfc-4e0e-993c-3bf667510d63 8Gi RWO gp2 3m38s
journal-0-bookies-bk-0 Bound pvc-fed9c822-ea14-41bb-9f83-fc238bef4a99 8Gi RWO gp2 2m41s
journal-0-bookies-bk-1 Bound pvc-f4014519-3134-4c92-91d6-5da6f9e7b150 8Gi RWO gp2 2m41s
journal-0-bookies-bk-2 Bound pvc-fbd276cf-2e21-44db-8b01-12f15d5aedf4 8Gi RWO gp2 2m41s
ledgers-0-bookies-bk-0 Bound pvc-611fa466-1336-474a-9e8c-c4dfd388d3b6 16Gi RWO gp2 2m41s
ledgers-0-bookies-bk-1 Bound pvc-a4dffb05-3830-4c31-9fa8-8f63be6a933c 16Gi RWO gp2 2m41s
ledgers-0-bookies-bk-2 Bound pvc-abc618a0-9d08-44ff-a7ce-e39fba19e197 16Gi RWO gp2 2m41s
kubectl get pv

NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-2914872d-d205-4329-ad66-42ba2f73a8bf 2Gi RWO Delete Bound pulsar/data-log-zookeepers-zk-2 gp2 4m10s
pvc-29404789-27e5-4d04-ba03-78a2476be4b6 8Gi RWO Delete Bound pulsar/data-zookeepers-zk-0 gp2 4m10s
pvc-4273b306-ae46-4a98-8375-8a65d5901df8 2Gi RWO Delete Bound pulsar/data-log-zookeepers-zk-0 gp2 4m10s
pvc-611fa466-1336-474a-9e8c-c4dfd388d3b6 16Gi RWO Delete Bound pulsar/ledgers-0-bookies-bk-0 gp2 3m13s
pvc-a4dffb05-3830-4c31-9fa8-8f63be6a933c 16Gi RWO Delete Bound pulsar/ledgers-0-bookies-bk-1 gp2 3m13s
pvc-abc618a0-9d08-44ff-a7ce-e39fba19e197 16Gi RWO Delete Bound pulsar/ledgers-0-bookies-bk-2 gp2 3m12s
pvc-e88a1f3a-ab12-4117-8083-a636e636ae67 2Gi RWO Delete Bound pulsar/data-log-zookeepers-zk-1 gp2 4m10s
pvc-ec744e16-0bfc-4e0e-993c-3bf667510d63 8Gi RWO Delete Bound pulsar/data-zookeepers-zk-2 gp2 4m10s
pvc-ed30dee2-7530-4dbf-b80a-1a41a1be2e30 8Gi RWO Delete Bound pulsar/data-zookeepers-zk-1 gp2 4m10s
pvc-f4014519-3134-4c92-91d6-5da6f9e7b150 8Gi RWO Delete Bound pulsar/journal-0-bookies-bk-1 gp2 3m13s
pvc-fbd276cf-2e21-44db-8b01-12f15d5aedf4 8Gi RWO Delete Bound pulsar/journal-0-bookies-bk-2 gp2 3m12s
pvc-fed9c822-ea14-41bb-9f83-fc238bef4a99 8Gi RWO Delete Bound pulsar/journal-0-bookies-bk-0 gp2 3m13s

Now you should be ready to use the Pulsar cluster to handle client requests.

Conclusion

There are many ways to install Pulsar on Kubernetes. As the Pulsar ecosystem continues to expand, new deployment methods may emerge, or even technologies better suited than Kubernetes for running Pulsar. In the ever-changing IT world, there is no panacea, and the best we can do is find the solution that best fits our needs.
