I have been working with distributed systems like Apache Hadoop and Apache Kafka for many years. It was never easy to provision and operate such distributed systems on-premises, but Kubernetes may have changed this a lot.
Once familiar with Kubernetes, I found that Kafka provisioning and management become much easier: a single command can deploy a Kafka cluster, add new brokers, or perform a rolling configuration change. That said, Kafka is a stateful service, which makes managing it on Kubernetes more complex than managing stateless microservices. The biggest challenge comes when configuring persistent volumes, which need to deliver high throughput at consistently low latency.
At the heart of provisioning and managing Kafka with Kubernetes is a Kafka Operator. There are several operators available, including the one from Confluent, but I found the Kafka Operator from Banzai Cloud the simplest. It was easy to install, and I like their approach of leveraging existing operators for Kafka dependencies: ZooKeeper, cert-manager (optional) and Prometheus. It supports automatic provisioning, management, scaling and operations of Kafka clusters deployed to Kubernetes.
Installing the Banzai Cloud Kafka Operator is straightforward by following its README. The only change I made was to deploy Prometheus and Grafana (for monitoring) in the monitoring namespace, following the kube-prometheus repository, instead of using the default namespace.
Provisioning Kafka with Kubernetes
With the Kafka Operator, provisioning a Kafka cluster on Kubernetes is as easy as a single command:
kubectl create -n kafka -f mycluster.yaml
An example of the cluster configuration file is available at my Github repo. The above command installs a multi-broker Kafka cluster, along with the operator and Cruise Control (a GUI Kafka management tool from LinkedIn), in the kafka namespace.
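For reference, here is an abridged sketch of what such a cluster configuration might look like. The field names follow the Banzai Cloud operator's v1beta1 KafkaCluster CRD, but the addresses, sizes and broker count are illustrative; the actual file in my repo contains more settings.

```yaml
# Sketch of a minimal KafkaCluster custom resource (illustrative values)
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KafkaCluster
metadata:
  name: kafka
  namespace: kafka
spec:
  # Address of the ZooKeeper ensemble (hypothetical service name)
  zkAddresses:
    - "zookeeper-client.zookeeper:2181"
  brokerConfigGroups:
    default:
      storageConfigs:
        - mountPath: "/kafka-logs"
          pvcSpec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 500Gi
  # Each broker references a config group; adding a broker means
  # appending another id to this list
  brokers:
    - id: 0
      brokerConfigGroup: "default"
    - id: 1
      brokerConfigGroup: "default"
    - id: 2
      brokerConfigGroup: "default"
```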
This is what it looks like after deployment.
With this architecture, adding a new broker to the cluster is as simple as modifying the cluster spec file and applying the changes.
vi mycluster.yaml
# add a new Kafka broker broker-3
- id: 3
Apply the changes with the following command. The new broker will be deployed and show up in the Cruise Control UI within a few seconds.
kubectl apply -n kafka -f mycluster.yaml
Rolling updates also become easier. I add my change to mycluster.yaml and run kubectl apply again. I can see my Kafka pods being rolling-updated one by one from the Kubernetes dashboard UI.
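As an example of the kind of change that triggers a rolling update: in the Banzai Cloud CRD, readOnlyConfig holds server.properties-style broker settings, and editing it causes the operator to roll the brokers. The particular properties below are illustrative, not the ones I changed.

```yaml
# Hypothetical rolling change inside mycluster.yaml:
# edit readOnlyConfig and the operator rolls the brokers one by one
spec:
  readOnlyConfig: |
    num.io.threads=8
    log.retention.hours=72
```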
To test resilience and auto-healing, I simulate a pod failure.
kubectl delete -n kafka pods <kafka_pod_name> --grace-period=0 --force
After a few seconds, I see a new broker pod deployed to replace the deleted one. The new pod has the same broker id as the old one and automatically mounts the persistent volume (via PSO, explained later) where the Kafka data is stored, so it completely replaces the deleted/failed broker.
Thanks to the combination of Kubernetes, Kafka Operator and Kafka itself, there is little to no impact to my Kafka producers and consumers during the operation.
Kafka is a stateful service, which means a Persistent Volume (PV) is required to prevent data loss from pod failure (in principle, Kafka tolerates N-1 pod failures with a replication factor of N, but using ephemeral storage for Kafka data is not recommended, as multiple pods may run on the same host).
Kafka on Kubernetes not only needs a persistent storage volume, it also requires the volume to be fast enough to deliver high throughput at consistently low latency. I use Pure Storage FlashBlade as my PV storage, and Pure Service Orchestrator (PSO) to automate provisioning and recovering PVs. So my architecture looks like this:
Every time a new broker is added to the cluster, the pod issues a Persistent Volume Claim (PVC) to request a PV. The PVC is handled by PSO, which creates an NFS volume on FlashBlade and mounts it to the pod. As mentioned above, if a pod fails, PSO also handles re-mounting the PV to the replacement pod, which minimizes the impact of pod failures on the Kafka cluster.
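The claim the broker pod issues looks roughly like this (names and size are illustrative). The storageClassName "pure-file" is PSO's FlashBlade-backed NFS class; PSO fulfils the claim by creating and exporting an NFS file system on FlashBlade.

```yaml
# Sketch of the PVC created for one broker (illustrative names)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kafka-0-storage-0
  namespace: kafka
spec:
  accessModes:
    - ReadWriteOnce
  # PSO storage class backed by FlashBlade NFS
  storageClassName: pure-file
  resources:
    requests:
      storage: 500Gi
```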
I put the following into my cluster spec for it to use FlashBlade via PSO:
- mountPath: "/kafka-logs"
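That fragment is abridged; in the operator's CRD, a storage entry also carries a pvcSpec. Here is a sketch of what a full entry might look like. The storage class name is PSO's FlashBlade NFS class, and the volume size is illustrative rather than an operator default.

```yaml
# Sketch of a full storageConfigs entry in the cluster spec
storageConfigs:
  - mountPath: "/kafka-logs"
    pvcSpec:
      accessModes:
        - ReadWriteOnce
      # Route the claim to FlashBlade via PSO
      storageClassName: pure-file
      resources:
        requests:
          storage: 500Gi
```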
Other benefits of using fast shared storage for Kafka include:
- Scale compute and storage independently.
- Reduce storage failure points (painful in a large traditional direct-attached storage (DAS) Kafka cluster) such as disk failures, unbalanced disk usage and RAID rebuilds.
Because my Kafka data is now protected by FlashBlade, I can also reduce my Kafka replication factor from 3 to 2.
How is the performance of running Kafka in containers with Kubernetes using remote shared storage? I did some simple tests to measure the baseline performance of this architecture.
I start with 3 broker pods using the default Kafka configuration. I allocate each broker pod 2~4 vCPUs and 4~8GB RAM in my container resource spec. I create a Kafka topic with 8 partitions and a replication factor of 2. I then launch 3, 5 and 7 small client pods, each of which runs the following producer perf test command with the default producer configuration:
/opt/kafka/bin/kafka-producer-perf-test.sh --topic perf-rep2 --num-records 10000000 --record-size 100 --throughput -1 --producer.config /data/kube-kafka/conf/producer.properties
This is an output from one of the test clients. I got an average of 475K records/s with 100 bytes per record, 45MB/s throughput and 106ms average latency.
10000000 records sent, 475194.829880 records/sec (45.32 MB/sec), 106.33 ms avg latency, 611.00 ms max latency, 174 ms 50th, 526 ms 95th, 580 ms 99th, 589 ms 99.9th.
From my storage management UI, I see the traffic from those 3, 5 and 7 testing producers.
I can also see the peak throughput: 1.05GB/s for 5 producers and 1.37GB/s for 7 producers. Storage latency is around 2~4ms, much lower than the 106ms reported by my testing producer. This indicates that most of the end-to-end latency sits between the clients and the brokers, not in storage I/O. Therefore, to reduce latency I would look at the metrics between my compute nodes and at Kafka configurations, rather than at I/O.
When I increase the number of brokers to 5 (with a single command!), peak throughput jumps to 1.5GB/s while latency stays below 3.5ms. I also observed performance increases when giving the broker pods more resources.
To test end-to-end latency, I run a test script, which produces the following output:
Avg latency: 4.8578 ms
Percentiles: 50th = 4, 99th = 11, 99.9th = 30
The testing topic perf-e2e-latency was created with 4 partitions and RF=2. Producer acks was set to all, the highest durability setting. I get an average latency of less than 5ms.
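My exact script is not shown here, but one way to run such a test is with the EndToEndLatency tool that ships with Kafka, which produces a message and measures the time until a consumer receives it. The broker address below is a hypothetical in-cluster service name; the arguments are broker list, topic, message count, producer acks and message size in bytes.

```shell
# Sketch: Kafka's bundled end-to-end latency tool (broker address illustrative)
/opt/kafka/bin/kafka-run-class.sh kafka.tools.EndToEndLatency \
  my-cluster-kafka:9092 perf-e2e-latency 10000 all 100
```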
From these test results, I see good signs that Kafka’s performance and scalability is preserved in my Kubernetes environment.
Kafka is a key building block for streaming and real-time processing. With Kubernetes, Kafka provisioning becomes much easier. Fast shared storage is critical for both data protection and performance when running Kafka on Kubernetes with persistent volumes.