Strimzi — Deploy Kafka in Kubernetes

Apache Kafka

OSDS
Apr 3, 2024

A Kafka cluster is a distributed system composed of one or more Kafka brokers working together to handle the storage and transmission of data streams. Kafka is often used as a real-time, fault-tolerant, scalable messaging system for processing large volumes of data. Here’s a breakdown of the key components:

  1. Broker: A Kafka broker is a single instance of the Kafka server. It is responsible for receiving, storing, and serving data to producers and consumers. Each broker in a Kafka cluster is identified by a unique broker ID and typically runs on a separate server or virtual machine.
  2. Cluster: A Kafka cluster consists of multiple brokers working together. Brokers communicate with each other to replicate data, balance load, and coordinate leader elections for partitions. Kafka clusters offer fault tolerance and scalability by distributing data across multiple brokers.
  3. Topic: Data in Kafka is organized into topics, which are logical channels or categories where producers publish messages and consumers subscribe to consume them. Topics are further divided into partitions for scalability and parallel processing.
  4. Partition: Each topic is divided into one or more partitions. Partitions are the unit of parallelism in Kafka and serve as the basic unit of distribution and replication. Messages within a partition are ordered by their offset, and each partition is hosted on one broker and replicated across multiple brokers for fault tolerance.
  5. Replication: Kafka uses replication to ensure fault tolerance and data durability. Each partition has one leader and one or more follower replicas. The leader handles all read and write requests for the partition, while followers replicate the data asynchronously from the leader. If the leader fails, one of the followers is elected as the new leader.
  6. Producers: Producers are applications or systems that publish messages to Kafka topics. They send messages to Kafka brokers, which then store them in the appropriate topic partitions.
  7. Consumers: Consumers are applications or systems that subscribe to Kafka topics and consume messages from them. Consumers read messages from partitions, maintaining their offset to track their progress. Kafka supports both single-consumer and consumer-group-based consumption models.
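The partition and replication concepts above can be seen directly with the Kafka CLI tools; a sketch, assuming a broker reachable at localhost:9092 and a hypothetical topic named my-topic:

```shell
# Create a topic with 3 partitions, each replicated to 2 brokers
bin/kafka-topics.sh --create --topic my-topic \
  --partitions 3 --replication-factor 2 \
  --bootstrap-server localhost:9092

# Show which broker leads each partition and which replicas are in sync (ISR)
bin/kafka-topics.sh --describe --topic my-topic \
  --bootstrap-server localhost:9092
```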

Strimzi

Strimzi provides a way to run an Apache Kafka cluster on Kubernetes in various deployment configurations.

Apache Kafka components are provided for deployment to Kubernetes with the Strimzi distribution. The Kafka components are generally run as clusters for availability.

A typical deployment incorporating Kafka components might include:

  • Kafka cluster of broker nodes
  • ZooKeeper cluster of replicated ZooKeeper instances
  • Kafka Connect cluster for external data connections
  • Kafka MirrorMaker cluster to mirror the Kafka cluster in a secondary cluster
  • Kafka Exporter to extract additional Kafka metrics data for monitoring
  • Kafka Bridge to make HTTP-based requests to the Kafka cluster
  • Cruise Control to rebalance topic partitions across broker nodes
Source: https://strimzi.io/docs/operators/latest/overview

Strimzi Operators

Strimzi operators are purpose-built with specialist operational knowledge to effectively manage Kafka on Kubernetes. Each operator performs a distinct function.

Cluster Operator

The Cluster Operator handles the deployment and management of Apache Kafka clusters on Kubernetes. It automates the setup of Kafka brokers and of other Kafka components and resources.

Topic Operator

The Topic Operator manages the creation, configuration, and deletion of topics within Kafka clusters.

User Operator

The User Operator manages Kafka users that require access to Kafka brokers.

When you deploy Strimzi, you first deploy the Cluster Operator. The Cluster Operator is then ready to handle the deployment of Kafka. You can also deploy the Topic Operator and User Operator using the Cluster Operator (recommended) or as standalone operators. You would use a standalone operator with a Kafka cluster that is not managed by the Cluster Operator.

The Topic Operator and User Operator are part of the Entity Operator. The Cluster Operator can deploy one or both operators based on the Entity Operator configuration.

Source: https://strimzi.io/docs/operators/latest/overview#configuration-points_str

Steps to deploy Kafka Strimzi Operator, Kafka Cluster, and its components on Kubernetes

1. Deploy Kafka Strimzi Operator

Refer — https://strimzi.io/downloads/

This post uses Strimzi Kafka Operator 0.40.0, the latest stable release at the time of writing.

This can be done in multiple ways:

Using Helm Chart

kubectl create ns kafka
helm repo add strimzi https://strimzi.io/charts/
helm repo update
helm install strimzi strimzi/strimzi-kafka-operator -n kafka

The Helm chart strimzi-kafka-operator-helm-3-chart-0.40.0 can also be downloaded from the downloads link mentioned above.

Using YAML

Download the file — strimzi-cluster-operator-0.40.0.yaml from the official downloads page mentioned above.

kubectl create namespace kafka
kubectl apply -f strimzi-cluster-operator-0.40.0.yaml -n kafka
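The operator's progress can be followed once it is applied; a sketch, assuming the default deployment name strimzi-cluster-operator:

```shell
# Wait for the Cluster Operator deployment to become available
kubectl wait deployment/strimzi-cluster-operator \
  --for=condition=Available --timeout=300s -n kafka

# Follow the operator logs
kubectl logs deployment/strimzi-cluster-operator -n kafka -f
```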

Using installation artifacts

Download the tar/zip from the above downloads link and extract the files to STRIMZI_HOME.

Edit the Strimzi installation files to use the namespace the Cluster Operator is going to be installed into. In this post, the Cluster Operator is installed into the namespace kafka.
On Linux, use:

sed -i 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml

On macOS, use:

sed -i '' 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml

Then create the namespace and install the Cluster Operator:

cd $STRIMZI_HOME
kubectl create namespace kafka
kubectl create -f install/cluster-operator -n kafka

To verify the deployment

kubectl get deployments -n kafka

2. Deploy Kafka Cluster

Refer — https://strimzi.io/docs/operators/latest/deploying#deploying-kafka-cluster-str

The installation artifacts come with examples for deploying a Kafka cluster. This post uses the kafka-persistent-single.yaml file with the modifications shown below, which deploys a persistent cluster with a single ZooKeeper node and a single Kafka node.

kubectl apply -f examples/kafka/kafka-persistent-single.yaml  

kafka-persistent-single.yaml

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: osds-cluster
spec:
  kafka:
    version: 3.7.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
      - name: external
        port: 9094
        tls: false
        type: nodeport
        configuration:
          brokers:
            - broker: 0
              advertisedHost: localhost
              advertisedPort: 31769
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      default.replication.factor: 1
      min.insync.replicas: 1
      inter.broker.protocol.version: "3.7"
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 2Gi
          class: manual
          deleteClaim: false
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 2Gi
      class: manual
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}

Let’s break down what each section does:

  • apiVersion: Specifies the API version of the Kafka custom resource, here kafka.strimzi.io/v1beta2.
  • kind: Specifies the type of Kubernetes resource. Here it is Kafka, indicating that this YAML defines a Kafka cluster.
  • metadata: Metadata about the Kafka cluster, such as its name (osds-cluster).
  • spec: The desired state of the Kafka cluster.
      • kafka: Configures the Kafka brokers within the cluster.
          • version: The Kafka version to use (here, 3.7.0).
          • replicas: The number of Kafka broker replicas (here, 1).
          • listeners: Configures the listeners through which Kafka clients connect to the brokers.
              • name: Name of the listener (e.g., plain, tls, external).
              • port: Port number for the listener.
              • type: Type of listener (e.g., internal, nodeport).
              • tls: Whether TLS encryption is enabled for the listener.
              • configuration: Additional listener configuration, such as the advertised host and port.
          • config: Kafka broker configuration settings (e.g., replication factors, protocol versions).
          • storage: Configures storage for the Kafka brokers.
              • type: Type of storage (e.g., jbod).
              • volumes: Volume configurations for the Kafka brokers.
      • zookeeper: Configures the ZooKeeper ensemble used by Kafka.
          • replicas: The number of ZooKeeper replicas.
          • storage: Configures storage for ZooKeeper.
      • entityOperator: Configures the entity operators for managing Kafka topics and users.
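After applying the Kafka resource, readiness can be checked against the custom resource status; a sketch using the cluster name from the file above:

```shell
# Block until the Kafka custom resource reports the Ready condition
kubectl wait kafka/osds-cluster --for=condition=Ready --timeout=300s -n kafka
```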

Notes on the key configuration in the file above

Listeners

Internal listeners expose Kafka to clients inside the Kubernetes cluster, using Kubernetes services and pod DNS names. In the above config, plain (9092) and tls (9093) are internal listeners.

External listeners allow connections to Kafka from outside Kubernetes, for example from a different network. They rely on specific connection mechanisms, such as load balancers or node ports, to manage this access. The external listener (9094) uses the nodeport type together with a per-broker configuration, so Kafka can be reached from outside Kubernetes at localhost:31769.

Listeners are just different ways to connect to Kafka, and you can set them up based on your needs.
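The addresses each listener actually exposes are published in the Kafka resource status once the cluster is ready, and can be queried; a sketch:

```shell
# Print the bootstrap addresses advertised by each configured listener
kubectl get kafka osds-cluster -n kafka \
  -o jsonpath='{.status.listeners}'
```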

JBOD storage

Refer — https://strimzi.io/docs/operators/latest/deploying#ref-jbod-storage-str

JBOD storage allows you to configure your Kafka cluster to use multiple disks or volumes. This approach provides increased data storage capacity for Kafka brokers, and can lead to performance improvements. A JBOD configuration is defined by one or more volumes, each of which can be either ephemeral or persistent. The rules and constraints for JBOD volume declarations are the same as those for ephemeral and persistent storage. For example, you cannot decrease the size of a persistent storage volume after it has been provisioned, nor can you change the value of sizeLimit when the type is ephemeral.

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 100Gi
          deleteClaim: false
        - id: 1
          type: persistent-claim
          size: 100Gi
          deleteClaim: false
  # ...

PVC resource for JBOD storage

When persistent storage is used to declare JBOD volumes, it creates a PVC with the following name:

data-id-cluster-name-kafka-idx

where id is the ID of the volume and idx is the index of the Kafka broker pod whose data it stores. For example, volume 0 of broker pod 0 in the osds-cluster above is backed by the PVC data-0-osds-cluster-kafka-0.

Mount path of Kafka log directories

The JBOD volumes are used by Kafka brokers as log directories mounted into the following path:

/var/lib/kafka/data-id/kafka-logidx

Where id is the ID of the volume used for storing data for Kafka broker pod idx. For example /var/lib/kafka/data-0/kafka-log0.

entityOperator — topicOperator, userOperator

Refer — https://strimzi.io/docs/operators/latest/deploying#ref-kafka-entity-operator-str

Configure the Topic Operator and User Operator in the Kafka resource as shown in the YAML file above. Both operators can be customized further as described in the linked documentation.

Make sure the Cluster Operator is deployed before configuring the entityOperator. Follow the provided YAML structure and customize it as needed for your setup.
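For example, each operator can be restricted to a single namespace to watch via the watchedNamespace field; a minimal sketch (the namespace value here is an assumption for illustration):

```yaml
entityOperator:
  topicOperator:
    watchedNamespace: kafka
  userOperator:
    watchedNamespace: kafka
```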

3. Create and Deploy a Kafka Topic

Use the YAML below to create a Kafka topic with 3 partitions and 1 replica.

kafka-topic.yaml

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: osds-topic
  labels:
    strimzi.io/cluster: "osds-cluster"
spec:
  partitions: 3
  replicas: 1

Deploy the topic

kubectl create -f kafka-topic.yaml -n kafka

Check the topic deployment

kubectl get kafkatopic -n kafka

Test the deployment using a Producer and Consumer

At this point, pods for the above deployments should be running in the kafka namespace.

Open two terminals, one for the producer and one for the consumer, and run the following command in each to get a shell on the Kafka broker pod:

kubectl exec -n kafka -i -t osds-cluster-kafka-0 -- /bin/bash

Producer:
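From inside the broker pod, a console producer can be started against the topic created above; a sketch using the cluster's internal bootstrap service:

```shell
# Type messages and press Enter to publish them to the topic
bin/kafka-console-producer.sh \
  --bootstrap-server osds-cluster-kafka-bootstrap:9092 \
  --topic osds-topic
```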

Consumer:
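In the second terminal, a console consumer can read the same topic back from the beginning; a sketch:

```shell
# Print every message in the topic, including those produced before startup
bin/kafka-console-consumer.sh \
  --bootstrap-server osds-cluster-kafka-bootstrap:9092 \
  --topic osds-topic --from-beginning
```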

Access the topic externally using the nodeport configuration

kubectl describe services osds-cluster-kafka-external-bootstrap --context docker-desktop --namespace kafka

Get the NodePort from the output of the above command:

NodePort: tcp-external 31769/TCP
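The port number can also be extracted directly with a jsonpath query; a sketch:

```shell
# Print only the assigned NodePort of the external bootstrap service
kubectl get service osds-cluster-kafka-external-bootstrap -n kafka \
  -o jsonpath='{.spec.ports[0].nodePort}'
```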

*The assigned NodePort can change each time the service is re-created, so re-check it after such changes.

Update the advertised port in kafka-persistent-single.yaml to match and re-deploy:

# ...
    - name: external
      port: 9094
      tls: false
      type: nodeport
      configuration:
        brokers:
          - broker: 0
            advertisedHost: localhost
            advertisedPort: 31769
# ...

Test from a local Kafka binary

bin/kafka-topics.sh --list --bootstrap-server localhost:31769
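Messages can also be produced and consumed through the same external address; a sketch using the local Kafka binaries:

```shell
# Produce to and consume from the topic via the NodePort listener
bin/kafka-console-producer.sh --bootstrap-server localhost:31769 --topic osds-topic
bin/kafka-console-consumer.sh --bootstrap-server localhost:31769 --topic osds-topic --from-beginning
```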

Access from Offset Explorer tool

In the next post, I will cover additional options such as the User Operator, Kafka Connect, and monitoring with Prometheus.

References:

  1. https://strimzi.io/downloads/
  2. https://strimzi.io/docs/operators/latest/deploying#deploying-kafka-cluster-str
  3. https://strimzi.io/docs/operators/latest/deploying#deploy-client-access-str
  4. https://strimzi.io/docs/operators/latest/deploying#ref-jbod-storage-str
