Multi-Cluster Kafka with Strimzi.io
Apache Kafka has become a leading platform for distributed event streaming, enabling organizations to handle massive volumes of data with agility and scalability. As data demands grow, however, a single Kafka cluster may not suffice. Multi-cluster Kafka environments add the redundancy, scalability, and fault tolerance that demanding workloads require, and with Kubernetes and the Strimzi Kafka Operator, orchestrating and managing such setups has become remarkably streamlined. In this article, we explore how to architect a robust multi-cluster Kafka setup for our Heimdall platform.
Setting up our Kafka Clusters
The heart of our architecture lies in the definition of two Kafka clusters, kafka-cluster-one and kafka-cluster-two. Each cluster is configured with the desired Kafka version, replication settings, listeners, and ZooKeeper ensemble. With the Strimzi Kafka Operator, we can scale our Kafka clusters up or down based on demand, handling data streams with minimal downtime or performance bottlenecks, using something like this:
```yaml
# Define Kafka Clusters
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-cluster-one
  annotations:
    strimzi.io/node-pools: enabled
spec:
  kafka:
    version: 3.7.0
    replicas: 1  # superseded by the KafkaNodePool when node pools are enabled
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.7"
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
---
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-cluster-two
  annotations:
    strimzi.io/node-pools: enabled
spec:
  kafka:
    version: 3.7.0
    replicas: 1  # superseded by the KafkaNodePool when node pools are enabled
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.7"
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
```

Breaking down the YAML, a few key details stand out:
- The `metadata` section specifies the name of a Kafka cluster. `annotations` provide additional configurability, such as enabling node pools (a form of "feature flag").
- Under `spec`, `kafka` defines the Kafka version, number of replicas, and listener configurations (plain and TLS). `config` contains Kafka settings such as the replication factors for the offsets and transaction state log topics. `zookeeper` specifies the ZooKeeper configuration, including the number of replicas and storage settings. `entityOperator` configures the operators that manage topics and users.
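The durability settings above follow a common rule of thumb: with three brokers, a replication factor of 3 and `min.insync.replicas` of 2 let the cluster keep accepting writes while one broker is down. The sketch below (plain Python, with the values copied from the YAML above; the helper name is our own invention, not a Strimzi or Kafka API) makes those invariants explicit:

```python
# Broker-level durability settings copied from the Kafka CR above.
broker_config = {
    "default.replication.factor": 3,
    "min.insync.replicas": 2,
}
broker_count = 3  # replicas in the KafkaNodePool backing the cluster

def check_durability(config: dict, brokers: int) -> list[str]:
    """Return warnings for settings that weaken durability. Illustrative only."""
    warnings = []
    rf = config["default.replication.factor"]
    min_isr = config["min.insync.replicas"]
    if rf > brokers:
        warnings.append("replication factor exceeds broker count")
    # rf - min_isr is the number of broker failures the cluster survives
    # while still satisfying min.insync.replicas for producers using acks=all.
    if rf - min_isr < 1:
        warnings.append("cannot tolerate a broker failure without blocking producers")
    return warnings

print(check_durability(broker_config, broker_count))  # → []
```

Running the check against our configuration yields no warnings; dropping the replication factor to 2 while keeping `min.insync.replicas` at 2 would flag the cluster as unable to survive a broker outage.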
Replication with Kafka MirrorMaker2
To ensure data consistency and fault tolerance across our clusters, we employ Kafka MirrorMaker 2. This component replicates data from kafka-cluster-one to kafka-cluster-two, enabling seamless transfer between geographically distributed Kafka environments. With MirrorMaker 2 configured, we gain high availability and disaster recovery capabilities, crucial for mission-critical applications, with almost no effort:
```yaml
# Define Kafka MirrorMaker 2 for replication between clusters
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: kafka-mirror-maker2
spec:
  version: 3.7.0
  replicas: 1
  template:
    connectContainer:
      env:
        - name: OTEL_SERVICE_NAME
          value: kafka-mirror-maker2
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://localhost:4317"
  tracing:
    type: opentelemetry
  connectCluster: "kafka-cluster-two"
  clusters:
    - alias: "kafka-cluster-one"
      bootstrapServers: kafka-cluster-one-kafka-bootstrap:9092
    - alias: "kafka-cluster-two"
      bootstrapServers: kafka-cluster-two-kafka-bootstrap:9092
      config:
        # -1 means it will use the default replication factor configured in the broker
        config.storage.replication.factor: -1
        offset.storage.replication.factor: -1
        status.storage.replication.factor: -1
  mirrors:
    - sourceCluster: "kafka-cluster-one"
      targetCluster: "kafka-cluster-two"
      sourceConnector:
        tasksMax: 1
        config:
          # -1 means it will use the default replication factor configured in the broker
          replication.factor: -1
          offset-syncs.topic.replication.factor: -1
          sync.topic.acls.enabled: "false"
          refresh.topics.interval.seconds: 600
      checkpointConnector:
        tasksMax: 1
        config:
          # -1 means it will use the default replication factor configured in the broker
          checkpoints.topic.replication.factor: -1
          sync.group.offsets.enabled: "false"
          refresh.groups.interval.seconds: 600
      topicsPattern: ".*"
      groupsPattern: ".*"
```

Breaking down the YAML, a few key details stand out:
- `template` configures the Connect container with additional environment variables for OpenTelemetry tracing.
- `tracing` specifies the type of tracing, OpenTelemetry in this case.
- `connectCluster` identifies the Kafka cluster to which the MirrorMaker 2 workers connect.
- `clusters` lists the clusters involved in replication, with their aliases and bootstrap server addresses. Additional settings, such as storage replication factors, can also be specified here.
- `mirrors` defines the replication settings: the source and target clusters, connector configurations, topic and group patterns, and replication factors for internal topics such as offset-syncs and checkpoints.
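One detail worth calling out: with MirrorMaker 2's default replication policy, mirrored topics are renamed on the target cluster by prefixing the source cluster's alias. The helper below is an illustrative sketch of that naming convention (plain Python, not the actual Java `DefaultReplicationPolicy` class), using the aliases from the YAML above:

```python
SOURCE_ALIAS = "kafka-cluster-one"  # alias from the clusters list above

def remote_topic_name(source_alias: str, topic: str, separator: str = ".") -> str:
    """Mimic MirrorMaker 2's default naming: the source cluster alias
    is prepended to the topic name on the target cluster."""
    return f"{source_alias}{separator}{topic}"

# Topics produced to kafka-cluster-one appear on kafka-cluster-two as:
for topic in ["kafka-topic-test-one", "kafka-topic-test-two"]:
    print(remote_topic_name(SOURCE_ALIAS, topic))
# kafka-cluster-one.kafka-topic-test-one
# kafka-cluster-one.kafka-topic-test-two
```

Consumers reading from kafka-cluster-two therefore subscribe to the prefixed names; if identical names are required on both sides, MirrorMaker 2 can be configured with an identity replication policy instead.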
Using custom Node Pools
Supporting our Kafka clusters are two node pools, kafka-node-pool-one and kafka-node-pool-two. These node pools house the Kafka brokers responsible for processing and storing data. By leveraging Kubernetes' built-in capabilities, we ensure fault tolerance and horizontal scalability, allowing our Kafka clusters to handle varying workloads with ease. Each node pool is provisioned with persistent storage, ensuring data durability and resilience against node failures:
```yaml
# Define Kafka Node Pools
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: kafka-node-pool-one
  labels:
    strimzi.io/cluster: kafka-cluster-one
spec:
  replicas: 3
  roles:
    - broker
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 2Gi
        deleteClaim: false
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: kafka-node-pool-two
  labels:
    strimzi.io/cluster: kafka-cluster-two
spec:
  replicas: 3
  roles:
    - broker
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 2Gi
        deleteClaim: false
```

Breaking down the YAML, a few key details stand out:
- `labels` associate a node pool with its corresponding cluster via `strimzi.io/cluster`.
- `roles` specifies the role of the node pool; "broker" indicates that these nodes will serve as Kafka brokers.
- `storage` configures the storage settings for a given node pool. The `type` "jbod" ("just a bunch of disks") allows one or more volumes per broker.
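When sizing a pool, note that the persistent-claim size applies per volume per broker: each broker in the pool gets its own PersistentVolumeClaim for each JBOD volume. A quick back-of-the-envelope sketch (plain Python, figures taken from kafka-node-pool-one above):

```python
# Figures from kafka-node-pool-one above.
replicas = 3           # brokers in the pool
volumes_per_broker = 1 # entries in the jbod volumes list
volume_size_gi = 2     # size of each persistent-claim

# One PVC is created per broker per JBOD volume,
# so raw capacity scales with both replicas and volumes.
pvc_count = replicas * volumes_per_broker
raw_capacity_gi = pvc_count * volume_size_gi

print(pvc_count, raw_capacity_gi)  # → 3 6
```

With `default.replication.factor` set to 3, every partition is stored three times, so the usable capacity is roughly a third of the raw figure.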
Topics and Replication
Finally, let’s briefly discuss the replication of topics between kafka-cluster-one and kafka-cluster-two. Through MirrorMaker 2, topics created in kafka-cluster-one, such as kafka-topic-test-one and kafka-topic-test-two, are automatically replicated to kafka-cluster-two. This replication mechanism ensures that data produced in one cluster is consistently available in the other, enabling cross-cluster data analysis and processing. With topics synchronized between clusters, applications can consume data from either cluster, enhancing flexibility and scalability.
```yaml
# Define Kafka Topics
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: kafka-topic-test-one
  labels:
    strimzi.io/cluster: kafka-cluster-one
spec:
  partitions: 1
  replicas: 1
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: kafka-topic-test-two
  labels:
    strimzi.io/cluster: kafka-cluster-one
spec:
  partitions: 1
  replicas: 2
```

Breaking down the YAML, a few key details stand out:
- The `strimzi.io/cluster` label indicates which Kafka cluster a given topic belongs to.
- `partitions` specifies the number of partitions for the topic, set to 1 in both cases.
- `replicas` specifies the replication factor for the topic, indicating how many copies of each partition are maintained across brokers. Here, kafka-topic-test-one uses a single replica, while kafka-topic-test-two uses two.
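Before applying topic manifests, it can be useful to lint them: a topic's replication factor cannot exceed the broker count, and a single-replica topic loses data if its broker's disk is lost. A minimal sketch of such a check (plain Python with the specs copied from the YAML above; the lint helper is our own, not part of Strimzi):

```python
BROKERS = 3  # replicas in kafka-node-pool-one

topics = [
    {"name": "kafka-topic-test-one", "partitions": 1, "replicas": 1},
    {"name": "kafka-topic-test-two", "partitions": 1, "replicas": 2},
]

def lint_topic(topic: dict, brokers: int) -> list[str]:
    """Return human-readable warnings for risky topic settings."""
    issues = []
    if topic["replicas"] > brokers:
        issues.append(f"{topic['name']}: replication factor exceeds broker count")
    if topic["replicas"] == 1:
        issues.append(f"{topic['name']}: single replica, no redundancy")
    return issues

for t in topics:
    for issue in lint_topic(t, BROKERS):
        print(issue)
# kafka-topic-test-one: single replica, no redundancy
```

Such settings are fine for test topics, but production topics would typically inherit the cluster's default replication factor of 3.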
Conclusion
In conclusion, the Strimzi Kafka Operator, combined with Kubernetes, offers a practical approach to architecting multi-cluster Kafka environments. By pairing Kubernetes orchestration with Strimzi's declarative management of Kafka resources, organizations can move past the constraints of a single cluster and realize the full potential of event streaming architectures. The scalability, fault tolerance, and resilience afforded by multi-cluster setups address today's challenges of data volume and velocity while future-proofing against evolving requirements. As data demands continue to grow, investing in a robust multi-cluster Kafka setup is both a strategic move and a foundational pillar for sustained innovation.
