Multi-Cluster Kafka with Strimzi.io

Tobias Andersen
6 min read · Apr 25, 2024


In an era where real-time insights drive decision-making, efficient data processing has never been more critical. Apache Kafka stands as a leading solution for distributed event streaming, empowering organizations to handle massive volumes of data with agility and scalability. As data demands surge, however, a single Kafka cluster may not suffice. Enter multi-cluster Kafka environments, where redundancy, scalability, and fault tolerance are paramount. Fortunately, with Kubernetes and the Strimzi Kafka Operator, orchestrating and managing such complex setups has become remarkably streamlined. In this article, we explore how to architect a robust multi-cluster Kafka setup to unlock efficiency and scalability for our Heimdall platform.

Setting up our Kafka Clusters

The heart of our architecture lies in the definition of two Kafka clusters, namely kafka-cluster-one and kafka-cluster-two. Each cluster is meticulously configured with the desired Kafka version, replication factor, listeners, and ZooKeeper ensemble. With Strimzi Kafka Operator, we can easily scale our Kafka clusters up or down based on demand, ensuring seamless handling of data streams with minimal downtime or performance bottlenecks using something like this:

# Define Kafka Clusters
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-cluster-one
  annotations:
    strimzi.io/node-pools: enabled
spec:
  kafka:
    version: 3.7.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.7"
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
---
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-cluster-two
  annotations:
    strimzi.io/node-pools: enabled
spec:
  kafka:
    version: 3.7.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.7"
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}

Breaking down the YAML, we can pick out a few key details:

  • The metadata section specifies the name of a Kafka cluster.
  • annotations provide additional configurability; here, strimzi.io/node-pools: enabled acts as a kind of “feature flag” that tells Strimzi to source broker configuration from KafkaNodePool resources.
  • Under spec, kafka defines the Kafka version, number of replicas, and listener configurations (e.g., plain and TLS). Note that with node pools enabled, the actual broker count comes from the associated KafkaNodePool resources.
  • config contains Kafka configuration settings like replication factors for offsets and transaction state logs.
  • zookeeper specifies the ZooKeeper configuration, including the number of replicas and storage settings.
  • entityOperator configures operators for managing topics and users.
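To see why these replication numbers have to line up with the broker count, here is a small sanity-check sketch. This helper is not part of Strimzi; it is just an illustration using the same config keys as the manifests above.

```python
# Illustrative sanity check (not Strimzi code): verify that the broker count
# and the replication settings in a Kafka CR are mutually consistent.

def validate_kafka_config(broker_replicas: int, config: dict) -> list[str]:
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    rf = config.get("default.replication.factor", 1)
    min_isr = config.get("min.insync.replicas", 1)
    if rf > broker_replicas:
        problems.append(
            f"default.replication.factor={rf} exceeds broker count {broker_replicas}"
        )
    if min_isr > rf:
        problems.append(
            f"min.insync.replicas={min_isr} exceeds replication factor {rf}"
        )
    return problems

# The node pools defined later provide 3 brokers per cluster,
# so rf=3 with min.insync.replicas=2 is consistent:
config = {"default.replication.factor": 3, "min.insync.replicas": 2}
print(validate_kafka_config(3, config))  # []
print(validate_kafka_config(1, config))  # one problem: rf exceeds broker count
```

With only one broker, a replication factor of 3 is impossible to satisfy, which is exactly why the node pools provision three brokers per cluster.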

Replication with Kafka MirrorMaker2

To ensure data consistency and fault tolerance across our clusters, we employ Kafka MirrorMaker 2. This vital component replicates data from kafka-cluster-one to kafka-cluster-two, enabling seamless data transfer between geographically distributed Kafka environments. With MirrorMaker 2 configured in our setup, we gain high availability and disaster recovery capabilities, crucial for mission-critical applications, with almost no effort:

# Define Kafka Mirror Maker for replication between clusters
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: kafka-mirror-maker2
spec:
  version: 3.7.0
  replicas: 1
  template:
    connectContainer:
      env:
        - name: OTEL_SERVICE_NAME
          value: kafka-mirror-maker2
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://localhost:4317"
  tracing:
    type: opentelemetry
  connectCluster: "kafka-cluster-two"
  clusters:
    - alias: "kafka-cluster-one"
      bootstrapServers: kafka-cluster-one-kafka-bootstrap:9092
    - alias: "kafka-cluster-two"
      bootstrapServers: kafka-cluster-two-kafka-bootstrap:9092
      config:
        # -1 means it will use the default replication factor configured in the broker
        config.storage.replication.factor: -1
        offset.storage.replication.factor: -1
        status.storage.replication.factor: -1
  mirrors:
    - sourceCluster: "kafka-cluster-one"
      targetCluster: "kafka-cluster-two"
      sourceConnector:
        tasksMax: 1
        config:
          # -1 means it will use the default replication factor configured in the broker
          replication.factor: -1
          offset-syncs.topic.replication.factor: -1
          sync.topic.acls.enabled: "false"
          refresh.topics.interval.seconds: 600
      checkpointConnector:
        tasksMax: 1
        config:
          # -1 means it will use the default replication factor configured in the broker
          checkpoints.topic.replication.factor: -1
          sync.group.offsets.enabled: "false"
          refresh.groups.interval.seconds: 600
      topicsPattern: ".*"
      groupsPattern: ".*"

Breaking down the YAML, we can pick out a few key details:

  • template lets us configure the Connect container with additional environment variables for OpenTelemetry tracing.
  • tracing specifies the type of tracing, which is OpenTelemetry in this case.
  • connectCluster identifies the Kafka cluster to which the mirror maker connects.
  • clusters lists the clusters involved in replication, with their aliases and bootstrap server addresses. Additional configurations like storage replication factors can also be specified.
  • mirrors defines the replication settings, such as the source and target clusters, connector configurations, topics and groups patterns, and replication factors for topics like offset-syncs and checkpoints.
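One practical detail worth knowing: MirrorMaker 2's default replication policy prefixes replicated topics with the source cluster alias, so consumers on the target cluster look for renamed topics. The sketch below mimics that naming convention (it is an illustration, not MirrorMaker source code; "." is the default separator).

```python
# Mimic MirrorMaker 2's DefaultReplicationPolicy topic naming: replicated
# topics on the target cluster are prefixed with "<source alias><separator>".

def remote_topic_name(source_alias: str, topic: str, separator: str = ".") -> str:
    """Name a topic gets on the target cluster after replication."""
    return f"{source_alias}{separator}{topic}"

def is_remote_topic(topic: str, separator: str = ".") -> bool:
    # Heuristic used by the default policy: remote topics contain the separator.
    return separator in topic

print(remote_topic_name("kafka-cluster-one", "kafka-topic-test-one"))
# kafka-cluster-one.kafka-topic-test-one
```

So a consumer on kafka-cluster-two that wants the mirrored data subscribes to kafka-cluster-one.kafka-topic-test-one, not the bare topic name.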

Using custom Node Pools

Supporting our Kafka clusters are two node pools, kafka-node-pool-one and kafka-node-pool-two. These node pools house the Kafka brokers responsible for processing and storing data. By leveraging Kubernetes' built-in capabilities, we gain fault tolerance and horizontal scalability, allowing our Kafka clusters to handle varying workloads with ease. Each node pool is provisioned with persistent storage, ensuring data durability and resilience against node failures:

# Define Kafka Node Pools
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: kafka-node-pool-one
  labels:
    strimzi.io/cluster: kafka-cluster-one
spec:
  replicas: 3
  roles:
    - broker
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 2Gi
        deleteClaim: false
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: kafka-node-pool-two
  labels:
    strimzi.io/cluster: kafka-cluster-two
spec:
  replicas: 3
  roles:
    - broker
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 2Gi
        deleteClaim: false

Breaking down the YAML, we can pick out a few key details:

  • labels are used to associate our node pool with its corresponding cluster.
  • roles specifies the role of the node pool; here it is "broker", indicating that these nodes will serve as Kafka brokers.
  • storage configures the storage settings for a given node pool.
  • type specifies the type of storage, which is "jbod" (just a bunch of disks).
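A quick back-of-the-envelope sketch of what these numbers mean for capacity: each JBOD volume is provisioned per broker, so a pool's raw capacity is replicas times the sum of its volume sizes, and the usable capacity shrinks by the replication factor. This is plain arithmetic, not a Strimzi API.

```python
# Rough capacity math for a KafkaNodePool: every broker in the pool gets its
# own copy of each JBOD volume, and every partition is stored
# replication_factor times across the cluster.

def pool_raw_capacity_gi(replicas: int, volume_sizes_gi: list[int]) -> int:
    """Total raw disk provisioned by the pool, in Gi."""
    return replicas * sum(volume_sizes_gi)

def usable_capacity_gi(raw_gi: int, replication_factor: int) -> float:
    """Approximate unique data the cluster can hold, ignoring overhead."""
    return raw_gi / replication_factor

raw = pool_raw_capacity_gi(3, [2])      # 3 brokers x one 2Gi volume = 6 Gi raw
print(raw, usable_capacity_gi(raw, 3))  # 6 2.0
```

With the manifests above (3 brokers, one 2Gi volume each, replication factor 3), each cluster holds roughly 2Gi of unique data, which is clearly sized for a demo rather than production.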

Topics and Replication

Finally, let’s briefly discuss the replication of topics between kafka-cluster-one and kafka-cluster-two. Through mirror maker, topics created in kafka-cluster-one, such as kafka-topic-test-one and kafka-topic-test-two, are automatically replicated to kafka-cluster-two. This seamless replication mechanism ensures that data produced in one cluster is consistently available in the other, enabling cross-cluster data analysis and processing. With topics synchronized between clusters, applications can seamlessly consume data from either cluster, enhancing flexibility and scalability.

# Define Kafka Topics
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: kafka-topic-test-one
  labels:
    strimzi.io/cluster: kafka-cluster-one
spec:
  partitions: 1
  replicas: 1
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: kafka-topic-test-two
  labels:
    strimzi.io/cluster: kafka-cluster-one
spec:
  partitions: 1
  replicas: 2

Breaking down the YAML, we can pick out a few key details:

  • The strimzi.io/cluster label indicates which Kafka cluster a given topic belongs to.
  • partitions specifies the number of partitions for the topic, which is set to 1 for both topics.
  • replicas specifies the replication factor for the topic, indicating how many copies of each partition should be maintained across brokers. Here kafka-topic-test-one keeps a single replica, while kafka-topic-test-two keeps two.
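The replica count interacts directly with the cluster-wide min.insync.replicas: 2 we set earlier. With acks=all, a partition only accepts writes while its in-sync replica count meets that minimum, so a single-replica topic like kafka-topic-test-one can never satisfy it. The sketch below (illustrative, not Kafka code) makes the trade-off concrete.

```python
# Illustrate when an acks=all producer can write, given the topic's replica
# count, the number of failed brokers, and the cluster's min.insync.replicas
# (2 in the manifests above).

def writable_with_acks_all(replicas: int, failed_brokers: int, min_insync: int = 2) -> bool:
    """True if enough replicas remain in sync to accept acks=all writes."""
    in_sync = max(replicas - failed_brokers, 0)
    return in_sync >= min_insync

print(writable_with_acks_all(1, 0))  # False: 1 replica can never meet min.insync=2
print(writable_with_acks_all(2, 0))  # True: kafka-topic-test-two is writable
print(writable_with_acks_all(2, 1))  # False: one broker failure drops below the minimum
```

In other words, with this cluster config a production topic should use the default replication factor of 3 so that acks=all writes survive a single broker failure; the replicas of 1 and 2 here are suitable only for testing.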

Conclusion

In conclusion, pairing the Strimzi Kafka Operator with Kubernetes offers a transformative approach to architecting multi-cluster Kafka environments. By combining Kubernetes' orchestration capabilities with Strimzi's declarative management of Kafka resources, organizations can move beyond the constraints of a single cluster and realize the full potential of event streaming architectures. The scalability, fault tolerance, and resilience afforded by multi-cluster setups not only address the immediate challenges of data volume and velocity but also future-proof businesses against evolving data requirements. As industries navigate the complexities of digital transformation, investing in robust infrastructure like multi-cluster Kafka environments becomes not just a strategic move but a foundational pillar for sustained innovation and competitive advantage.
