Kafka MirrorMaker 2.0

KNN NN
CJ Express Tech (TILDI)
Nov 24, 2022

Why does MirrorMaker 2.0 exist?

First of all, it was designed to address the problems with the legacy version, MirrorMaker 1.0.

Second of all, it was built to take advantage of the Connect ecosystem. Kafka Connect is obviously a big part of the Kafka ecosystem, and people were wondering why they had to run additional infrastructure just for MirrorMaker 1.0.

Last but not least, we want to enable replication use cases, specifically failover and disaster recovery.

Let’s take a look at the pain points of MirrorMaker 1.0:

  • Lack of consumer group offset mirroring — MirrorMaker 1.0 only replicates data from a source topic into a destination topic. But data replication in Kafka is just half of the story: you also have the offsets associated with those records, and consumers that are progressing through a stream and committing offsets.
  • No offset translation — if we want to fail over our workloads to another cluster, just knowing the offsets doesn’t help, because offsets don’t correspond between the two clusters. We need something to translate offsets from one cluster to the other.
  • No timestamp-based recovery — without offset translation in MirrorMaker 1.0, we basically have to rely on timestamps to fail over. For example, if one cluster burns to the ground and you want to migrate your workloads to a new cluster, the only thing you can really do is rewind to a previous point in time and say: my disaster happened at 12 AM, so we’ll rewind to 11 AM and start from there.
  • No centralized ‘control plane’ — there is no centralized control plane where we can monitor, configure, or control replication; every instance has to be configured separately.
  • No high-level metrics — there are no consumer/producer metrics, and no cluster-level replication metrics.
  • Unable to keep topics synchronized — as we said, MirrorMaker 1.0 replicates data but not the other things associated with a topic, such as configuration, partitions, ACLs, and so on.

How does MirrorMaker 2.0 address these problems?

  • MirrorMaker 2.0 builds in the concept of offset translation, which is used for consumer checkpoints. As the name implies, a checkpoint captures the state of a consumer group on one cluster and checkpoints it to another cluster.
  • MirrorMaker 2.0 provides a high-level driver that manages replication between many clusters and manages all of the internal clients for you.
  • There is also a central configuration file: a single high-level configuration file that defines all of the replication flows in your organization, so you have one place to define them.
  • MirrorMaker 2.0 can monitor for configuration changes and new partitions, and actively create the new partitions on the target.
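As a rough sketch, that single high-level configuration file for the dedicated MirrorMaker 2.0 driver can look like the following (the `primary`/`backup` aliases and bootstrap addresses are placeholders, not part of the deployment later in this post):

```properties
# mm2.properties — minimal sketch for the dedicated MirrorMaker 2.0 driver
clusters = primary, backup

primary.bootstrap.servers = primary-broker:9092
backup.bootstrap.servers = backup-broker:9092

# Enable one replication flow and pick which topics/groups it mirrors
primary->backup.enabled = true
primary->backup.topics = .*
primary->backup.groups = .*

# Keep target topic configs, ACLs, and new partitions in sync
sync.topic.configs.enabled = true
sync.topic.acls.enabled = true
refresh.topics.interval.seconds = 60
```

The driver is started with `./bin/connect-mirror-maker.sh mm2.properties` and spins up the source, checkpoint, and heartbeat connectors for each enabled flow.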
Credit: https://turkogluc.com/content/images/2020/09/Screenshot-2020-09-12-at-15.35.45.png

Overview of Kafka MirrorMaker 2.0

MirrorMaker 2.0 is built on the Connect ecosystem, so it brings a bunch of connectors along with it that you can use. There are three connector types:

Mirror Source Connector

Mirror Checkpoint Connector

Mirror Heartbeat Connector

  1. Mirror Source Connector — responsible for topic data replication based on a replication policy, and for other synchronization tasks such as ACL rules and topic renaming. As I said, it replicates records from a source topic into a remote topic, like the picture below.
Picture from slide of Kafka Summit 2020

So you can see the topic that has been replicated from one cluster to another. While records are going through the MirrorSourceConnector, the connector monitors the offsets flying by: in the easy case, offset 100 on one cluster matches up with offset 100 on the other. But offsets do get out of sync — it can be 100 here and 100 there for one minute, and the next minute it bounces apart by hundreds of records — so whenever the mapping changes, the connector emits an offset sync for the upstream source cluster.
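Those offset syncs are themselves just records in an internal topic, so you can watch them being emitted. A sketch, assuming MirrorMaker 2.0’s default topic naming, the cluster aliases used later in this post, and a placeholder bootstrap address:

```
# Offset syncs live by default in mm2-offset-syncs.<target-alias>.internal
# on the SOURCE cluster; each record maps an upstream offset to its
# downstream equivalent. The formatter class ships with recent Kafka
# distributions.
./bin/kafka-console-consumer.sh \
  --bootstrap-server <source-bootstrap>:9092 \
  --topic mm2-offset-syncs.kafka-cluster-2.internal \
  --from-beginning \
  --formatter org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter
```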

2. Mirror Checkpoint Connector — this connector emits consumer offset checkpoints. It consumes the offset syncs I’ve described and combines them with consumer group offsets to produce consumer checkpoints. Checkpoints are very similar to consumer offsets but include translation information as well, encoded per consumer group, which is what enables failover. For example, if you have an application on one cluster, you can actually pick the application up, move it to a new cluster, and resume from the latest checkpoint.
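The checkpoints end up in an internal topic on the target cluster, which you can inspect directly. A sketch, again assuming default topic naming, the cluster aliases from this post, and a placeholder bootstrap address:

```
# Checkpoints for groups mirrored from kafka-cluster-1 land in
# <source-alias>.checkpoints.internal on the TARGET cluster.
./bin/kafka-console-consumer.sh \
  --bootstrap-server <target-bootstrap>:9092 \
  --topic kafka-cluster-1.checkpoints.internal \
  --from-beginning \
  --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter
```

On newer Kafka versions (2.7+), the checkpoint connector can even apply these translated offsets straight to the target cluster’s consumer groups via `sync.group.offsets.enabled`.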

Picture from slide of Kafka Summit 2020

3. Mirror Heartbeat Connector — sends heartbeats to remote clusters. This is useful for monitoring, and it also enables clients to discover the replication topology: just by looking at the heartbeats on any one cluster, you can work out the replication flows and what data is coming in.
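Since heartbeats are just records in a `heartbeats` topic that gets replicated like any other topic, a simple sketch of topology discovery (assuming default topic naming and a placeholder bootstrap address) is listing the replicated heartbeat topics on a cluster:

```
# Every replication flow feeding this cluster shows up as a
# <source-alias>.heartbeats topic under the default replication policy,
# so listing them reveals the upstream clusters.
./bin/kafka-topics.sh \
  --bootstrap-server <target-bootstrap>:9092 \
  --list | grep '\.heartbeats$'
```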

Picture from slide of Kafka Summit 2020

Example Deployment

https://strimzi.io/blog/2020/03/30/introducing-mirrormaker2/

I will give you an example of how to configure MirrorMaker 2.0 to replicate a topic between two clusters.

  1. Create the Kafka clusters — first, I create two Kafka clusters in GKE in different namespaces by adapting the YAML file from this article.
cat <<EOF | kubectl create -n kafka-cluster-1 -f -   # run again with -n kafka-cluster-2
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka
spec:
  kafka:
    version: 3.3.1
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
        authentication:
          type: tls
      - name: external
        port: 9094
        type: nodeport
        tls: false
    storage:
      type: ephemeral
    config:
      offsets.topic.replication.factor: 2
      transaction.state.log.replication.factor: 2
      transaction.state.log.min.isr: 2
      default.replication.factor: 2
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.3"
  # The entry for ZooKeeper is required even in KRaft mode; however, it is ignored by the operator.
  zookeeper:
    replicas: 1
    storage:
      type: ephemeral
  # The entity operator allows managing topics and users in Kafka through the operator.
  entityOperator:
    # If KRaft mode is enabled, remove the `topicOperator` entry. Ref: https://strimzi.io/docs/operators/latest/configuring.html#type-EntityTopicOperatorSpec-reference
    topicOperator: {}
    userOperator: {}
EOF

2. Create a topic in Kafka cluster 1 — create a topic named tweets in namespace kafka-cluster-1 with 2 partitions.

cat <<EOF | kubectl create -n kafka-cluster-1 -f -
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: tweets
  labels:
    strimzi.io/cluster: kafka
spec:
  partitions: 2
  replicas: 1
  config:
    retention.ms: 900000
    segment.bytes: 1073741824
EOF
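Before moving on, it’s worth checking that the topic operator actually materialized the topic. A sketch, assuming Strimzi’s default pod naming for a cluster named `kafka` (so the first broker pod is `kafka-kafka-0`):

```
# Check the KafkaTopic resource was reconciled by the topic operator
kubectl get kafkatopic tweets -n kafka-cluster-1

# Confirm the topic also exists inside the cluster itself
kubectl exec -n kafka-cluster-1 kafka-kafka-0 -- \
  ./bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --topic tweets
```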

3. Test publishing messages to the topic — exec into the Kafka bootstrap pod; inside the pod there are example scripts for creating topics, consumers, and producers. We want to check the producer, so run this command:

./bin/kafka-console-producer.sh --broker-list <IP Kafka Service>:31363 --topic tweets

4. Repeat the same thing for Kafka cluster 2 — I do the same steps to create kafka-cluster-2, but there is no need to create a topic this time, because we want MirrorMaker 2.0 to replicate it.

5. Create MirrorMaker 2.0 in the target cluster (kafka-cluster-2) — I apply this YAML file to create a KafkaMirrorMaker2 resource that connects kafka-cluster-1 and kafka-cluster-2, so the target cluster is able to sync the topic from the source cluster. Note: be careful with the apiVersion of kafka.strimzi.io.

apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaMirrorMaker2
metadata:
  name: my-mirror-maker2
spec:
  version: 2.4.0
  connectCluster: "kafka-cluster-2"
  clusters:
    - alias: "kafka-cluster-1"
      bootstrapServers: kafka-cluster-1-kafka-bootstrap:9092
    - alias: "kafka-cluster-2"
      bootstrapServers: kafka-cluster-2-kafka-bootstrap:9092
  mirrors:
    - sourceCluster: "kafka-cluster-1"
      targetCluster: "kafka-cluster-2"
      sourceConnector: {}
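The sourceConnector alone only mirrors data. To also get the checkpoint and heartbeat behaviour described earlier, the `mirrors` entry can be extended along these lines — a sketch based on the Strimzi KafkaMirrorMaker2 schema (the `sync.group.offsets.enabled` option additionally requires Kafka 2.7 or newer):

```yaml
  mirrors:
    - sourceCluster: "kafka-cluster-1"
      targetCluster: "kafka-cluster-2"
      sourceConnector: {}
      # Emit consumer-group checkpoints for offset translation / failover
      checkpointConnector:
        config:
          sync.group.offsets.enabled: "true"
      # Send heartbeats so clients can discover the replication topology
      heartbeatConnector: {}
      # Restrict what is mirrored (these patterns mirror everything)
      topicsPattern: ".*"
      groupsPattern: ".*"
```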

6. Test accessing the source cluster’s topic from the target cluster — now it’s time to test, by exec-ing into the Kafka bootstrap pod of kafka-cluster-2 and using this command to consume data from the mirrored topic. If it works, you will see all the same data that the source topic received.

./bin/kafka-console-consumer.sh --bootstrap-server <IP Kafka cluster-2 Service>:<nodeport> --topic kafka-cluster-1.tweets --from-beginning

Conclusion

That’s all. I’ve shown you how to set up MirrorMaker 2.0 in a dedicated instance, and there is a bunch of configuration you can play around with. I only started exploring Kafka recently, so any suggestions or corrections are welcome.

Ref.

https://strimzi.io/blog/2020/03/30/introducing-mirrormaker2/

https://strimzi.io/blog/2021/11/22/migrating-kafka-with-mirror-maker2/
