Apache Pulsar-Geo-Replication & Hybrid Deployment Model to Achieve Synchronous Replication

Karthikeyan Palanivelu
Capital One Tech
Published in
7 min readMay 2, 2019
view of a star exploding in space in red, yellow, and blue

Part 3 of a series on Apache Pulsar is about Geo-replication. If you ended up here directly, please read about the features in Apache Pulsar.

As we’ve covered in a previous article, Apache Pulsar is a new open source distributed pub-sub messaging platform created by Yahoo and now a part of the Apache Software Foundation. Pulsar provides various out-of-the-box features and we will evaluate/test the Geo-Replication feature in this article.

Apache Pulsar supports Geo-Replication out of the box, including the n-mesh replication patterns:

  • Asynchronous Replication
  • Synchronous Replication

Let’s discuss in detail about implementation, advantages, disadvantages and how to choose the best fit.

Asynchronous Replication

Asynchronous replication (active-active) model using blue/yellow/green rectangles, red cylinders, and purple arrows
Asynchronous Replication (Active-Active)

Asynchronous replication involves data published to a topic that gets replicated across configured clusters located in different regions asynchronously. Asynchronous replication can be configured at namespace level for which the tenant has access. Producers are acknowledged as soon as a message is published/persisted locally to the cluster. The broker then replicates the data to the configured replication cluster. When the data is replicated, it preserves the order but not the cursor position.

Note: Cursor Position is preserved and maintained within the local cluster.

Asynchronous Replication (Failover)

Asynchronous replication (failover) model using blue/yellow/green rectangles, red cylinders, and purple arrows
Asynchronous Replication (Failover)

Configuring clusters is similar to the above pattern except we have one consumer for the cluster. Subscriptions are local to the cluster.

When the US-East cluster goes down, producer and consumer can be moved to the US-West cluster by configuring to retain the messages even if there is no active subscription. Data is not lost in this event as replication happens from US-East to US-West. When we start the consumer on US-West, we need to reset the cursor to replay the message from the time of failure to start processing.

The main disadvantage of this pattern is migrating the producer/consumer manually, reset the cursor to replay the message from the time of failure may result in less percentage of duplicates which is based on the accuracy level of cursor reset.

This deployment/configurable pattern is recommended for:

  • Applications with idempotent data.
  • Applications whose backend data is not replicated and/or maintains a replica in a different region.
  • Applications that can accept duplicate messages till the time of recovery is reached.

Configuring Asynchronous Replication

Assuming we have two clusters in A and B, execute the below commands to let each cluster participate in a quorum of clusters. Run the below command to register cluster B to cluster A as a replication cluster:

./bin/pulsar-admin clusters create \
— url http://<DNS-OF-B>:<PORT> \
— broker-url pulsar://<DNS-OF-B>:<PORT> \
— broker-url-secure pulsar+ssl://<DNS-OF-B>:<PORT> \
— url-secure https://<DNS-OF-B>:<PORT> B

In the above command, B is the Cluster Name configured in the broker.conf. To configure the replication cluster, admin expects all the parameters to be provided even though they are mutually exclusive.

Similarly, register cluster A to B:

./bin/pulsar-admin clusters create \
— url http://<DNS-OF-A>:<PORT> \
— broker-url pulsar://<DNS-OF-A>:<PORT> \
— broker-url-secure pulsar+ssl://<DNS-OF-A>:<PORT> \
— url-secure https://<DNS-OF-A>:<PORT> A

Create tenants in both clusters by configuring permission for both clusters.

./bin/pulsar-admin tenants create my-tenant \
— admin-roles admin — allowed-clusters A,B

Create namespace in both clusters along with permissions:

./bin/pulsar-admin namespaces create my-tenant/ns1./bin/pulsar-admin namespaces grant-permission my-tenant/ns1 \
— actions produce,consume \
— role ‘admin’

Set cluster configuration at namespace level as below:

./bin/pulsar-admin namespaces set-clusters my-tenant/ns1 \
— clusters A,B

Replication is configured across two clusters. When the application produces messages on, say, my-tenant/ns1/my-topic , data gets replicated across the cluster. The above patterns can also be configured easily on Kubernetes.

Synchronous Replication

Synchronous Replication model using blue/green/yellow rectangles, red cylinders, and purple arrows
Synchronous Replication

Synchronous replication is achieved by Apache BookKeeper. Synchronous replication provides greater flexibility for enterprises that only require one cluster. The main configuration issue here involves enabling the “rackAwarePlacementPolicy” on the broker to persist data in bookies across different data centers hosted across different geographical locations. This feature is supported out-of-the-box by Pulsar.

To configure synchronous replication, the best option is to provision ZooKeeper and Pulsar on bare metal, as based on the instructions at http://pulsar.apache.org/docs/en/deploy-bare-metal-multi-cluster/.

Shortcomings to this approach are:

  • Containerization and the flexibility of using containers.
  • Maintenance of clusters based on organization needs.
  • Cloud governance and agility (Rigid Model).
  • Resource utilization.
  • Not necessarily cost effective.

Synchronous Replication — Hybrid Model

To take full advantage of Pulsar’s support for the Cloud and Kubernetes, and to overcome the complexities mentioned above, I’d like to propose a new deployment model — the Hybrid Model.

In the hybrid model , the provisioning happens both on bare metal and the Kubernetes cluster (either EKS, GKE or a standalone Kubernetes cluster).

Assumptions:

  • ZooKeeper can be installed using the Apache Pulsar binary on bare metal.
  • Containers of Broker and Bookie are available.

This model proposes to provision the components as described below:

  • ZooKeeper on Bare Metal/EC2(AWS)
  • Bookies and Brokers on Kubernetes

To provision ZooKeeper in three regions, for example, we can choose AWS regions US-East, US-West and US-Central with a total of three ZooKeeper nodes to form a cluster or ZooKeeper quorum.

Synchronous Replication — Hybrid model using blue/yellow/green rectangles, red cylinders, and purple arrows
Synchronous Replication — Hybrid

Below are the configurations that are required for achieving hybrid synchronous replication:

  • Configure the cluster metadata of Pulsar on one of the nodes with, say, cluster name as “hybrid”, using initialize-cluster-metadata command.
  • Provision Kubernetes cluster in each of the regions such as US-East, US-West, and US-Central. Deploy Bookies and Brokers to the Kubernetes cluster. Say, for example, three replicas each.
  • Configure ZooKeeper nodes as ZooKeeper servers in the Bookies configuration.
  • Configure ZooKeeper nodes as ZooKeeper servers and configuration store servers in the Broker configurations.
  • Enable rack aware policy.

The above configuration, along with Write Quorum (which should be AckQuorum+1), will allow BookKeeper to replicate the data across different geographical locations/regions.

Main advantages of using this model:

  • Resource sharing to reduce cost compared to bare metal.
  • Behaves as a single logical cluster with as many physical clusters in as many regions.
  • n-Number of clusters can be started from any number of regions.
  • Zero data loss.
  • Resilient Model (Topics) — if a cluster goes down, topics are transferred seamlessly to another cluster.
  • Resilient Model (Messages) — if a cluster goes down, messages are replicated based on write quorum to Bookies on another cluster.
  • Consumers can consume messages from any region/cluster.
  • No duplicate messages produced during failover, unlike asynchronous replication.
  • Preserves the cursor position as this model behaves like a single logical cluster.

Main disadvantages of using this model:

  • Maintenance of two different provisioning architectures — EC2 and Kubernetes. If EKS or GKE are used, then it is an easy model to achieve synchronous replication using the hybrid framework.
  • Latency between ZooKeeper hops based on leader during Topic Creation/Deletion. This does not impact the message lifecycle.

Testing Geo-Replication

Geo-Replication can be tested using the same set of producer and consumers we developed earlier in this article. Modify the below line to reflect different cluster host/port:

Producer.java

String localClusterUrl = “pulsar+ssl://<DNS-OF-A>:<PORT>”;String namespace = “my-tenant/ns1”;String topic = String.format(“persistent://%s/my-topic”, namespace);

Consumer.java

String localClusterUrl = “pulsar+ssl://<DNS-OF-B>:<PORT>”;String namespace = “my-tenant/ns1”;String topic = String.format(“persistent://%s/my-topic”, namespace);

When configuring the replication on clusters, provision appropriate roles at tenant and namespace level. The above example has `admin` for example.

Now when you produce messages on cluster A, the consumer can read it from B. As per the above example, consumers can be configured to listen on both A and B to consume the same set of messages.

Conclusion

Apache Pulsar comes with a lot of features to play with, and has the potential to be a strong player in the streaming platform space for years to come. One strong feature is geo-replication, which comes out of the box with native cloud support. Pulsar provides multiple options to pick from how an organization can decide to host their streaming platform based on use cases. To conclude, if platform supports and ignores duplicates of messages with a small percentage can leverage asynchronous replication. Incase use case demands strict unique message processing then synchronous replication is the way to go. If cloud cost and resource utilization is the major factor, a hybrid approach would be the choice.

Happy Pulsaring!

Related Articles In This Series

Apache Pulsar — One Cluster for the Entire Enterprise Using Multi Tenancy
Apache Pulsar — A Gentle Introduction to Apache’s Newest Pub-Sub Messaging Platform

DISCLOSURE STATEMENT: © 2020 Capital One. Opinions are those of the individual author. Unless noted otherwise in this post, Capital One is not affiliated with, nor endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are property of their respective owners.

--

--