Apache Pulsar: Geo-replication
This is continuation of Apache Pulsar evaluation series from users perspective. If you landed here directly, a gentle introduction will give you a needed start.
Apache Pulsar out of the box supports geo-replication. Out of the box, pulsar supports below n-mesh replication patterns:
- Asynchronous Replication
- Synchronous Replication
Asynchronous Replication
Asynchronous replication as it explains itself, data published to the topic gets replicated across configured clusters located in different regions asynchronously. Asynchronous replication can be configured at namespace level for which the tenant has access to it. Producers are acknowledged as soon as message is published/persisted locally to the cluster. Broker then replicates data to the configured replication cluster. When the data is replicated, it preserves the order but not cursor position. Note: Cursor Position is preserved and maintained within the local cluster.
Following are the types of asynchronous replication patterns:
- Active — Active
- Failover
- Aggregated Replication (Not Covered here) — Please refer to https://streaml.io/blog/geo-replication-patterns-practices/
Active -Active Replication
Above diagram depicts active active clusters across regions, two in this example. Data gets replicated between US-East and US-West given the permissions at namespace level. Producer/Consumer can produce/consume messages locally to the cluster because the subscription is local to the cluster. Or we can have only one consumer to consume messages when producer produces messages across both the cluster.
This patterns produces same set of data across regions. From the user perspective, data that gets processed in east gets replicated to west as well to be replayed. As same set of data gets published across clusters will get processed to backend systems. If the backend system is not replicated then this is the pattern to look into. If the system expects unique set of messages then it is the responsibility of the consuming application to filter before forwarding to backend systems or different implementations.
FailOver Replication
Setting up the clusters is similar to the above pattern except we have one consumer for the cluster. As consumers are local to the cluster with localized subscription.
When the US-East cluster goes down, producer and consumer can be moved to the US-West cluster by configuring to retain the messages even there is no active subscription. Data is not lost in this event as replication happens from US-East to US-West. When we start the consumer on US-West, we need to reset the cursor to replay the message from the time of failure to start processing. Reset of cursor refers to the time of event happened. As the order is maintained application can start processing the messages from where it let off.
Main drawback of this pattern is to migrate the producer/consumer manually, reset the cursor to replay the message which may result in less percentage of duplicates which is based on the accuracy level of cursor reset.
Configuring Asynchronous Replication
Assuming we have two clusters in A and B, execute the below commands to let each cluster to participate in cluster. Run the below command on A cluster to let it know that there is a replication cluster on B:
./bin/pulsar-admin clusters create \
--url http://<DNS-OF-B>:<PORT> \
--broker-url pulsar://<DNS-OF-B>:<PORT> \
--broker-url-secure pulsar+ssl://<DNS-OF-B>:<PORT> \
--url-secure https://<DNS-OF-B>:<PORT> B
In the above command, B
is the Cluster Name configured in the broker.conf. To configure replication cluster, admin expects all the parameters to be provided even though they are mutually exclusive. Similarly, we have to let B
know that there is a replication cluster on A
:
./bin/pulsar-admin clusters create \
--url http://<DNS-OF-A>:<PORT> \
--broker-url pulsar://<DNS-OF-A>:<PORT> \
--broker-url-secure pulsar+ssl://<DNS-OF-A>:<PORT> \
--url-secure https://<DNS-OF-A>:<PORT> A
Run the below commands in both the clusters:
Create tenant in both clusters by allowing permission for both clusters.
./bin/pulsar-admin tenants create my-tenant \
--admin-roles admin --allowed-clusters A,B
Create namespace in both clusters along with permissions:
./bin/pulsar-admin namespaces create my-tenant/ns1
./bin/pulsar-admin namespaces grant-permission my-tenant/ns1 \
--actions produce,consume \
--role 'admin'
Set cluster configuration at namespace level as below:
./bin/pulsar-admin namespaces set-clusters my-tenant/ns1 \
--clusters A,B
We are all set, when the application produce messages on say my-tenant/ns1/my-topic
, data gets replicated across the clusters. Above patterns can be configured pretty easily on kubernetes.
Synchronous Replication
Synchronous replication provides greater flexibility for the enterprise to have one cluster. Main configuration is to enable the rackAwarePlacementPolicy on the broker to persist data in bookies across the regions. This feature is out of the box from pulsar.
Advantages of it is being even if a region goes down, other region actively picks up where it is left off without any duplicates and manual intervention.
Downside of this pattern is the complexity in setting this up and running in bare metal. Above patterns can be launched as containers in kubernetes. Another shortcoming is zookeeper, we need to maintain odd number of clusters like above 3 clusters across 3 regions.
There is a very good documentation at on how to configure the synchronous replication at https://pulsar.apache.org/docs/latest/deployment/instance/.
Testing Geo-replication
Geo-replication can be tested using the same set of producer and consumer we developed earlier in this article. Only modify the below line:
Producer.java
String localClusterUrl = “pulsar+ssl://<DNS-OF-A>:<PORT>”;
String namespace = “my-tenant/ns1”;
String topic = String.format(“persistent://%s/my-topic”, namespace);
Consumer.java
String localClusterUrl = “pulsar+ssl://<DNS-OF-B>:<PORT>”;
String namespace = “my-tenant/ns1”;
String topic = String.format(“persistent://%s/my-topic”, namespace);
When configuring the replication on clusters, ensure that appropriate roles are provisioned at tenant and namespace level. Above example has only admin
as example.
Now when you produce messages on cluster A, consumer can read it from B. As per above example, consumers can be configured to listen on both A and B to consume same set of messages.
Conclusion
Pulsar provides multiple options to pick from how an organization can decide to host their streaming platform by based on the use cases. To conclude, if platform supports and ignores duplicates of messages with small percentage can leverage asynchronous replication. Incase use case demands strict unique message processing than synchronous replication is the way to go.
Finally BIG THANK YOU for Pulsar team helping me in experimenting the features and for using their documentation to represent from the user’s perspective.
Happy Pulsaring!