Avoid duplicate event processing in an event streaming system

Soumyadev Bhattacharyya
4 min read · May 19, 2024


When building event-driven, microservice-based applications, it is important to design the system to avoid duplicate event handling by baking in idempotency. At the same time, we need to ensure that events are not lost when services restart.

Idempotency is a fundamental principle in computer science that ensures an operation can be performed multiple times without changing the result beyond the initial application. This is crucial for handling duplicate events and maintaining data integrity in distributed systems.

Understanding Idempotency Principles

  1. Idempotent operations produce the same output regardless of how many times they are executed, ensuring reliable and consistent behavior.
  2. Idempotency allows systems to handle duplicate inputs or events without introducing errors or unexpected side effects.
  3. Idempotent operations help maintain the desired state of a system, even in the face of potential data inconsistencies or network failures.
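As a minimal sketch of the first principle (the names here are illustrative, not from any library), compare a non-idempotent increment with an idempotent, event-ID-guarded update:

```python
# Minimal sketch: non-idempotent vs. idempotent handling of a credit event.
balances: dict[str, int] = {}

def apply_credit_non_idempotent(account: str, amount: int) -> None:
    # Replaying a duplicate event double-counts the credit.
    balances[account] = balances.get(account, 0) + amount

def apply_credit_idempotent(account: str, event_id: str, amount: int,
                            applied_events: set[str]) -> None:
    # Replaying a duplicate event is a no-op: the event ID is recorded
    # on first delivery, so repeated executions produce the same state.
    if event_id in applied_events:
        return
    balances[account] = balances.get(account, 0) + amount
    applied_events.add(event_id)

applied: set[str] = set()
for _ in range(3):  # simulate the same event delivered three times
    apply_credit_idempotent("acct-1", "evt-42", 100, applied)
assert balances["acct-1"] == 100  # credited exactly once
```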

Idempotency Implementation Strategies

1. Deduplication

Maintain a cache or registry of processed events to identify and discard duplicates.
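A minimal sketch of such a registry, here using Redis via the redis-py client (the key prefix and TTL are illustrative choices, not requirements):

```python
import redis

r = redis.Redis(host="localhost", port=6379)
DEDUP_TTL_SECONDS = 24 * 60 * 60  # keep markers only as long as duplicates can arrive

def is_first_delivery(event_id: str) -> bool:
    # SET ... NX is atomic: only the first caller to register this event
    # ID gets True, so two concurrent consumers cannot both claim it.
    return bool(r.set(f"processed:{event_id}", 1, nx=True, ex=DEDUP_TTL_SECONDS))

def handle(event_id: str, payload: dict) -> None:
    if not is_first_delivery(event_id):
        return  # duplicate: discard without reprocessing
    ...  # run the business logic exactly once per event ID
```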

2. Unique Identifiers

Use unique identifiers, such as request IDs or timestamps, to detect and handle duplicate operations.
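For Event Hubs specifically, a natural identifier can be derived from coordinates the broker already assigns to every event; a sketch (the key layout is an illustrative choice):

```python
def event_key(event_hub: str, consumer_group: str,
              partition_id: str, sequence_number: int) -> str:
    # Sequence numbers are unique and monotonically increasing within a
    # partition, so this tuple identifies exactly one delivered event.
    return f"{event_hub}:{consumer_group}:{partition_id}:{sequence_number}"
```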

3. Idempotent Operations

Design operations that produce the same result, regardless of how many times they are executed.
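A sketch of the design difference: operations that set an absolute state are naturally idempotent, while operations that apply a delta are not:

```python
order = {"status": "NEW"}

def mark_shipped(order: dict) -> None:
    # Sets an absolute state rather than applying a delta, so replaying
    # this operation any number of times leaves the same result.
    order["status"] = "SHIPPED"

mark_shipped(order)
mark_shipped(order)  # duplicate delivery: state is unchanged
assert order["status"] == "SHIPPED"
```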

Idempotency Implementation in Distributed Systems

Here we will discuss different approaches for implementing idempotency, and then examine the shortcomings of each.

Option #1

In the diagram below, events are consumed by consumer services, either one at a time or in batches. Once events are read from Event Hub (EH), they are watermarked into a persistent store, in this case a Redis cache. Redis serves well here, as the data is not needed for a long period.

Once the watermarking is done, the business logic is executed. In this case, the business logic writes to two data stores: a database associated with this service, and another EH to which it publishes an event for a subsequent service to consume.

Once the business logic has executed, EH checkpointing is completed by writing the checkpoint to blob storage.
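The ordering can be summarized in a short sketch, where watermark, run_business_logic, and checkpoint are hypothetical stand-ins for the Redis write, the service's own logic, and the blob-backed EH checkpoint store:

```python
# Hypothetical stand-ins for the components described above.
def watermark(event): ...           # write a "seen" marker to the Redis cache
def run_business_logic(event): ...  # write to the service DB + publish to the next EH
def checkpoint(event): ...          # commit the EH offset to blob storage

def handle_event_option1(event) -> None:
    watermark(event)            # step 1: event marked as processed up front
    run_business_logic(event)   # step 2: the actual work
    checkpoint(event)           # step 3: EH checkpointing
    # Failure window: a restart after step 1 but before step 3 redelivers
    # the event, yet the step-1 marker makes the retry look already
    # processed, so step-2 work can be skipped or left half-done.
```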

This works well but has issues:

  1. Consider the consumer service executing the event being restarted before EH checkpointing completes. The consumer service will pick up the same event again, but because the watermark is already present in the Redis cache, it will assume the event has already been processed. The result is inconsistent data across data stores and incomplete processing of the event, sometimes skipping processing entirely.
  2. This implementation does not provide transactional consistency across heterogeneous data stores.
  3. Querying the Redis cache is not optimal; you have to rely on key-based lookups, which only work if a suitable key exists.

Option #2

In the diagram below, events are consumed by the consumer service, but instead of writing the consumed event directly into the watermark datastore, the service first queries it to check whether an event with that sequence number and offset, for that consumer group and EH combination, is already present in the watermark DB. If it is not present, the business logic (BL) is processed; otherwise the service skips straight to EH checkpointing on blob storage.
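In sketch form, reusing the hypothetical stand-ins from the Option #1 sketch, the existence check moves ahead of any watermark write:

```python
def watermark_exists(event) -> bool:
    # Hypothetical: query the watermark DB for this event's sequence
    # number + offset under this consumer group + EH combination.
    ...

def write_watermark(event):
    ...  # hypothetical: record the event in the watermark DB

def handle_event_option2(event) -> None:
    if not watermark_exists(event):
        run_business_logic(event)  # only unseen events reach the BL
        write_watermark(event)     # recorded only after the BL completes
    checkpoint(event)              # checkpoint either way
    # A restart before checkpointing redelivers the event, but because the
    # watermark is written only after the BL completes, the retry is
    # processed rather than skipped.
```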

This avoids skipping event processing if the consumer service restarts in between. But it introduces a new challenge: how do we ensure duplicate events are not published to subsequent services?

To solve that, we need to implement the outbox pattern.

What is outbox pattern?

The Outbox pattern is a design pattern used to ensure reliable communication and data consistency in distributed systems, particularly in microservices architectures. It addresses the problem of ensuring that events are reliably published even if a service or its components fail.

Here, the outbox pattern is implemented by recording the stages of publishing data to external systems in a separate Cosmos DB collection. Before publishing to Event Hub while processing the BL, the consumer service must check whether the corresponding document already exists in that Cosmos DB collection.

When the business logic completes, the processed event is watermarked by writing to a separate watermark Cosmos DB collection, before EH checkpointing. Both checks are sketched below.
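A minimal sketch of both checks, assuming azure-cosmos containers for the outbox and watermark collections (the document shapes and function names are illustrative, and both containers are assumed to be partitioned on /id):

```python
from azure.cosmos import ContainerProxy
from azure.cosmos.exceptions import CosmosResourceExistsError

def stage_publish(outbox: ContainerProxy, event_id: str, payload: dict) -> None:
    # Record the intent to publish at most once; CDC plus a Function
    # later picks this document up and forwards it to the next EH.
    try:
        outbox.create_item({"id": event_id, "payload": payload})
    except CosmosResourceExistsError:
        pass  # a previous (duplicate) run already staged it: skip

def watermark_processed(watermarks: ContainerProxy, event_id: str) -> None:
    # Mark the event fully processed; written before EH checkpointing.
    try:
        watermarks.create_item({"id": event_id})
    except CosmosResourceExistsError:
        pass  # a duplicate run already watermarked it
```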

The downsides of this approach are:

  1. The business logic still executes for duplicate events that get picked up, so compute is consumed even when the outcome is a no-op.
  2. It introduces additional hops in the form of change data capture (CDC) and Functions triggered to publish the message to the new EH, so additional monitoring and alerting have to be in place.

The positives of this approach are:

  1. Even if a duplicate message is picked up by the consumer service, it will not write to its operational DB without first checking for the record's existence (as sketched below). This ensures we don't end up with duplicate data in the database.
  2. It will not publish a duplicate message to subsequent services without first checking the watermarked DB for that stage, solving the problem of duplicate event publishing at the source.
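A minimal sketch of that existence check, using an in-memory dict as a hypothetical stand-in for the operational DB:

```python
def write_if_absent(operational_db: dict, record_id: str, record: dict) -> bool:
    # Write only when the record is not already present, so a duplicate
    # event delivery cannot create a duplicate row. Returns True only
    # when a write actually happened.
    if record_id in operational_db:
        return False
    operational_db[record_id] = record
    return True
```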

Conclusion and Key Takeaways

Reliable Systems

Implementing idempotency helps build robust and reliable systems that can handle failures and network issues.

Consistent Data

Idempotent operations ensure data integrity and consistency, even in the face of duplicate events or concurrent modifications.

Improved Scalability

Idempotency enables systems to scale more effectively, as they can handle retries and redeliveries without introducing errors.
