Reliable streaming from Kafka to SAP using a Sink Gateway

Kris Van Vlaenderen
4 min read · Sep 6, 2024


Let’s talk about a common pain point in Event-Driven Architecture (EDA) that often gets overlooked: dealing with errors when sending events to external systems, like SAP, that might be unavailable or temporarily overloaded. Standard connectors and basic streaming apps often leave you hanging in these situations. So what should your streaming app do when the target system rejects one or more events?

Why a simple DLT isn’t enough

At first, you might be tempted to reach for a Dead Letter Queue (DLQ), a common solution in messaging systems, or its Kafka counterpart, the Dead Letter Topic (DLT). But in an Event-Driven Architecture, a DLT on its own often creates more problems than it solves. Here’s why.

In an EDA, event order is often essential for maintaining data consistency. Unfortunately, if events get stuck in a DLT, that order goes out the window, leading to all sorts of headaches. Plus, you’re left with a DLT that constantly needs attention, creating a governance nightmare.

Remember, an event hub in EDA does a lot more than traditional messaging. It stores events for longer periods, enabling replay and complex processing. A simple DLT just can’t keep up.

What you really need is a system that:

  • Guarantees event ordering so you can process events in the right sequence, even with retries.
  • Provides robust retries with features like exponential backoff to handle temporary issues gracefully (a minimal backoff sketch follows this list).
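
To make that second point concrete, here’s a minimal sketch of exponential backoff with jitter in Java. The initial delay, the cap, and the jitter factor are illustrative values we picked for the example, not the Sink Gateway’s actual defaults:

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

public final class Backoff {

    private static final Duration INITIAL = Duration.ofSeconds(5);   // first retry delay (assumed)
    private static final Duration MAX     = Duration.ofMinutes(10);  // cap so delays don't grow forever

    /** Delay before the given attempt (attempt 0 = first retry). */
    static Duration delayFor(int attempt) {
        long millis = INITIAL.toMillis() * (1L << Math.min(attempt, 20)); // double per attempt
        millis = Math.min(millis, MAX.toMillis());
        // Add up to ~20% jitter so many failed events don't retry in lockstep.
        long jitter = ThreadLocalRandom.current().nextLong(millis / 5 + 1);
        return Duration.ofMillis(millis + jitter);
    }
}
```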

That’s why we at Cymo built the Sink Gateway for one of our clients. I’ll explain how it works using a simplified example of a business process.

Error handling with just a Dead Letter Topic

Imagine the following situation with a CRM system and an SAP integration.

The CRM sends a SalesOrderAssigned event, followed by a SalesOrderApproved event for the same Sales Order ID (1234). Of course, we’ll need to make sure that SAP processes them in that order.

If the SalesOrderAssigned event fails due to an HTTP error, it lands in the DLT so that it doesn’t block the following events from that partition. However, SAP may then process the SalesOrderApproved event before the assignment is eventually retried and processed, leaving SAP with inconsistent data: an approved order whose assignment arrives late or never.
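
To see why the key matters here, here’s a minimal producer sketch, assuming the plain Apache Kafka client, a topic we’ve named sales-order-events, and simplified JSON payloads. Because both events share the Sales Order ID as their key, Kafka writes them to the same partition and preserves their relative order:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CrmEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String salesOrderId = "1234"; // the key: same key -> same partition -> ordered
            producer.send(new ProducerRecord<>("sales-order-events", salesOrderId,
                    "{\"type\":\"SalesOrderAssigned\",\"salesOrderId\":\"1234\"}"));
            producer.send(new ProducerRecord<>("sales-order-events", salesOrderId,
                    "{\"type\":\"SalesOrderApproved\",\"salesOrderId\":\"1234\"}"));
        }
    }
}
```

That per-partition ordering is exactly what a plain DLT breaks: pulling the failed SalesOrderAssigned event out of the flow lets SalesOrderApproved overtake it.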

Error handling with the Sink Gateway

The Sink Gateway introduces a retry topic to handle these situations. Here’s how it works (a minimal sketch of the routing logic follows the list):

  1. Failed events are sent to the retry topic, preserving the original order and preventing blocking.
  2. Before processing a new event, the gateway checks for pending retries with the same key. If found, the new event is also sent to the retry topic, so everything stays in sequence.
  3. A scheduled processor attempts to resend events from the retry topic, using an exponential backoff algorithm to give the external system time to recover.
  4. Only after multiple retries, or for truly unresolvable errors, are events moved to a DLT. New events with the same key are also routed to the DLT to maintain consistency.
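
Here’s a sketch of that routing decision (steps 1, 2, and 4), assuming hypothetical collaborators: a SapClient, a pending-retry key lookup, and helper methods that are ours for illustration, not the Sink Gateway’s real API:

```java
import java.util.HashSet;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;

// Every type and method here is a hypothetical stand-in, not the Sink Gateway's real API.
public class RetryRouter {

    interface SapClient {
        void deliver(String payload) throws TransientHttpException, UnrecoverableException;
    }

    static class TransientHttpException extends Exception {}
    static class UnrecoverableException extends Exception {}

    private final Set<String> pendingRetryKeys = new HashSet<>(); // keys with events parked in the retry topic
    private final SapClient sap;

    public RetryRouter(SapClient sap) {
        this.sap = sap;
    }

    public void handle(ConsumerRecord<String, String> event) {
        // Step 2: a retry is already pending for this key, so this event must
        // queue up behind it to preserve per-key ordering.
        if (pendingRetryKeys.contains(event.key())) {
            sendToRetryTopic(event);
            return;
        }
        try {
            sap.deliver(event.value()); // happy path: deliver straight to SAP
        } catch (TransientHttpException e) {
            // Step 1: park the event without blocking the rest of the partition.
            pendingRetryKeys.add(event.key());
            sendToRetryTopic(event);
        } catch (UnrecoverableException e) {
            sendToDeadLetterTopic(event); // step 4: retrying won't fix this one
        }
    }

    private void sendToRetryTopic(ConsumerRecord<String, String> event) {
        // Produce the event to the retry topic (producer wiring elided).
    }

    private void sendToDeadLetterTopic(ConsumerRecord<String, String> event) {
        // Produce the event to the DLT (producer wiring elided).
    }
}
```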

Under the hood: how it works

Of course, there’s a lot going on behind the scenes to make the Sink Gateway reliable and performant. Here’s a peek under the hood:

  • The gateway uses two key-value stores: one to keep track of the events that need to be resent, and another to manage the order in which unique keys should be retried.
  • A dedicated processor runs at a configurable interval. It uses the information from the two stores to determine which events are ready for another retry attempt. Events for the same key are always resent in the order they arrived (see the sketch after this list).
  • When a retry is successful, the event is removed from both stores. If a retry fails, we register another attempt. And if an event exceeds the maximum retry count, it’s sent to the DLT, and all related entries are cleared from the stores.
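
For the curious, here’s roughly what such a scheduled processor can look like with Kafka Streams’ Processor API. This is a sketch under our own assumptions: the store names, keeping a single pending payload per key, and the 30-second punctuation interval are all illustrative, and the actual resend and success/failure bookkeeping is elided:

```java
import java.time.Duration;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;

public class RetryProcessor implements Processor<String, String, String, String> {

    private KeyValueStore<String, String> retryEvents;  // key -> pending event payload
    private KeyValueStore<String, Long> retrySchedule;  // key -> timestamp of next attempt

    @Override
    public void init(ProcessorContext<String, String> context) {
        retryEvents = context.getStateStore("retry-events");
        retrySchedule = context.getStateStore("retry-schedule");

        // Runs at a configurable interval (30 seconds here, purely illustrative).
        context.schedule(Duration.ofSeconds(30), PunctuationType.WALL_CLOCK_TIME, now -> {
            try (var pending = retrySchedule.all()) {
                while (pending.hasNext()) {
                    var entry = pending.next();
                    if (entry.value <= now) {
                        // Resend the pending event for this key. On success we'd remove
                        // it from both stores; on failure we'd reschedule it with a
                        // longer, exponentially growing delay (bookkeeping elided).
                        context.forward(new Record<>(entry.key, retryEvents.get(entry.key), now));
                    }
                }
            }
        });
    }

    @Override
    public void process(Record<String, String> record) {
        // Failed events arrive here from the retry topic and are registered for a later attempt.
        retryEvents.put(record.key(), record.value());
        retrySchedule.putIfAbsent(record.key(), System.currentTimeMillis());
    }
}
```

A nice side effect of using Kafka Streams state stores for this kind of bookkeeping is that they’re backed by changelog topics by default, so the retry state survives restarts and rebalances.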

The beauty of this logic is that it’s not tied to any specific business process. It’s a reusable solution for managing retries and ensuring consistent event delivery across your Event-Driven Architecture.

Configuration and reusability

To configure the Sink Gateway for your specific use case and environment, you can easily set up several properties:

  • Kafka cluster connection
  • Topic naming
  • External system endpoint, including security settings
  • Retry behavior, with properties like maximum attempts and the exponential backoff parameters
  • Event transformation, via a decorator that converts events into the format the external system expects. We support both Mustache and Thymeleaf templating (a small sketch follows this list).
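
As an illustration of that templating decorator, here’s a minimal sketch using the mustache.java library; the template, the field names, and the SAP payload shape are invented for the example:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import com.github.mustachejava.DefaultMustacheFactory;
import com.github.mustachejava.Mustache;
import com.github.mustachejava.MustacheFactory;

public class EventDecorator {
    public static void main(String[] args) {
        // Hypothetical template mapping our event fields onto the payload SAP expects.
        MustacheFactory factory = new DefaultMustacheFactory();
        Mustache template = factory.compile(new StringReader(
                "{\"OrderID\": \"{{salesOrderId}}\", \"Status\": \"{{status}}\"}"), "sap-order");

        StringWriter out = new StringWriter();
        template.execute(out, Map.of("salesOrderId", "1234", "status", "APPROVED"));

        System.out.println(out); // {"OrderID": "1234", "Status": "APPROVED"}
    }
}
```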

Don’t reinvent the wheel!

We developed the Sink Gateway to address a real-world client problem, learned from it, and turned it into a reusable solution for everyone. If you’re facing similar integration challenges and don’t want to build it yourself, contact us and we’ll gladly help you out!


Kris Van Vlaenderen

Father, Solution architect, Founder of 2 startups, passionate about event-driven systems