Publishing Events to Kafka using an Outbox Pattern

Swapnil Desai
Contino Engineering
6 min read · Oct 22, 2020


In this blog post, I would like to share my experience and the approach we took to publish events from microservices to Kafka using the well-known Outbox pattern (a variant of the Transactional Outbox pattern).

Introduction

In the world of microservice architecture, services not only update their own local data stores but also need to notify other services within the organization about the changes that occurred. This is where event-driven architecture comes to prominence, and Apache Kafka has become a de-facto standard for capturing and storing these change records: individual services publish the changes as events to a Kafka topic, where they can be consumed by the other services.

On one of the projects I worked on, we had to design and implement a feature that records and publishes all the API-call data from API-based microservices into Kafka. API-call data is the request and response data associated with an API invocation, such as the HTTP endpoint, method, path and query params, and the request and response bodies and headers.

Requirements

One of the functional design requirements was to decouple the Kafka producing logic from the individual microservices while still reliably publishing events to Kafka, thus avoiding technical constraints such as handling connection failures to the Kafka cluster, writing retry logic, and throttling when too many requests are made. The security team also wanted to avoid embedding the Kafka client TLS certificates in the microservices (Lambda) layer.

The services had to be designed and built without any direct interaction with the underlying Confluent platform, i.e. the Apache Kafka brokers and the Schema Registry, thus keeping any messaging constructs out of the services themselves.

Furthermore, all the API-call events (success and failure) had to be captured from different sources within the solution and published to a single Kafka topic.

Solution

In this blog post, I will share how we decided on leveraging the Outbox pattern and implemented it using the recommended two-step process: first, commit the message to a persistent data store (an Outbox table); then, have a separate service poll the Outbox table and publish the message to an Apache Kafka topic.

The initial design incorporated an intermediate queuing system (SQS) in front of Kafka (I know, a queue in front of another queue… bad choice, right?), but we soon pivoted away from that architecture. While looking for a way to reliably publish the API-call data/events, we came across the Outbox pattern, and it fit very well into our architecture: it provides an effective way to publish events reliably.

Outbox Approach

The idea of this pattern is to have an “Outbox” table: instead of publishing the API-call events directly into Kafka, the messages are written to the Outbox table in an AVRO-compatible format. Data is collected from all the source systems (the Microservices and the Logs Consumer Service shown in the architecture) and committed to the Outbox table. This also allowed us to consume, transform, and collect both the success events (2XX) and the failure events (the 4XX and 5XX logs available on a separate Kafka topic from the API Gateway) and publish them to a single location (the Outbox table). Another component (the Producer Service shown in the architecture) then polls these events asynchronously and publishes them to Kafka. Once a message is published successfully to the Kafka topic, it is marked as published in the DB.

Outbox Architecture
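To make the flow concrete, here is a minimal sketch of what the Producer Service's polling loop could look like, assuming Spring scheduling, Spring Kafka's KafkaTemplate, and the hypothetical OutboxEvent/OutboxRepository types sketched under Design Decisions below; none of these names come from the actual project code.

```java
import java.time.Instant;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Illustrative sketch of the Producer Service: poll the Outbox table,
// publish to Kafka, mark the row as published.
@Service
public class OutboxPoller {

    private static final Logger log = LoggerFactory.getLogger(OutboxPoller.class);

    private final OutboxRepository repository;         // hypothetical Spring Data repository (sketched below)
    private final KafkaTemplate<String, String> kafka; // pre-configured with the cluster's TLS settings

    public OutboxPoller(OutboxRepository repository, KafkaTemplate<String, String> kafka) {
        this.repository = repository;
        this.kafka = kafka;
    }

    @Scheduled(fixedDelay = 5_000) // poll every 5 seconds
    @Transactional                 // lock, publish and mark within one transaction
    public void publishPending() {
        for (OutboxEvent event : repository.findUnpublished()) {
            try {
                // Block until the broker acknowledges the write. The payload is shown
                // as the stored JSON here; the real service converts it to AVRO first.
                kafka.send(event.getTopic(), event.getKey(), event.getPayload()).get();
                // JPA dirty-checking flushes this update when the transaction commits.
                event.setPublishedAt(Instant.now());
            } catch (Exception e) {
                // publishedAt stays null, so the row is retried on the next poll;
                // this is what makes the delivery at-least-once.
                log.warn("Failed to publish outbox event {}, will retry", event.getId(), e);
            }
        }
    }
}
```

A failed send simply leaves the row unpublished, so downstream consumers may see duplicates and have to tolerate or deduplicate them (for example, by event id).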

Design Decisions

Here are some of the design decisions taken as part of this feature implementation.

  1. The AVRO schema file used to publish events to Kafka is stored locally along with the microservices, as the design goal was to not integrate with the Schema Registry. This schema file is used to create the event records in the Outbox table (a serializer sketch follows this list).
  2. All API-call data is stored in a jsonb column in the Outbox table. This provides the ability to store messages as JSON; Postgres jsonb also normalizes the stored value, stripping out any insignificant whitespace (see the entity sketch below).
  3. Along with other metadata fields such as the Kafka topic name, the Outbox table schema is modelled with a published date-time column. The Producer Service updates this column once the message is read and published to Kafka, thus allowing at-least-once delivery of the event.
  4. Pessimistic DB transaction locks with timeout support, provided by Spring Data JPA, are enabled when reading from the Outbox table. This ensures only one worker can process a given message/record at a time and subsequently update the published column in the DB (see the repository sketch below).
  5. As the solution is deployed on AWS, the IAM database authentication feature is enabled, which permits the Lambda functions (Microservices) and the ECS containers (Logs Consumer and Producer Service) to use an authentication token instead of a password when connecting to the DB instance (see the connection sketch below).
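For design decision 1, here is a rough sketch of loading a locally bundled .avsc file with plain Apache Avro and rendering an event as AVRO-compatible JSON, ready for the Outbox table; the file path and record fields are invented for the example, not taken from the project.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

public final class ApiCallEventSerializer {

    // Bundled with the service instead of being fetched from the
    // Schema Registry (path and fields are illustrative).
    private static final Schema SCHEMA = loadSchema("/avro/api-call-event.avsc");

    private static Schema loadSchema(String path) {
        try (InputStream in = ApiCallEventSerializer.class.getResourceAsStream(path)) {
            return new Schema.Parser().parse(in);
        } catch (Exception e) {
            throw new IllegalStateException("Cannot load Avro schema " + path, e);
        }
    }

    /** Renders an event as AVRO-compatible JSON for the Outbox table's jsonb column. */
    public static String toAvroJson(String method, String path, int status) throws Exception {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("method", method);
        record.put("path", path);
        record.put("status", status);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        JsonEncoder encoder = EncoderFactory.get().jsonEncoder(SCHEMA, out);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, encoder);
        encoder.flush();
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```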
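Design decisions 2 and 3 translate into a table along the following lines, sketched here as a JPA entity. The jsonb mapping relies on the hibernate-types library, which is an assumption on my part (any jsonb-capable mapping will do), and the column names are illustrative.

```java
import java.time.Instant;
import java.util.UUID;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.annotations.Type;
import org.hibernate.annotations.TypeDef;

import com.vladmihalcea.hibernate.type.json.JsonBinaryType;

// Sketch of the Outbox table as a JPA entity (names are illustrative).
@Entity
@Table(name = "outbox")
@TypeDef(name = "jsonb", typeClass = JsonBinaryType.class) // hibernate-types handles the jsonb mapping
public class OutboxEvent {

    @Id
    private UUID id;

    // Target Kafka topic, kept alongside the payload as metadata.
    @Column(name = "topic", nullable = false)
    private String topic;

    // Partition key for Kafka.
    @Column(name = "event_key")
    private String key;

    // The API-call data as AVRO-compatible JSON; Postgres jsonb
    // normalizes it, stripping insignificant whitespace.
    @Type(type = "jsonb")
    @Column(name = "payload", columnDefinition = "jsonb", nullable = false)
    private String payload;

    // Null until the Producer Service publishes the row to Kafka; updating
    // it only after a successful send gives at-least-once delivery.
    @Column(name = "published_at")
    private Instant publishedAt;

    protected OutboxEvent() { } // required by JPA

    // Only the accessors used by the poller sketch are shown.
    public UUID getId() { return id; }
    public String getTopic() { return topic; }
    public String getKey() { return key; }
    public String getPayload() { return payload; }
    public void setPublishedAt(Instant publishedAt) { this.publishedAt = publishedAt; }
}
```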
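Design decision 4 maps onto Spring Data JPA's @Lock annotation combined with a lock-timeout query hint, along these lines. How the timeout is honoured depends on the JPA provider and database dialect, and the query itself is illustrative.

```java
import java.util.List;
import java.util.UUID;

import javax.persistence.LockModeType;
import javax.persistence.QueryHint;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Lock;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.jpa.repository.QueryHints;

public interface OutboxRepository extends JpaRepository<OutboxEvent, UUID> {

    // SELECT ... FOR UPDATE with a lock timeout: a competing worker either
    // waits briefly or times out instead of double-publishing the same rows.
    @Lock(LockModeType.PESSIMISTIC_WRITE)
    @QueryHints(@QueryHint(name = "javax.persistence.lock.timeout", value = "3000"))
    @Query("select e from OutboxEvent e where e.publishedAt is null")
    List<OutboxEvent> findUnpublished();
}
```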
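And for design decision 5, a minimal sketch of obtaining an RDS IAM authentication token with the AWS SDK for Java v1 and using it as the JDBC password. In a Spring service this would typically sit inside a DataSource wrapper rather than raw DriverManager calls.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.rds.auth.GetIamAuthTokenRequest;
import com.amazonaws.services.rds.auth.RdsIamAuthTokenGenerator;

public final class IamDbConnector {

    /** Opens a Postgres connection using a short-lived IAM token instead of a password. */
    public static Connection connect(String host, int port, String db, String user, String region)
            throws Exception {
        RdsIamAuthTokenGenerator generator = RdsIamAuthTokenGenerator.builder()
                .credentials(new DefaultAWSCredentialsProviderChain()) // Lambda / ECS task role
                .region(region)
                .build();

        // Tokens are valid for 15 minutes; generate a fresh one per new connection.
        String token = generator.getAuthToken(GetIamAuthTokenRequest.builder()
                .hostname(host).port(port).userName(user).build());

        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", token);
        props.setProperty("ssl", "true"); // IAM authentication requires SSL

        return DriverManager.getConnection(
                "jdbc:postgresql://" + host + ":" + port + "/" + db, props);
    }
}
```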

Benefits

This pattern ensures at-least-once delivery of messages to Kafka. The events are persisted in the “Outbox” table indefinitely, with the ability to replay them in the future in case the messages are no longer available on the Kafka topic, thus allowing any offline Kafka consumers to read messages at will. This approach also allows taking regular backups of the DB before cleaning up the messages. The messages can also be moved to low-cost storage such as AWS S3, where the DB backups themselves are stored.

Limitations

This approach has a few limitations, though.

  1. Any updates or breaking changes to the schema would not be detected until the messages are polled from the Outbox table and published to the Kafka topic, and such changes need to be applied to all the services. AVRO schemas support fingerprints, which allow tagging each message with the schema version it was written under. A solution is to record the fingerprint along with the event in the Outbox table and implement business rules in the Producer Service to filter and publish the messages in AVRO-compatible formats (a fingerprint sketch follows this list).
  2. Each individual microservice could potentially end up with its own Outbox table within its own domain, rather than sharing the events datastore across microservices. Improvement: an option here is to proxy the publishing logic by introducing a RESTful interface in front of the Outbox table, making it easier to publish messages via a single service endpoint that controls and manages the events datastore, as shown in the diagram below.
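On limitation 1, here is a minimal sketch of computing an Avro schema fingerprint that could be stored alongside each event in the Outbox table; whether you store this 64-bit Rabin fingerprint or a stronger hash is a design choice.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaNormalization;

public final class SchemaFingerprints {

    /**
     * 64-bit Rabin fingerprint of the schema's Parsing Canonical Form.
     * Stored next to each Outbox row, it tells the Producer Service which
     * schema version the row was written under, so it can filter or route
     * messages accordingly.
     */
    public static long fingerprintOf(Schema schema) {
        return SchemaNormalization.parsingFingerprint64(schema);
    }
}
```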

Alternatives

A relational database was used to store all the API-call events in the “Outbox” table, but if your architecture landscape has a NoSQL database then, instead of using a dedicated Outbox table, you can append the event as an attribute of your local/primary record.

For publishing events from the DB to Kafka, instead of building a producer service, one option is to use a CDC (Change Data Capture) tool like Debezium, though this depends on whether your cloud service provider supports it. Alternatively, Kafka DB connectors can be used, but these require additional infrastructure for running the connectors on a Kafka Connect cluster. The Confluent 6.0 platform (Project Metamorphosis) now supports ksqlDB-Connect integration, which allows running the connectors on the Confluent ksqlDB server. More needs to be explored there! (maybe in my next blog)

References

Microservices Architecture

Transactional Outbox

See you in the next post,
SD

Where to find me? 👉

Find me on LinkedIn 🔥
