Transactional publishing of events in healthcare

Francesco Nobilia
Babylon Engineering
Nov 6, 2019 · 5 min read

At Babylon, we enable our products to scale through an event-driven architecture, decoupling our microservices and reducing point-to-point REST API integrations. Given our high-availability and scalability requirements, we have selected Apache Kafka as our event backbone.

Within the Babylon architecture, events serve different use-cases including business intelligence, asynchronous communication, and audit logging.

For instance, our appointments service publishes events every time an appointment is booked or cancelled. Within the life cycle of a request, publishing events is only one of the actions the service executes. Upon receiving a booking request, the service does the following:

  1. audits the request using an audit event
  2. persists the new appointment into the database
  3. issues an event saying that an appointment has been booked
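The sequence above can be sketched with in-memory stubs (all names here are hypothetical, for illustration only). Note the window between steps 2 and 3: a crash there leaves an appointment in the database with no booking event ever published.

```python
class InMemoryDb:
    """Stand-in for the appointments database."""
    def __init__(self):
        self.appointments = []

    def insert_appointment(self, request):
        self.appointments.append(request)
        return len(self.appointments)  # pretend auto-increment id


class InMemoryBackbone:
    """Stand-in for the event backbone (Kafka in our case)."""
    def __init__(self):
        self.events = []

    def publish(self, event):
        self.events.append(event)


def handle_booking_request(request, db, backbone):
    backbone.publish({"type": "BookingRequested"})           # 1. audit event
    appointment_id = db.insert_appointment(request)          # 2. persist appointment
    # <-- a crash here keeps the database row but loses the booking event
    backbone.publish({"type": "AppointmentBooked",           # 3. booking event
                      "appointment_id": appointment_id})
    return appointment_id
```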

While the audit event needs to be published even if the appointment booking is rejected, the booking event and the database write should be atomic, i.e. either both succeed or both are reverted.

For instance, the service may crash after inserting the new appointment into the database, resulting in the booking event never being dispatched. Publishing the event first and then writing into the database may instead produce a booking event with no corresponding record in the database. The challenge is to make sure that the database insert and the event delivery are executed atomically; otherwise, a failure could leave the overall system in an inconsistent state.

A distributed transaction, or two-phase commit (2PC), spanning the database and the event backbone can guarantee atomicity at the expense of complexity and performance. However, not all messaging platforms support distributed transactions out of the box. Apache Kafka, for instance, does not, so a custom application-level protocol would have to be developed.

A solution that balances complexity and performance, while satisfying our atomicity and consistency requirements in the medical domain, is the transactional outbox pattern, which we will explore in the remainder of this blog post.

Transactional outbox pattern

The transactional outbox pattern provides at-least-once delivery guarantees by using two components: an event queue and an event publisher.

Event queue

Upon updating an entity in the database, a service should be able to transactionally queue the corresponding event in durable and reliable storage. A table within the same database where the entities are persisted is the perfect candidate.

The pattern applies as described in the following diagram:

Inserting the new entity and queueing the corresponding event are wrapped in a single database transaction: either all database operations within the transaction succeed or they are all rolled back. If the transaction commits, both the entity and the event have been safely persisted, guaranteeing consistency in our domain.
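A minimal sketch of the transactional write, using SQLite and hypothetical table names for illustration. The appointment row and the outbox row are written inside one transaction, so a failure at any point reverts both.

```python
import json
import sqlite3


def init_db(conn):
    # Entity table and outbox table live in the same database.
    conn.execute("CREATE TABLE appointment (id INTEGER PRIMARY KEY, patient TEXT)")
    conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT, payload TEXT)")


def book_appointment(conn, patient):
    # Both inserts run in one database transaction: sqlite3's connection
    # context manager commits on success and rolls back on exception,
    # so the appointment and its event are persisted atomically.
    with conn:
        cur = conn.execute("INSERT INTO appointment (patient) VALUES (?)", (patient,))
        event = {"type": "AppointmentBooked", "appointment_id": cur.lastrowid}
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("appointments", json.dumps(event)),
        )
        return cur.lastrowid
```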

The next step is dispatching the event to the event backbone; Apache Kafka in our case.

Event Publisher

After events are persisted in the outbox, two possible patterns can be applied: either the service publishes events into the backbone or the backbone has a built-in capability to pull them from the service.

The pushing publisher periodically reads the outbox event queue and pushes events in batches into the event backbone.

The polling publisher polls the outbox event queue in batches, either by reading directly from the database or by using an API exposed by the microservice.
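Either variant boils down to the same loop: read a batch from the outbox, publish it, and delete what was published. A sketch of that loop, with SQLite standing in for the service database and a hypothetical `producer.send(topic, event)` standing in for the Kafka client:

```python
import json
import sqlite3


def publish_batch(conn, producer, batch_size=100):
    """Read one batch from the outbox, publish it, delete the published rows.

    `producer` only needs a `send(topic, event)` method here; a real
    implementation would wrap a Kafka producer and wait for acknowledgements.
    """
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, topic, payload in rows:
        producer.send(topic, json.loads(payload))
        # Delete only after a successful send: a crash between send and
        # delete re-publishes the event on restart, i.e. at-least-once.
        with conn:
            conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
    return len(rows)
```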

Regardless of which solution is applied, both approaches have to tolerate poison pills: faulty events should never be published into the event backbone, and they should not block the event publisher. A possible approach is to make sure that no faulty event is ever queued. In the case of a faulty event, the system should raise an error and the overall database transaction should fail.

One publisher per topic can be set up, allowing for different configurations. For example, audit events do not need to be published as quickly as our transactional events and can be published in larger batches.

For resilience, a publisher may have multiple replicas. In this case, leader election is required to ensure that only one replica performs the publishing task at any given time.

After events have been successfully published, they can be deleted from the outbox.

In the case of failures, events may occasionally be published more than once; this is why the pattern provides at-least-once, rather than exactly-once, delivery guarantees.

Proper alerting and monitoring should be in place to track the size of each outbox and ensure events are being published in a timely manner.

Conclusion

Given the central position of the event backbone in the Babylon architecture, the transactional outbox pattern not only removes the need for a distributed transaction, but it also enhances the resilience of the overall Babylon architecture. Without an outbox, the event backbone would be a single point of failure. Thanks to the outbox, in case of Kafka issues, events will be buffered in the microservices’ outboxes preventing any data loss.

Does this sound interesting?

We’re only as good as our people. So finding the best people is everything to us. We serve millions, but we choose our people one at a time. We are seeking big dreamers, fast builders and brilliant beings.

Reach out to our Technical Talent Team to find out more:

Edel Russell: edel.russell@babylonhealth.com

Leigh Penfold: leigh.penfold@babylonhealth.com

About the authors

This work is the result of a collaboration between Akos Krivachy from the Clinical Care Experience Tribe, Richard Noble and Francesco Nobilia from the Data Tribe. At Babylon, we truly encourage collaboration across teams, which brings together different expertise around the same table to provide our internal and external users with the best value possible.
