Event-driven architecture—evolution from servers to brokers to queues

a (not-so) brief history of a health & fitness data pipeline: part i

caleb lee
SCTD, GovTech
16 min read · Jun 4, 2024


This three-part series discusses some of the lessons I’ve learned from scaling a microservice backend system over the years. It dives deep into the nuances, but assumes a basic understanding of software engineering, particularly in building backend systems.

Instead of cutting straight into the meat of the subject that you’re probably itching to read about, I would first like to provide some domain context and celebrate my team and the fruits of our labour. Here is the product of some late nights spent pulling our hair out, looking for bugs that defied the laws of logic, along with other glorious days of breakthroughs and camaraderie:

SmartGym: a public fitness platform

SmartGym treadmill
SmartGym kiosk UI

My first foray into a software engineering career involved the wonderful privilege of being in the pioneering team that built and scaled this greenfield project.

SmartGym was first conceptualised to be “every citizen’s #1 fitness companion”. We did not know exactly what that statement meant, but it illuminated our path to becoming a public fitness data platform that automatically tracks workouts, provides users with fitness insights, and allows third parties to build their own use cases on our platform, such as fitness challenges and personalised workout recommendations. This is enabled through an ecosystem of connected sensors that we retrofitted onto public gym equipment, bridging the digital and physical realms of a smart gym.

Sorry, that was a mouthful. If a video speaks a thousand images, and an image speaks a thousand words, here’s a product demo video that I made a while back, to give you a better idea:

SMARTGYM demo
or a news feature

SENSEI: a health monitoring platform for senior citizens

Some might say that it sounds a little dorky to call ourselves the sensei team. It was originally a casual pun made by my lovely teammate, who has a knack for spitting out jokes that might induce rolling eyeballs, facepalms, hair-raising goosebumps, or a shiver down the spine, but the moniker eventually stuck.

To clarify: the term sensei is an East Asian honorific that literally translates to “a person who comes before”, and is also colloquially used to address Traditional Chinese Medicine physicians. Naming our system sensei is wordplay that marries the ideas of sensors (it being an IoT system) and health with the honourable senior citizens who came before us and built a thriving metropolis.

Considering that software engineers are generally bad at naming things, I have to say this was a clever one, Ding Hao; it made me giggle.

SENSEI care kit
SENSEI wellness pod

The SENSEI system ingests health-related metrics from a range of devices: from blood pressure monitors to weighing machines to wearables. It also includes research and development of contactless measurement methods, such as using radars to monitor vital signs and detect falls, or webcams to detect tiny fluctuations in skin colour due to pulses in blood flow volume (photoplethysmography), which are used to estimate blood pressure, respiratory rate, and heart rate.

With Singapore facing an ageing population, this suite of technology empowers the elderly to take more active ownership of, and pay more attention to, their health needs, while also improving the level of support caregivers can provide.

A unified pipeline

Because our health and fitness use cases have such close synergy, the SmartGym and SENSEI platforms utilise the same underlying data ingestion pipeline. However, this article will focus more on examples from SmartGym for ease of following a cohesive narrative.

The story of our evolution

Preamble: assembling vs growing

All complex systems that work evolved from simpler systems that worked.
— Gall’s Law

Of course, we didn’t start with all these features in mind, which also meant that our system wasn’t built for these use cases right away.

My tech lead used to tell us that our project falls into the latter of the two categories: finite games and infinite games. The former is played for the purpose of winning, the latter for the purpose of continuing the play. Building a feature is a finite game, but building a platform is an infinite game (stay tuned to find out what that means).

Here in GovTech, the agile philosophy is encoded within our DNA. And being in an agile development team means it’s less about piecing together a complete product, and more about growing and evolving a system — one that embraces change with open arms. With each iteration, we learn things about the domain that inform our next step forward, making our system a better fit for its domain and securing its longevity.

In the same vein, I wrote this series to explore our system architecture, not just as a snapshot that explains the “what” and “how” of building a system. This is a story of evolution that also seeks to present the “why” that shaped our system over time.

v1.0 — Saving measurements

We had lofty visions of what our platform would become someday. But in the days of our humble beginnings, long before our pipeline evolved into a platform, our engineering challenge was defined by the most fundamental of objectives: data persistence.

Imagine taking a measurement on a weighing machine, blood pressure monitor, or wearable, and being able to see it displayed in an application, along with all your readings from the past year for comparison.

SmartGym weighing scale
SmartGym weighing scale UI

Now, open your eyes and tell me what the backend system should look like. You were probably thinking of another cookie-cutter web app architecture, i.e. a no-frills application interface with long-term storage. And so were we.

System architecture

Presenting the classic “CRUD API backend”:

Three things are happening via our REST API service:

  1. Login: A Session record is created, pairing Machine ‘abc’ with User ‘123’.
  2. Saving: Incoming measurements from Machine ‘abc’ are tied to User ‘123’, based on the previous step, and saved as a HealthMetric record.
  3. Fetching: User ‘123’ fetches Exercise or HealthMetric records belonging to him/her.

Not forgetting the other bells and whistles, like input validation based on our resource model, API key (service to service) and JWT (user session) authentication, access control, application logging and monitoring, etc.
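
To make that concrete, here is a minimal sketch of what those three operations could look like. It is not our actual code: just a hypothetical FastAPI service with in-memory stand-ins for the Session and HealthMetric stores, and illustrative routes and field names.

```python
# Hypothetical sketch of the v1.0 CRUD API (routes, models, and fields are illustrative).
from datetime import datetime, timezone
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
sessions: dict[str, str] = {}    # machine_id -> user_id (stand-in for a Session table)
health_metrics: list[dict] = []  # stand-in for a HealthMetric table

class LoginRequest(BaseModel):
    machine_id: str
    user_id: str

class Measurement(BaseModel):
    machine_id: str
    metric_type: str  # e.g. "weight", "blood_pressure"
    value: float

@app.post("/sessions")
def login(req: LoginRequest):
    # 1. Login: pair Machine 'abc' with User '123'
    sessions[req.machine_id] = req.user_id
    return {"status": "session created"}

@app.post("/measurements")
def save_measurement(m: Measurement):
    # 2. Saving: tie the incoming measurement to the user from the session
    record = {
        "user_id": sessions[m.machine_id],
        "metric_type": m.metric_type,
        "value": m.value,
        "recorded_at": datetime.now(timezone.utc),
    }
    health_metrics.append(record)
    return {"status": "saved"}

@app.get("/users/{user_id}/measurements")
def fetch_measurements(user_id: str):
    # 3. Fetching: return HealthMetric records belonging to the user
    return [r for r in health_metrics if r["user_id"] == user_id]
```

In practice, the dictionaries would be proper database tables, and the endpoints would sit behind the validation, authentication, and monitoring layers mentioned above.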

Now that we have the smallest hint of a health and fitness platform in place, let’s spice it up.

v2.0 — Real-time user interface

SmartGym weight stack

Being able to see my past records is cool and all. But eventually, our system needs to be compatible with more complex forms of health metrics and exercises. As a user, I would also expect some immediate form of value-add from a fitness companion during my workout:

  • How many reps of shoulder-press have I done in my current set?
  • Was my arm properly extended in my previous push-up rep?
  • How many calories have I burned on the treadmill so far?

These are questions that can already be answered based on the information gathered from the sensors that my teammates have developed, but the next engineering challenge for our pipeline is to enable that with real-time streaming.

Let’s break that down into smaller requirements:

Stream processing
Some incoming measurements need further processing, e.g. push-up detection based on body pose landmarks.

Streaming buffer
Some of the calculations require a window of streamed measurements, e.g. average speed. A streaming buffer acts as in-application memory before the data is subsequently flushed to the database.

Persistent client connection
Each real-time measurement has to be pushed to our front-end client.

Publisher-subscriber broker
There are scenarios where we need to stream each single measurement to multiple destinations for processing, e.g. using the same body pose landmark measurement to detect reps and detect exercise type, in parallel.

Instead of having our server maintain a list of destinations to forward the event to, we reverse this dependency.

In such cases, adding a broker in between decouples the event publisher from the event subscriber, allowing for a cleaner segregation of responsibilities: the API server is responsible for publishing an event that signifies a new measurement to its corresponding topic, and the ingestion workers with domain-specific implementations are responsible for subscribing to the topic of interest.
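
As a rough illustration (the topic name and payload fields are made up), the split looks something like this with Redis Pub/Sub in Python: the API server only knows how to publish, and each worker only knows which topic it subscribes to.

```python
# Illustrative publisher/subscriber split with Redis Pub/Sub
# (topic name and payload fields are hypothetical).
import json
import redis

r = redis.Redis()

# API server side: publish the new measurement to its topic and move on.
def publish_measurement(machine_id: str, landmarks: list[float]) -> None:
    event = {"machine_id": machine_id, "landmarks": landmarks}
    r.publish("measurements.body_pose", json.dumps(event))

# Ingestion worker side: each worker subscribes only to the topic it cares about.
def run_rep_counter_worker() -> None:
    pubsub = r.pubsub()
    pubsub.subscribe("measurements.body_pose")
    for message in pubsub.listen():
        if message["type"] != "message":
            continue  # skip subscription confirmations
        event = json.loads(message["data"])
        detect_reps(event["landmarks"])  # domain-specific processing

def detect_reps(landmarks: list[float]) -> None:
    ...  # e.g. push-up detection based on body pose landmarks
```

A second worker, say an exercise-type classifier, would subscribe to the same topic without the publisher ever knowing about it.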

Moving from a request-based to an event-based architecture

Since our system is no longer just interested in saving and fetching data, but must also operate in response to a steady stream of events from the real world, our architecture has to reflect this paradigm shift.

Win #1 — extensibility: the bread and butter of platform systems

In this publisher-subscriber pattern, servers do not have to maintain a list of destinations to forward the event to. This removes tightly coupled service dependencies that generally make it cumbersome to extend or modify system functionality in a plug-and-play manner.

Just like an operating system, one of the key characteristics of a platform is that it allows other third parties to build their applications on top of its core, without conflating them with our platform’s core functionality.

So moving from a request-response to a publisher-subscriber communication model would be a key enabler for evolving from a complete product to an extensible platform that is open to possibilities.

Win #2 — performance

  • Asynchronous writes: By outsourcing the stream processing, our API server can return a response to the client immediately after publishing. This deferred execution enabled us to cut down response times for cases where heavy processing of incoming measurements is needed.
  • In-memory reads: Redis Streams stores data in memory, acting as a cache layer for higher real-time read throughput (a rough sketch of both ideas follows this list).
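
Here is the sketch referenced above, assuming Redis Streams as the streaming buffer; the stream key, fields, and window size are illustrative.

```python
# Rough sketch of deferred writes and in-memory reads with Redis Streams
# (stream key, fields, and window size are illustrative).
import redis

r = redis.Redis(decode_responses=True)
STREAM = "treadmill:abc:speed"

# API server: append the raw measurement and return immediately;
# heavy processing happens downstream, off the request path.
def ingest_speed(speed_kmh: float) -> None:
    r.xadd(STREAM, {"speed_kmh": speed_kmh}, maxlen=1000, approximate=True)

# Read path: compute average speed over the most recent window,
# served from memory rather than from long-term storage.
def average_speed(window: int = 50) -> float:
    entries = r.xrevrange(STREAM, count=window)  # newest first
    speeds = [float(fields["speed_kmh"]) for _, fields in entries]
    return sum(speeds) / len(speeds) if speeds else 0.0
```

The maxlen cap keeps the in-memory buffer bounded, so it behaves like a rolling window rather than a second database.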

A note of caution on event brokers

Redis Pub/Sub and Socket.IO have an “at-most-once” (<= 1) delivery guarantee, a.k.a. a “fire and forget” property. A subscriber can be temporarily down and not listening when an event of interest is published. Since published events are not persisted, this might lead to missing data at best, or at worst, leave the system in an inconsistent state. Nasty.

This is one of the factors driving us towards our next iteration.

v2.1 — From events to messages

Terminology: events vs messages

Up till this point, we have been talking about events as the carriers of information flowing through the system. I would like to introduce its sibling that is often referred to interchangeably: messages. There are, however, some subtle differences between the two constructs. I wish to pin down this technical jargon which the rest of the article builds upon.

Events encapsulate a change in the state of the system, and messages may overlap with them in this respect. The key distinction, however, is that messages are usually associated with an action that needs to be taken by the recipient.

Thus, events are usually broadcast, while messages are point-to-point interactions between two services.

Terminology: brokers vs queues

The underlying channel that the carriers utilise is different too. Events are published and subscribed through a topic within an event broker. Messages are sent and received through a message queue.

Despite the difference in the communication model, both event-based and message-based systems reap the same benefits of:

  • increased responsiveness from deferred execution
  • higher extensibility from decoupling producers and consumers.

If you are still confused, this article does a much better job of contrasting the two.

Although we are moving from events to messages as the carriers of information, I will still refer to the system as an event-driven one to reflect the paradigm of “reacting to state changes”. I hope I’ve tied up the loose ends regarding the terminology. Let’s carry on.

Win #1 — horizontal scalability

Message Queues follow a one-to-one communication model where messages have a specific destination and are consumed by one receiver.

Event brokers, on the other hand, operate on a one-to-many broadcast model where events are sent to multiple subscribers.

Since each message is consumed by only one receiver, a message queue brings with it load-balancing capabilities that an event broker does not.

By replacing event brokers (Redis Pub/Sub) with message queues (RabbitMQ), we were able to increase the number of replicas of our ingestion workers, especially when latency crept in and higher throughput was needed. When sudden traffic spikes exceeded the maximum capacity of our system, the queue also gave us some buffer for messages to accumulate instead of being dropped.

In cases where we still need one-to-many broadcast behaviour, queue exchanges support fan-out patterns that achieve exactly that.

Win #2 — message durability and reliability

On top of that, message queues have reliable “at-least-once” (>= 1) message delivery guarantees, while the PubSub broker fires and forgets. How is this achieved?

  1. When a producer sends a message to the queue, the message queue persists it.
  2. When a consumer receives the message, the message queue waits for an acknowledgment from the consumer before deleting the message.
  3. This acknowledgment ensures that messages are successfully delivered and processed by the intended recipients.
  4. If the message queue does not hear back from the consumer after a specified amount of time, the message queue assumes that the consumer is dead, and redelivers the same message to the next available consumer.
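
In RabbitMQ terms (via the pika client), that handshake looks roughly like the sketch below; the queue name and the process() function are placeholders.

```python
# Rough sketch of at-least-once delivery with manual acknowledgments in RabbitMQ (via pika).
# The queue name and the process() function are placeholders.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="measurements", durable=True)  # step 1: the broker persists messages

def on_message(ch, method, properties, body):
    process(body)                                   # domain-specific work
    ch.basic_ack(delivery_tag=method.delivery_tag)  # steps 2-3: ack only after successful processing
    # step 4: if this worker dies before acking, the broker redelivers
    # the message to the next available consumer.

def process(body: bytes) -> None:
    ...

channel.basic_qos(prefetch_count=1)  # hand each worker one unacknowledged message at a time
channel.basic_consume(queue="measurements", on_message_callback=on_message)
channel.start_consuming()
```

The basic_qos(prefetch_count=1) setting keeps a slow worker from hoarding unacknowledged messages, which pairs well with the load-balancing behaviour described earlier.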

With that, I present to you:

Hold up, not so fast.

Caveats: The event-driven “work ethic”

With “at-least-once” message delivery, some workers that take a long time to acknowledge, for various reasons, might be falsely declared dead. And the message queue redelivers the same message to another worker.

This retry mechanism introduces a new problem: duplicate messages. Because of that, certain assumptions need to be built into the behaviour of our ingestion workers:

Idempotence
Idempotence is the property whereby a piece of code yields the same result no matter how many times it is executed. For example, a trivial database “insert” operation is not idempotent, while an “insert if it does not exist” operation is.

This is the secret sauce that ensures that a pipeline with an “at least once” (>= 1) message delivery has an “exactly once” (== 1) overall system behaviour.

With retry mechanisms in place, idempotence needs to be a first-class concept in our worker implementation.
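
As a small example of the “insert if it does not exist” flavour, here is a sketch keyed on a hypothetical per-message ID, using a MongoDB-style upsert (the collection and field names are made up):

```python
# Sketch of an idempotent save: keyed on a hypothetical per-message ID,
# so re-processing the same message leaves the database unchanged.
from pymongo import MongoClient

collection = MongoClient()["smartgym"]["health_metrics"]

def save_metric_idempotently(message_id: str, metric: dict) -> None:
    collection.update_one(
        {"_id": message_id},       # the message's unique ID doubles as the record key
        {"$setOnInsert": metric},  # writes fields only on first insert; no-ops on retries
        upsert=True,
    )
```

A redelivered message simply matches the existing record and writes nothing new.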

Statelessness
Processing a message tomorrow should yield the same result as it would today. This means that the worker’s process cannot depend on external information that is subject to change, i.e. all the necessary information (state) should be encapsulated in the immutable message.

Self-contained messages ensure deterministic retries with no side effects or race conditions.

That makes a worker stateless.

Stateful vs Stateless

Causal dependencies
Message queues don’t guarantee that retries occur in the same order, which can cause issues with sequential operations. To prevent breaking causal dependencies, sequential tasks should run synchronously on a single worker.

Error handling and retry patterns
In my experience, not all faults in the system are created equal. They usually fall under one of two classes: operational failures or logical exceptions.

Operational failures can be attributed to network partitions, service unavailability, or other transient reasons, and should be retried at a later time. Logical exceptions happen deterministically and need to be handled in a bug fix.

Without looking into the stack trace, it is difficult to tell which kind of fault it is, so retrying should be the default option. That means the worker code should not silently catch generic errors, but fail gracefully instead, so the message queue can take care of the retries. In cases like these, perhaps it is better to be right and dead than to be wrong and alive.

To prevent flooding our pipeline with retry messages, we used an exponential backoff pattern, which adds an increasing delay with each retry attempt.

However, when the fault is a logical error due to a bug or a poisoned message, it will lead to an infinite retry loop that reduces the capacity of our system.

We can isolate this fault with the circuit breaker pattern: after a specified number of retry attempts, it is safe to assume that it is a logical error and persist the unprocessed message in a dead-letter queue. These dead-letter messages can be added back to the main queue when a bug fix is deployed.
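
One way to wire this up in RabbitMQ is sketched below, with made-up queue names, headers, and limits: track an attempt count on each message, back off exponentially while it is under the cap, and dead-letter it once the cap is exceeded. The delayed retry here uses a TTL-based wait queue, which is a simplification of the real plumbing.

```python
# Sketch of retry-with-backoff plus a dead-letter queue in RabbitMQ (via pika).
# Queue names, the header, and the limits are made up for illustration.
import json
import pika

MAX_ATTEMPTS = 5

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Messages rejected from the main queue are routed to a dead-letter exchange/queue.
channel.exchange_declare(exchange="dlx", exchange_type="fanout")
channel.queue_declare(queue="dead_letters", durable=True)
channel.queue_bind(queue="dead_letters", exchange="dlx")
channel.queue_declare(queue="ingest", durable=True,
                      arguments={"x-dead-letter-exchange": "dlx"})

# Wait queue: expired messages are dead-lettered back onto the main queue.
# (Simplification: per-message TTLs only expire at the head of this queue.)
channel.queue_declare(queue="ingest.wait", durable=True,
                      arguments={"x-dead-letter-exchange": "",
                                 "x-dead-letter-routing-key": "ingest"})

def on_message(ch, method, properties, body):
    attempt = (properties.headers or {}).get("x-attempt", 1)
    try:
        process(json.loads(body))
        ch.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        if attempt >= MAX_ATTEMPTS:
            # Circuit breaker: assume a logical error and park the message in the DLQ.
            ch.basic_reject(delivery_tag=method.delivery_tag, requeue=False)
        else:
            # Exponential backoff: park the message in the wait queue with a growing TTL.
            delay_ms = 1000 * 2 ** (attempt - 1)
            ch.basic_publish(
                exchange="",
                routing_key="ingest.wait",
                body=body,
                properties=pika.BasicProperties(
                    headers={"x-attempt": attempt + 1},
                    expiration=str(delay_ms),
                ),
            )
            ch.basic_ack(delivery_tag=method.delivery_tag)

def process(payload: dict) -> None:
    ...  # domain-specific ingestion work

channel.basic_consume(queue="ingest", on_message_callback=on_message)
channel.start_consuming()
```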

Circuit breaker pattern

More caveats

So let ye not be fooled: the elegance of an event-driven architecture comes with hidden complexities that we had to learn about the hard way. Here’s more:

Tradeoffs: missed messages vs duplicate messages
We picked our poison. In our case, we would rather deal with duplicates than missed messages: imagine slogging through a workout only to realise that it didn’t count because the system failed to save it — your blood boils further. As mentioned, having an “at-least-once” guarantee means that there is an implicit requirement to develop all our consumer services to be idempotent.

However, idempotence comes in all shapes and sizes.

It could be trivial for cases like database calls, using techniques like writing to a unique key or updating a record in place, instead of partially incrementing or appending to existing fields. When it comes to sending data to external services, preventing duplicate outgoing requests might be more tedious, such as hashing requests and storing them for future duplicate checks.
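
For the outgoing-request case, a sketch of that hashing idea could look like this, using Redis as the store of seen hashes (the key prefix, TTL, and send() call are illustrative):

```python
# Sketch of deduplicating outgoing requests by hashing them and remembering the hash.
# The key prefix, TTL, and the send() call are illustrative.
import hashlib
import json
import redis

r = redis.Redis()

def send_once(url: str, payload: dict) -> None:
    digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    # SET NX succeeds only for the first caller; retries see the existing key and skip.
    first_time = r.set(f"sent:{url}:{digest}", 1, nx=True, ex=24 * 3600)
    if first_time:
        send(url, payload)  # the actual outgoing request (not shown)

def send(url: str, payload: dict) -> None:
    ...
```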

Transparency
Decoupling API services from the ingestion workers means that clients would not know if and when an error has occurred down the pipeline, because a successful API response is returned the moment a message is added to the queue.

To let our front-end application know that a record is successfully saved, we had to establish a separate communication channel to inform the clients of the success/failure downstream.
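
As an illustration (not our exact design), the final worker in the pipeline could publish a status event onto a per-user channel, which a WebSocket gateway then forwards to the client:

```python
# Illustrative status channel: the last worker in the pipeline publishes a
# per-user success/failure event for a WebSocket gateway to forward to the client.
import json
import redis

r = redis.Redis()

def report_status(user_id: str, measurement_id: str, ok: bool, reason: str = "") -> None:
    status = {"measurement_id": measurement_id, "ok": ok, "reason": reason}
    r.publish(f"status:{user_id}", json.dumps(status))
```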

Observability
Be it during development or when troubleshooting a production incident, looking through the scattered application logs of a distributed system can be a real pain in the neck because of the number of services that an ingested piece of data passes through, especially when multiple replicas are involved. This usually looks like hitting ctrl-F in 20 open tabs and painstakingly scouring each one to locate the error.

Tracing for distributed systems involves instrumenting each service to create and export spans independently, and to embed “connective information” in the message before this context is propagated and picked up by the next service downstream. The context is subsequently used to stitch the individual spans into an end-to-end trace.
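
In OpenTelemetry terms, that “connective information” is the trace context injected into the message on the way out and extracted on the way in. A rough sketch, with a hypothetical message shape and span names, assuming a tracer provider has been configured elsewhere:

```python
# Rough sketch of trace-context propagation through a message (OpenTelemetry Python API;
# the message shape and span names are hypothetical).
from opentelemetry import trace
from opentelemetry.propagate import extract, inject

tracer = trace.get_tracer("ingestion-pipeline")

# Producer side: embed the current trace context into the message headers.
def publish_with_context(payload: dict) -> dict:
    with tracer.start_as_current_span("publish-measurement"):
        headers: dict = {}
        inject(headers)  # writes e.g. the W3C traceparent header into the carrier
        return {"headers": headers, "payload": payload}

# Consumer side: pick the context back up so this span joins the same trace.
def consume_with_context(message: dict) -> None:
    ctx = extract(message["headers"])
    with tracer.start_as_current_span("process-measurement", context=ctx):
        ...  # domain-specific processing
```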

Logs tell us “what happened”. Traces tell us “where” and “when”.

Along with log aggregation, this allowed us to immediately narrow the search space from a single place when hunting down an anomaly.

Maintainability
Function callers have to comply with the set of arguments defined in a function’s implementation. In the same (but more distributed) fashion, producers (senders) and consumers (recipients) need to have a common understanding of the message format. Otherwise, the incompatibility might not be detected until much later, during integration testing.

Message format includes the payload schema and the serialisation protocol — this is part of the service’s interface, which needs to be standardised, documented, and carefully evolved.

One could use standards like Avro or Protocol Buffers (language-agnostic), or in our case, write data classes that support richer data types for producers and consumers to import as libraries (covered later in the series).
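
For instance, a shared data class that both producers and consumers import from a common library might look like the sketch below; the field names are illustrative, and our actual approach is covered later in the series.

```python
# Illustrative shared message schema: producers and consumers import the same
# data class from a common library, so incompatibilities surface at development time.
from dataclasses import dataclass, asdict
from datetime import datetime
import json

@dataclass(frozen=True)
class MeasurementMessage:
    message_id: str
    user_id: str
    metric_type: str  # e.g. "weight", "heart_rate"
    value: float
    recorded_at: datetime

    def to_json(self) -> str:
        payload = asdict(self)
        payload["recorded_at"] = self.recorded_at.isoformat()
        return json.dumps(payload)

    @classmethod
    def from_json(cls, raw: str) -> "MeasurementMessage":
        payload = json.loads(raw)
        payload["recorded_at"] = datetime.fromisoformat(payload["recorded_at"])
        return cls(**payload)
```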

Wrapping up

If you’ve been following me up till now, wow! That was pretty heavy stuff.

We evolved from a trivial request-response system into an event-driven system with real-time streaming and data persistence capabilities, first through event brokers and subsequently via message queues.

Our stateless message-based ingestion pipeline

As you can tell, the distributed and asynchronous complexity inherent in an event-driven architecture demanded a paradigm shift in our development team. However, the trauma of our growing pains resulted in a more reliable and scalable ingestion pipeline.

In the next part of the series, our event-driven pipeline matures further in scale and complexity. Building upon the saved records, we process them periodically and on demand to enable new features that make a user’s workout experience more collective, yet personalised.

With that, we tackle new engineering challenges of maintaining distributed data views while keeping our system eventually consistent.

Stay tuned!

Disclaimer: The way this series is written might create the impression that the system went through the phases outlined here as non-overlapping, chronological steps. In reality, progress is never linear. The different phases simply serve as a narrative of key driving forces that might have coexisted at one point in time. Likewise, the supporting visual examples also span across versions that do not strictly follow a chronological sequence.
