Event-Driven Architecture: Use Cases

Published in disney-streaming, Oct 27, 2020 (4 min read)

Team ATLAS at Disney Streaming Services uses event-driven architecture for numerous use cases across the products and services we make available to consumers. This article showcases some applications built on distributed event-driven architecture, and what the team learned from implementing them.

To meet the high volume of requests at scale and enable rapid development, we embrace microservices architecture. Though the public-facing part of our services is request-driven, internally, due to the loosely coupled nature of microservices, message transfer is event-driven. This article showcases how event-driven architecture fits in our ecosystem and how it can be used to automate specific tasks.

Quick Introduction:

What is an event?

The dictionary definition of an event is “an occurrence.” Did you just blink your eyes? Yes. That is also an event. In the technical world, an event represents a significant change in state of software or system hardware.

Examples:

User profile updated in database.
Message count in a queue greater than threshold.
CPU utilization of instances exceeding threshold.

What is Event-Driven Architecture?

Software architecture based on reaction to events.

Who are the actors in this architecture?

Producer: Detects the event, converts it into a message (e.g., a JSON payload), and places it in the event router for further processing.
Router: Filters and propagates the message from event producer to event consumers.
Consumer: Consumes the message from the event router and performs a specific task.
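
To make the three roles concrete, here is a minimal in-memory sketch. It is only an illustration: the `EventRouter` class, the `profile.updated` event type, and the `user_id` field are hypothetical names chosen for this example, and a real router would be a managed service, not a Python object.

```python
import json
from collections import defaultdict
from typing import Callable

Consumer = Callable[[dict], None]


class EventRouter:
    """In-memory stand-in for an event router (illustrative only)."""

    def __init__(self) -> None:
        # Maps an event type to the consumers subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type: str, consumer: Consumer) -> None:
        self._subscribers[event_type].append(consumer)

    def publish(self, message: str) -> int:
        """Filter by event type and propagate to matching consumers.

        Returns the number of consumers that received the message.
        """
        event = json.loads(message)
        consumers = self._subscribers.get(event["type"], [])
        for consumer in consumers:
            consumer(event)
        return len(consumers)


def produce_profile_updated(router: EventRouter, user_id: str) -> int:
    """Producer: turn a detected state change into a JSON message."""
    message = json.dumps({"type": "profile.updated", "user_id": user_id})
    return router.publish(message)


# Consumer: here it simply records what it receives.
received = []
router = EventRouter()
router.subscribe("profile.updated", received.append)
delivered = produce_profile_updated(router, "user-42")
```

The producer knows nothing about who consumes the message, and the consumer knows nothing about who produced it; both depend only on the router and the message shape.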

Use Cases:

#1. Integration with external SaaS applications

How do we automate the process of fetching information from upstream sources and updating databases based on an id present in log messages?

We implemented this with the event notification pattern of event-driven architecture, which reverses the dependency between the systems. For example, instead of the Lambda trying to find the id in the log message and then querying upstream, the log monitor notifies the Lambda with the id for further processing.

High-Level flow diagram of integration with external SaaS applications

A Datadog Monitor was used to set up custom alerts that look for certain keywords in our log messages. For example, an error message with details such as “Artifact not found for id: 123” triggers an event that carries the id in its message to the event router.

For event ingestion and routing messages from external SaaS applications to our AWS services, Amazon EventBridge served as the event router. Lambda consumers received the messages based on the rules set in EventBridge. Once a Lambda was triggered with the payload, it queried upstream and updated the destination database.
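
The consumer side of this flow might look roughly like the sketch below. Several things here are assumptions, not the team's actual code: the alert text is assumed to arrive under `detail.message` (the real payload layout depends on how the Datadog integration is configured), and `fetch_upstream` and `update_database` are hypothetical callables injected so the example runs without any AWS dependencies.

```python
import re
from typing import Callable, Optional

# Matches the example log line from the article and captures the id.
ID_PATTERN = re.compile(r"Artifact not found for id:\s*(\w+)")


def extract_artifact_id(event: dict) -> Optional[str]:
    """Pull the id out of the alert text (payload layout is assumed)."""
    message = event.get("detail", {}).get("message", "")
    match = ID_PATTERN.search(message)
    return match.group(1) if match else None


def make_handler(
    fetch_upstream: Callable[[str], dict],
    update_database: Callable[[str, dict], None],
):
    """Build a Lambda-style handler around two hypothetical callables."""

    def handler(event: dict, context=None) -> dict:
        artifact_id = extract_artifact_id(event)
        if artifact_id is None:
            # Nothing actionable in this message; skip it.
            return {"status": "ignored"}
        record = fetch_upstream(artifact_id)
        update_database(artifact_id, record)
        return {"status": "updated", "id": artifact_id}

    return handler


# Usage with in-memory stand-ins for the upstream source and database.
database = {}
handler = make_handler(
    fetch_upstream=lambda artifact_id: {"id": artifact_id, "name": "example"},
    update_database=database.__setitem__,
)
result = handler({"detail": {"message": "Artifact not found for id: 123"}})
```

Because the monitor supplies the id, the handler never has to scan logs itself; it only reacts to the notification.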

What happens if we get duplicate messages from the event router?

Even when an event router advertises exactly-once delivery, its distributed nature or transient network conditions can cause the same event to be delivered more than once, and many routers guarantee only at-least-once delivery to begin with. To handle such scenarios, consumers should be made idempotent. For example, updating a status column in a table to success for a given id shouldn’t have any additional effect even if it is executed multiple times.
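
The status-column example can be sketched with an in-memory SQLite database. The `artifacts` table and the `mark_success` function are hypothetical names for illustration; the point is only that replaying the same message leaves the data unchanged.

```python
import sqlite3


def mark_success(conn: sqlite3.Connection, artifact_id: str) -> int:
    """Set status to 'success' for one id.

    Returns the number of rows actually changed, so a duplicate
    delivery is visible as 0 without being an error.
    """
    cursor = conn.execute(
        "UPDATE artifacts SET status = 'success' "
        "WHERE id = ? AND status != 'success'",
        (artifact_id,),
    )
    conn.commit()
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifacts (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO artifacts VALUES ('123', 'pending')")

first = mark_success(conn, "123")   # first delivery changes the row
second = mark_success(conn, "123")  # duplicate delivery is a no-op
status = conn.execute(
    "SELECT status FROM artifacts WHERE id = '123'"
).fetchone()[0]
```

Operations that are not naturally idempotent, such as incrementing a counter, usually need an extra step like recording processed event ids before they can tolerate duplicates.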

#2. Cross-region data replication

How do we replicate artifacts uploaded in one region to other regions with custom processing?

At a high level, S3 event notification is enabled on the bucket. As new objects are uploaded, S3 posts JSON messages to SNS topics, so SNS acts as the event router. An AWS Lambda function is subscribed to these topics. It consumes the incoming messages, applies custom processing to the artifacts, and uploads them to other regions.
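
When S3 notifications are routed through SNS, the Lambda receives the S3 notification as a JSON string inside each SNS record, so the consumer has to unwrap two layers. The sketch below shows that unwrapping; `copy_to_regions` is a hypothetical callable standing in for the custom processing and upload step, and the bucket and key values are made up.

```python
import json
from typing import Callable
from urllib.parse import unquote_plus


def uploaded_objects(sns_event: dict) -> list:
    """Return (bucket, key) pairs from an SNS-wrapped S3 notification."""
    objects = []
    for sns_record in sns_event.get("Records", []):
        # The S3 notification is a JSON string inside the SNS envelope.
        s3_notification = json.loads(sns_record["Sns"]["Message"])
        for s3_record in s3_notification.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects


def replicate(sns_event: dict, copy_to_regions: Callable[[str, str], None]) -> int:
    """Hand each uploaded object to the (hypothetical) copy step."""
    count = 0
    for bucket, key in uploaded_objects(sns_event):
        copy_to_regions(bucket, key)
        count += 1
    return count


# Usage with a hand-built sample event and a recording stub.
sample_event = {
    "Records": [{
        "Sns": {
            "Message": json.dumps({
                "Records": [{
                    "s3": {
                        "bucket": {"name": "artifacts-us-east-1"},
                        "object": {"key": "builds/my+artifact%281%29.zip"},
                    }
                }]
            })
        }
    }]
}
copied = []
replicated = replicate(sample_event, lambda bucket, key: copied.append((bucket, key)))
```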

High-Level flow diagram for Multi-Region replication

Pain Points and Lessons Learned:

Setting up separate workflows for replication from region 1 to region 2 and vice versa results in an infinite loop: each replicated upload raises a new event in the destination region, which triggers replication back to the source. This shows how easily events can get tangled up and introduce complexity, and it calls for proper governance and coordination between teams when establishing event triggers and workflows.
Due to the asynchronous nature of processing, it is hard to get a complete picture of the system, which makes troubleshooting system issues challenging.
Develop idempotent consumers to handle duplicate messages from the event router.
Be aware of the “mode of message delivery” attribute of the event router. Possible values are at-most-once, at-least-once, and exactly-once.
The specific use cases showcased above were not impacted by out-of-order messages. However, it is important to be mindful of out-of-order messages during system design.
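
One possible way to break the replication loop described in the first lesson is to mark replicated copies and have consumers skip anything that carries the mark. The sketch below simulates this in memory; it is not necessarily how the team resolved the issue, and the `replicated-from` metadata key and the region names are assumptions for illustration.

```python
REPLICA_MARKER = "replicated-from"  # hypothetical metadata key


def should_replicate(metadata: dict) -> bool:
    """Only objects uploaded directly by a producer are replicated."""
    return REPLICA_MARKER not in metadata


def replica_metadata(metadata: dict, source_region: str) -> dict:
    """Copy the metadata and record where the replica came from."""
    return {**metadata, REPLICA_MARKER: source_region}


def simulate_upload(regions: list, origin: str, key: str):
    """Upload to `origin`, then follow the resulting events to the end.

    Every write raises an event in its own region, mimicking
    bidirectional replication. Returns (number of writes, stores).
    """
    stores = {region: {} for region in regions}
    pending = [(origin, key, {})]
    writes = 0
    while pending:
        region, obj_key, metadata = pending.pop()
        stores[region][obj_key] = metadata
        writes += 1
        if should_replicate(metadata):
            for target in regions:
                if target != region:
                    pending.append(
                        (target, obj_key, replica_metadata(metadata, region))
                    )
    return writes, stores


writes, stores = simulate_upload(["us-east-1", "us-west-2"], "us-east-1", "a.zip")
```

Without the `should_replicate` check, the `while` loop in this simulation would never terminate, which mirrors the infinite loop between the two workflows.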

Summary

Event-driven architecture brings immense flexibility in development and operations by reducing interdependencies between services. It also inherits the benefits of loosely coupled systems by being scalable and resilient.

There is one trade-off to consider: event-driven architecture relies on eventual consistency.

To summarize, event-driven architecture is use-case specific. As with any system design, being mindful of the trade-offs and challenges, understanding the limitations, and playing to the strengths of the architecture will yield better solutions.

https://pixabay.com/photos/zen-garden-meditation-monk-stones-2040340/

Curious to learn more about Event-Driven Architecture? Check out these articles: AWS’s “Event-Driven Architecture” and Martin Fowler’s “What do you mean by ‘Event-Driven’?”