Behind the scenes: McDonald’s event-driven architecture

Global Technology
McDonald’s Technical Blog
Aug 24, 2022


We explore our journey of developing a unified platform enabling real-time, event-driven architectures.

By Vamshi Krishna Komuravalli, Director, Principal Architect and
Damian Sullivan, Senior Manager, Software Development

Event-based architectures enable integration flexibility, scalability, and a range of real-time capabilities. However, successfully implementing such an architecture requires a robust platform to support it.

In the tech world, events are actions or occurrences that software recognizes and reacts to, whether they originate in hardware or in other software. At McDonald’s, we use events across the technology stack for asynchronous, transactional, and analytical processing use cases, including mobile-order progress tracking and sending marketing communications (deals and promotions) to customers.

Our unified eventing platform is designed to provide scalable, secure, and reliable real-time data flow between services and applications across different domains. It ensures consistency and reduces the implementation and operational complexity involved in adopting and maintaining eventing architectures.

In this two-part post, we will walk you through the implementation journey and explain how it works.

The challenge:

While event-based integration is not new at McDonald’s, our globally distributed development teams, with diverse skill levels, have historically used a wide variety of technologies and patterns to build it. The lack of a standardized approach can lead to inconsistent and operationally complex implementations, affecting availability, reliability, and data quality.

When we started envisioning this platform, we established a few high-level design goals to point our teams in the right direction.

Design goals:

  • Scalable: Needs to auto-scale to accommodate the increasing number of events flowing through it without any loss in quality of service.
  • Available: Needs to be highly available to withstand failures in its components.
  • Performant: Events should be delivered in real time, with the ability to handle highly concurrent workloads.
  • Secure: Data must adhere to data security guidelines around encryption, access control, etc.
  • Reliable: Must be dependable with controls in place to avoid losing any events.
  • Consistent: Must maintain consistency in the pattern implementations surrounding error handling, resiliency, schema evolution, monitoring, and disaster recovery.
  • Simple: Needs to minimize implementation and operational complexity and enable teams to build on the platform with ease.

With these goals in mind, we selected a set of tools, technologies, and patterns to create a unified platform.

Under the hood:

At a high level, producer applications create events and publish them onto the platform, and consumer applications read and further process those events.

The platform has a few key components:

  • Event Broker: We use Amazon Managed Streaming for Apache Kafka (MSK) to host topics and events and to provide the semantics for producing and consuming events, because it integrates well with the other AWS services we use. It strikes a good balance between reducing operational overhead and retaining the flexibility to customize for our use cases.
  • Schema Registry: Events published to the platform follow a well-defined contract, ensuring data quality for downstream consuming applications while providing a clear evolution path for producing applications when an event schema changes. The registry runs schema validations and compatibility checks between versions of the events (see the compatibility check sketched after this list).
  • Standby Event Store: To avoid losing messages in the event that MSK is unavailable, the platform is wired with a standby data store, a database to which it writes events when the broker cannot be reached. The platform provides tools and utilities to read those messages and publish them back onto MSK once it is available (the producer sketch after this list shows this fallback).
  • Custom Software Development Kits (SDKs): We built language-specific libraries that provide APIs for both producers and consumers to write and read the platform’s events, with built-in logic to perform schema validations, handle errors, and implement retry patterns (a producer-side sketch follows this list). The SDKs act as accelerators for our development teams, improving productivity and providing a consistent approach to implementing best practices.
  • Event Gateway: McDonald’s event-based architecture is designed to support both events generated internally by our applications and events produced by external partner applications, which are routed via an event gateway with layers of authentication and authorization (a gateway sketch follows this list). The gateway provides flexibility and abstraction without exposing our internal topic management.
  • Supporting Utilities and Tools: Our developers and service reliability engineers have a set of tools to rectify events in dead-letter topics (sketched after this list), gain visibility into cluster health, and perform cluster administration tasks.
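
To make the broker, SDK, and standby-store pieces above more concrete, here is a minimal, hypothetical sketch in Java of a producer-side wrapper: it publishes an event to an MSK-hosted Kafka topic with client-side retries and, if the broker cannot be reached, parks the event in a standby store for later replay. The EventPublisher class and StandbyStore interface are illustrative assumptions, not our actual SDK.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class EventPublisher {

    /** Assumed abstraction over whatever database backs the standby event store. */
    public interface StandbyStore {
        void save(String topic, String key, String payloadJson);
    }

    private final KafkaProducer<String, String> producer;
    private final StandbyStore standbyStore;

    public EventPublisher(String bootstrapServers, StandbyStore standbyStore) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Durability and retry settings: wait for all in-sync replicas and let the
        // client retry transient broker errors without duplicating events.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        props.put(ProducerConfig.RETRIES_CONFIG, 5);
        this.producer = new KafkaProducer<>(props);
        this.standbyStore = standbyStore;
    }

    public void publish(String topic, String key, String payloadJson) {
        try {
            // Block on the send so broker-side failures surface here.
            producer.send(new ProducerRecord<>(topic, key, payloadJson)).get();
        } catch (Exception e) {
            // MSK is unreachable or rejected the event: park it in the standby
            // store so a replay utility can publish it once the cluster recovers.
            standbyStore.save(topic, key, payloadJson);
        }
    }

    public void close() {
        producer.close();
    }
}
```

Blocking on the send keeps the failure handling easy to follow in a sketch; a production SDK would typically handle this asynchronously and add observability hooks.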
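
The schema registry’s compatibility checks can be pictured with Apache Avro’s compatibility API. The sketch below checks whether a consumer on a newer schema version can still read events written with the older one; the OrderEvent schemas are made-up examples, not real contracts.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;

public class SchemaCompatibilityCheck {

    public static void main(String[] args) {
        // Version 1 of a hypothetical order event.
        Schema v1 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"OrderEvent\",\"fields\":["
          + "{\"name\":\"orderId\",\"type\":\"string\"}]}");

        // Version 2 adds an optional field with a default value, which keeps
        // existing data readable (a backward-compatible change).
        Schema v2 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"OrderEvent\",\"fields\":["
          + "{\"name\":\"orderId\",\"type\":\"string\"},"
          + "{\"name\":\"status\",\"type\":\"string\",\"default\":\"UNKNOWN\"}]}");

        // Can a consumer expecting v2 read events written with v1?
        SchemaCompatibilityType result = SchemaCompatibility
            .checkReaderWriterCompatibility(v2, v1)
            .getType();

        System.out.println("v2 reader vs v1 writer: " + result); // COMPATIBLE
    }
}
```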
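
The event gateway pattern can be sketched as a small HTTP front end that authenticates an external partner’s request and republishes the payload onto an internal topic, so partners never need to know internal topic names. The endpoint path, header check, and topic name below are illustrative assumptions, not our actual gateway.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

import com.sun.net.httpserver.HttpServer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class EventGateway {

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical MSK endpoint
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        HttpServer server = HttpServer.create(new InetSocketAddress(8443), 0);
        server.createContext("/partner/events", exchange -> {
            // Authentication/authorization layer (token validation is stubbed here).
            String token = exchange.getRequestHeaders().getFirst("Authorization");
            if (token == null) {
                exchange.sendResponseHeaders(401, -1);
                exchange.close();
                return;
            }
            // Map the external request onto an internal topic without exposing it.
            String payload = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            producer.send(new ProducerRecord<>("partner-events-internal", payload));
            exchange.sendResponseHeaders(202, -1);
            exchange.close();
        });
        server.start();
    }
}
```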
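
Finally, here is a rough sketch of the dead-letter pattern those utilities operate on: a consumer that routes events it cannot process onto a dead-letter topic, where they can later be inspected, rectified, and replayed. The topic names and the handle() method are hypothetical.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class DeadLetterRoutingConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical MSK endpoint
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-events-consumer");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             KafkaProducer<String, String> dlqProducer = buildProducer()) {

            consumer.subscribe(List.of("order-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    try {
                        handle(record.value()); // application-specific processing
                    } catch (Exception e) {
                        // Processing failed: park the event on the dead-letter topic
                        // so it can be inspected, rectified, and replayed later.
                        dlqProducer.send(new ProducerRecord<>("order-events-dlt", record.key(), record.value()));
                    }
                }
            }
        }
    }

    private static void handle(String payload) { /* business logic goes here */ }

    private static KafkaProducer<String, String> buildProducer() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        return new KafkaProducer<>(props);
    }
}
```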

Now that we’ve explained our approach to designing the event-based architecture, we’ll explain how it works in next week’s blog post. Check back next Tuesday!
