Building Event-Driven Cloud Applications and Services
This document discusses the general practices and technologies for building event-driven applications and services. It is the opening piece of the Building Event-Driven Cloud Applications and Services tutorial series.
The quest of building reusable and growable systems
Every developer writes code with a number of explicit and implicit assumptions in mind. One of the most common assumptions we have is that computing devices always execute our code sequentially. Each line of code commands the execution of its logical next, which can be itself (recursion), another function in the same package, or a remote procedure (RPC/RESTful API call) across the Internet. The execution path itself is essentially a crystalized contract; it cannot be modified after compilation and deployment.
At a relatively small scale, the sequential execution assumption helps developers write simple, straightforward, and easy-to-understand code. However, as the codebase grows, with hundreds, if not thousands, of features added, the execution path inevitably becomes a maze. Great design patterns, software engineering principles, and best practices may solve the challenge temporarily, but the danger still lurks; it will return as technical debt accumulates.
HTTP RESTful/RPC-based microservice architecture addresses this concern by forcing developers to program to the interfaces of remote services, rather than a local implementation, which is, at its core, the natural extension of the first principle of reusable object-oriented design:
Program to an interface, not an implementation.
Design Patterns: Elements of Reusable Object-Oriented Software (1994)
The downside, however, is the introduction of dependencies between services, a manageable side effect that developers can, and have to, endure. The execution path is still there; the pattern instead greatly reduces the amount of code one individual or team has to manage, and offloads a number of responsibilities to other services in a heavily regulated way. This brings a new set of challenges exclusive to practitioners of HTTP RESTful/RPC-based microservice architecture; we will not discuss them here, as they are out of the scope of this tutorial series.
Event-driven architecture, on the other hand, attempts to solve the concern by getting rid of execution paths once and for all. In an event-driven app, a logical block of code emits an event, a message with contextual data, at the time of completion, rather than orchestrating the execution of another block of code. In fact, the publisher of events cares little about what happens next; the following action is left to the discretion of the messenger, usually a message queuing/streaming solution. The messenger passes the event (almost) simultaneously from the publisher to zero or more subscribers, where the event is processed separately.
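The publisher/messenger/subscriber relationship described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Messenger` class is a hypothetical in-memory stand-in for a real message queue, and the event types (`userRegistered`, `userDeleted`) are made up for the example.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Handler = Callable[[Event], None]


class Messenger:
    """Hypothetical in-memory stand-in for a message queuing solution."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: Event) -> int:
        """Pass the event to every subscriber of its type; return the count."""
        handlers = self._subscribers.get(event["type"], [])
        for handler in handlers:
            handler(event)
        return len(handlers)


messenger = Messenger()
emails: List[str] = []
audit_log: List[Event] = []

# Two independent subscribers to the same event type.
messenger.subscribe("userRegistered", lambda e: emails.append(e["data"]["email"]))
messenger.subscribe("userRegistered", audit_log.append)

# The publisher only emits; it does not know who (if anyone) is listening.
delivered = messenger.publish(
    {"type": "userRegistered", "data": {"email": "ada@example.com"}}
)
ignored = messenger.publish({"type": "userDeleted", "data": {}})
```

Here `delivered` is 2 and `ignored` is 0: the publisher's code is identical whether an event has two subscribers or none.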
The key in making great and growable systems is much more to design how its modules communicate rather than what their internal properties and behaviors should be.
Unlike HTTP RESTful/RPC-based microservice architecture, event-driven systems have no direct dependencies between parts and no interface to program to. It is true that publishers and subscribers still have to honor, to a certain level, preset schemas of events; however, the contract is fairly flexible: the publisher, as explained earlier, knows little about whether and how subscribers use the published events. With the execution path out of the picture, the extensibility of your applications and services improves considerably: you can add or remove blocks of code, in the form of subscribers, at any time; subscribers to the same event stream work simultaneously by default without interrupting each other in any way.
And this is not the only benefit event-driven architecture has. With message queues serving as the middleman, your applications and services gain a great deal of scalability: solutions such as Google Cloud Pub/Sub and Apache Kafka can hold, and later distribute, a massive amount of data in a short timeframe with negligible delay if configured properly. Many message queues are capable of self-adaptation as well; they work in step with subscribers and make a best effort not to overwhelm them. This quality is extremely important in today’s world, where more and more businesses run on real-time data: with billions of devices communicating with each other, apps and services must become highly elastic.
It is true that monolithic and HTTP RESTful/RPC microservice-based systems can scale too with the right platform (e.g. Kubernetes/Google Kubernetes Engine). However, with the execution path in play, each function invocation, RPC call, and/or HTTP request implies the (immediate) execution of another (possibly remote) block of code; the invocations, calls, and requests themselves take resources as well and cannot be batched. In general, HTTP RESTful/RPC-based microservices should communicate with each other only when necessary; talkative services are an infamous anti-pattern in this architecture.
This tutorial series will discuss in detail the advantages, and of course, the disadvantages of event-driven architecture, along with the common patterns and practices developers use to build their own event-driven system.
What is event-driven?
An event is nothing but a piece of data. More specifically, it is a small, immutable piece of data that documents one specific behavior of a system at a specific time; common examples include your thermostat detecting a change of temperature in the room, or a customer adding a new item to the shopping cart. By reading the flow (sequence) of events of a system, one can easily reconstruct its operation history.
Generally speaking, the format of events is up to developers themselves. The Cloud Native Computing Foundation is now supervising a standardized specification for describing events, namely CloudEvents, with many cloud service providers now planning to support this format. This tutorial series uses the 0.3 version of the CloudEvents specification throughout; it is strongly recommended that you use this specification in your event-driven applications and services as well.
Also, for simplicity, this tutorial series uses an experimental project, CloudEvents Generator, to produce and consume events wherever possible. Note that you can also build standard-compliant CloudEvents yourself using the in-memory structures of your preferred programming language.
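As a rough illustration of building an event with plain in-memory structures, the sketch below assembles a dictionary carrying the attributes that version 0.3 of the CloudEvents specification marks as required (`specversion`, `id`, `source`, `type`) plus a few optional ones. It does not use CloudEvents Generator; the helper name `new_cloud_event`, the event type, the source, and the payload are all hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict


def new_cloud_event(event_type: str, source: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """Build a CloudEvents 0.3-style event as a plain dictionary."""
    return {
        # Required attributes in CloudEvents 0.3.
        "specversion": "0.3",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        # Optional attributes.
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,
    }


# Hypothetical event: a customer adds an item to the shopping cart.
event = new_cloud_event(
    "com.example.cart.itemAdded",
    "/shopping-cart",
    {"sku": "A-42", "quantity": 1},
)
encoded = json.dumps(event)  # ready to hand to a message queue client
```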
Event-driven is a loosely-defined term; its usage varies with developers. One may argue that any system using events with the publisher/subscriber paradigm (sometimes called the notification paradigm) can be considered an event-driven system. Depending on how much events are integrated into the system, event-driven systems can be roughly categorized into two types: reactive ones, and stream processing ones.
Reactive event-driven systems
In a reactive event-driven system, events are, in essence, function invocations (or HTTP RESTful/RPC calls) without synchronicity. The publisher emits an event, which in effect triggers an action in subscribers without the publisher waiting for a response. For example, a flight booking service may set up its API backend to emit an orderCreated event when a customer books a flight; the message queue passes the event to a subscriber service, which processes the event, contacts the airline to reserve a ticket, and charges the customer’s credit card.
Some may consider this a superficial way of adopting events (a passive-aggressive function invocation); however, reactive event-driven systems can still enjoy the many benefits of the architecture:
- With the subscriber service taking the responsibility of ticket booking and payment processing, the API backend can respond much faster, telling customers that the system is processing their orders right after the orderCreated event is emitted and later notifying them of the results.
- Teams can now work on the API backend, the ticket booking functionality, and the payment processing functionality separately without coupling worries.
- The system is now much more prepared for the traffic spikes in holiday seasons. The message queue withholds orderCreated events automatically when the subscriber service is overwhelmed; some solutions can even auto-retry temporarily failed reservations and payments with proper configuration.
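The flight booking flow above can be sketched with Python's standard library, using `queue.Queue` as a stand-in for the message queue. All names here (`handle_booking_request`, `booking_subscriber`, the order IDs) are hypothetical, and the reservation and payment steps are stubbed out; a real system would use a message queuing service rather than an in-process queue.

```python
import queue
import threading
from typing import List

order_events: queue.Queue = queue.Queue()  # stand-in for the message queue
processed_orders: List[str] = []
STOP = None  # sentinel telling the subscriber to shut down


def handle_booking_request(order_id: str) -> str:
    """API backend: emit orderCreated and respond without waiting."""
    order_events.put({"type": "orderCreated", "data": {"orderId": order_id}})
    return f"Order {order_id} is being processed"


def booking_subscriber() -> None:
    """Subscriber: reserve the ticket and charge the card (both stubbed)."""
    while True:
        event = order_events.get()
        if event is STOP:
            break
        processed_orders.append(event["data"]["orderId"])


# The backend answers both customers before any subscriber is even running;
# the queue holds the events until the subscriber is ready.
responses = [handle_booking_request(order_id) for order_id in ("A1", "B2")]

worker = threading.Thread(target=booking_subscriber)
worker.start()
order_events.put(STOP)
worker.join()
```

Note that the backend returns its responses before the subscriber starts; the queue simply holds the events in the meantime, which is the buffering behavior the last bullet describes.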
Stream processing event-driven apps
Event-driven systems with stream processing use events in a more intensive, data-oriented manner. In this pattern, the subscribers of events are usually stream processors, which extract states from the event stream and pass the states to interested parties. Such systems are usually supported by a dataflow solution, such as Apache Flink, Apache Spark, or Cloud Dataflow. If helpful, think of a system that monitors the variance of temperature in an area using IoT devices: every second, thermostats around the area report their readings in the form of events to the service via message queues, where each event includes a temperature data point of a specific time; the service collects all the events in a set time window (e.g. every 15 seconds), and uses the stream processor to calculate the statistical variance of the data (the state); the service then passes the state to another system (e.g. a control panel) for further inspection.
Event-driven systems with stream processing have been widely adopted in the industry in recent years. Social networks use them to calculate likes, page views, listens, etc., while cloud service providers use them for fraud/abuse detection. This pattern is also the foundation for many real-time data analytics applications and data transformation pipelines.
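To make the thermostat example concrete, here is a minimal sketch of the windowing step in plain Python. It assumes readings arrive as `(timestamp_in_seconds, temperature)` pairs and uses fixed, non-overlapping 15-second windows; the function name and sample data are hypothetical. A real deployment would rely on a dataflow solution, which also handles late or out-of-order events that this sketch ignores.

```python
from collections import defaultdict
from statistics import pvariance
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 15


def variance_per_window(readings: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Group readings into fixed windows; return population variance per window.

    Keys of the result are window start times in seconds.
    """
    windows: Dict[int, List[float]] = defaultdict(list)
    for timestamp, temperature in readings:
        window_start = timestamp - timestamp % WINDOW_SECONDS
        windows[window_start].append(temperature)
    return {start: pvariance(values) for start, values in sorted(windows.items())}


# Hypothetical readings: three fall into window [0, 15), two into [15, 30).
readings = [(0, 20.0), (5, 22.0), (14, 24.0), (15, 21.0), (29, 21.0)]
states = variance_per_window(readings)
```

Each entry in `states` is the state for one window, which the service would then pass on to the control panel.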
Event sourcing and CQRS
Event sourcing is another term commonly seen with event-driven systems. The nomenclature might be a little confusing; it is actually a data persistence pattern rather than a design pattern for event-driven systems. You may think of it as an alternative to relational (SQL) databases and NoSQL databases. The design philosophy of this pattern might be better explained with an example:
Imagine that you are building an electronic voting system for the world-famous TV show So You Think You Can Code. A voting system is by nature write-intensive: the counts matter only at the end, but people submit their votes all the time. Consequently, if you use a relational (SQL) database as the backend, it can easily be overwhelmed, as each vote requires updating a table with a row locked and then released. With event sourcing, however, accepting a vote simply requires an insertion into the event log: since each vote (event) is immutable, no locking is required. When you need the final count, simply read through the logged sequence of events and add the votes up.
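A minimal sketch of this idea, using an in-memory list as the event log: the names `record_vote` and `tally`, the `voteCast` event type, and the contestants are all hypothetical, and a real system would use a durable, append-only store instead of a Python list.

```python
from collections import Counter
from typing import Dict, List

# Append-only event log; entries are never updated or deleted.
event_log: List[Dict[str, str]] = []


def record_vote(voter_id: str, contestant: str) -> None:
    """Accept a vote by appending an event; no existing data is modified."""
    event_log.append({"type": "voteCast", "voter": voter_id, "contestant": contestant})


def tally(log: List[Dict[str, str]]) -> Counter:
    """Read through the logged events and add the votes up."""
    return Counter(event["contestant"] for event in log if event["type"] == "voteCast")


record_vote("voter-1", "alice")
record_vote("voter-2", "bob")
record_vote("voter-3", "alice")

final_count = tally(event_log)
```

Writes only ever append, and the count is derived on demand by replaying the log.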
The nature of event sourcing makes it a natural candidate for data persistence in event-driven systems. However, event sourcing is not the only choice; many reactive event-driven systems, for example, still use relational (SQL)/NoSQL databases as storage.
When people talk about event sourcing, you will probably hear the term CQRS (Command Query Responsibility Segregation) as well. Loosely speaking, in event-sourcing systems CQRS helps create a materialized view over the event sequence so that you can query data as if you were using a regular DBMS, saving the trouble of scanning events and calculating numbers yourself every time you need a state. This design is not exclusive to event sourcing; at its core it simply states that you can use a different model to update information than the one you use to read information.
This tutorial series will not discuss event sourcing or CQRS in depth, as they are not an integral part of an event-driven system. If you are interested, refer to these blog posts authored by Martin Fowler (Event Sourcing, CQRS).
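Continuing the voting example, the sketch below separates the two models: a write side that only appends events, and a read side that keeps a materialized count up to date as events arrive. The class and method names (`VoteLog`, `VoteCountView`, `cast_vote`, `count_for`) are hypothetical, and the projection is updated synchronously here for simplicity; real systems often update it asynchronously.

```python
from collections import Counter
from typing import Callable, Dict, List

Event = Dict[str, str]


class VoteCountView:
    """Read model: a materialized view kept current as events arrive."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def apply(self, event: Event) -> None:
        if event["type"] == "voteCast":
            self._counts[event["contestant"]] += 1

    def count_for(self, contestant: str) -> int:
        # Answers queries without scanning the event log.
        return self._counts[contestant]


class VoteLog:
    """Write model: accepts commands and appends events to the log."""

    def __init__(self) -> None:
        self.events: List[Event] = []
        self._projections: List[Callable[[Event], None]] = []

    def register(self, projection: Callable[[Event], None]) -> None:
        self._projections.append(projection)

    def cast_vote(self, voter_id: str, contestant: str) -> None:
        event = {"type": "voteCast", "voter": voter_id, "contestant": contestant}
        self.events.append(event)
        for projection in self._projections:
            projection(event)


view = VoteCountView()
log = VoteLog()
log.register(view.apply)

log.cast_vote("voter-1", "alice")
log.cast_vote("voter-2", "bob")
log.cast_vote("voter-3", "alice")
```

Queries such as `view.count_for("alice")` read from the view, while commands go only through `VoteLog`; the two models can then be stored and scaled independently.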
Event-driven systems and serverless computing
Event-driven architecture is a natural ally of serverless computing platforms, especially the FaaS (Functions as a Service) ones. The architecture and the solution share many characteristics: both are designed with decoupled systems and scalability in mind. Many serverless computing platforms also adopt the pay-as-you-go pricing model, which fits perfectly with the publisher/subscriber paradigm. Some of them, such as Cloud Functions, even have built-in integration with message queue solutions (in this case, Cloud Pub/Sub).
This tutorial uses a few serverless computing solutions in the demo. In your production apps and services, however, take some caution before choosing serverless as the platform for running subscribers: technical restrictions (cold start time, runtime limits, latency, etc.) aside, building, testing, deploying, and managing serverless code can be a challenge in itself.
As a side note, many serverless computing solutions are stateless, which makes it fairly difficult to run group operations (or stream processing in general) on them. You can add a data persistence layer to solve the problem, but it can become fairly costly and difficult to build/maintain. As a rule of thumb, it is better and easier to use them in reactive event-driven systems rather than stream processing ones.
Should I go event-driven?
So far we have said a lot of nice words about event-driven systems. Sadly, as with many ideas and concepts in the field of computer science, every benefit event-driven architecture offers comes with a price. Event flows (streams) are notoriously difficult to track; without the execution path serving as the map, it may take great effort for developers to find a bug or a performance bottleneck in the endless flow of events. There are many tools and practices that can help alleviate the problem (which we will discuss later in this tutorial series), though none of them is the ultimate solution; it simply is a price we have to pay for separating publishers and subscribers.
Another potential pain point in event-driven systems is the message queuing/streaming solution. It is common for developers to assume that the middleman will perform in accordance with its promises, which, 99.99% to 99.99999% of the time, is true; however, hiccups can still happen. Message queues may unexpectedly stop working, send a large number of duplicate messages all of a sudden, or introduce unexplainable and unreproducible delays without a warning. Be prepared.
Even though there are some prototypes fully embracing the event-driven architecture, many teams use event-driven systems as a part of their services, exclusively for a specific workflow that works best with events. You can, for example, introduce an event-driven microservice in your service mesh dedicated to data analytics while keeping everything else HTTP RESTful/RPC-based.
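One common way to prepare for duplicate messages is to make subscribers idempotent, for example by remembering the IDs of events already handled. The sketch below is one possible approach rather than a prescription; the class name, the event shape, and the in-memory set of seen IDs are assumptions, and a production subscriber would need to persist those IDs somewhere durable.

```python
from typing import Any, Dict, List, Set


class IdempotentSubscriber:
    """Hypothetical subscriber that ignores events it has already handled."""

    def __init__(self) -> None:
        self._seen_ids: Set[str] = set()
        self.charged_orders: List[str] = []

    def handle(self, event: Dict[str, Any]) -> bool:
        """Process the event once; return False if it is a duplicate."""
        if event["id"] in self._seen_ids:
            return False
        self._seen_ids.add(event["id"])
        self.charged_orders.append(event["data"]["orderId"])  # stubbed side effect
        return True


subscriber = IdempotentSubscriber()
order_created = {"id": "evt-1", "type": "orderCreated", "data": {"orderId": "A1"}}

first = subscriber.handle(order_created)   # processed
second = subscriber.handle(order_created)  # duplicate delivery, skipped
```

With this guard in place, a burst of duplicates from the queue does not charge the same customer twice.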
In conclusion: think twice before proceeding. Event-driven architecture sounds fancy, but no one will blame you for using a monolithic system if it works just as well. The architecture itself can be a magical solution for some problems, but its limitations can be similarly overwhelming in specific scenarios. Adopt event-driven systems in a case-by-case manner.
This tutorial series includes the following pieces:
- Using CloudEvents and CloudEvents Generator
- Reactive Event-Driven Systems and Recommended Practices
- Introduction to Event-Driven Systems with Stream Processing
In this tutorial series, you may see the Open in Cloud Shell button before running demo projects. This button helps you try the code out without having to set up anything locally; it works on mobile devices as well. Cloud Shell is a part of Google Cloud Platform products and services; to use Cloud Shell, you must have a Google account with Google Cloud Platform access. You can sign up for Google Cloud Platform here.