Building Smarter Content Workflows With Event-Driven Architecture
The Content Supply Chain: A Logistics Business
At its core, the supply and distribution of content at ITV is a logistics business. Spanning a number of systems, teams and workflows, the Content Supply Chain is responsible for ingesting up to five terabytes of content into ITV every day and distributing it to consumers. As with any logistics business, technology plays a key role: automation helps us to receive, process and deliver content efficiently, while data and analytics present opportunities to optimise workflows and extract more value from the supply chain itself.
Small, composable components are well-suited to this domain, and insights can be derived by taking an event-driven approach to their interaction. For example, by comparing the timestamps of different events in a particular workflow, we can measure its performance and identify any bottlenecks that it might contain. Similarly, if an expected event does not arrive, alerts can be raised so that the underlying issues get resolved before causing problems.
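As a minimal sketch of both ideas, the Python below measures the time between two events in a workflow and flags a workflow whose expected event has not arrived within a deadline. The `Event` type, the event names and the 30-minute deadline are assumptions made for illustration, not ITV's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: what happened, in which workflow, and when."""
    name: str
    workflow_id: str
    timestamp: datetime


def find_event(events: List[Event], name: str) -> Optional[Event]:
    """Return the earliest event with the given name, or None if absent."""
    matches = [e for e in events if e.name == name]
    return min(matches, key=lambda e: e.timestamp) if matches else None


def step_duration(events: List[Event], started: str, finished: str) -> Optional[timedelta]:
    """Time between a step's start event and its completion event."""
    start, end = find_event(events, started), find_event(events, finished)
    if start is None or end is None:
        return None
    return end.timestamp - start.timestamp


def is_overdue(events: List[Event], started: str, expected: str,
               deadline: timedelta, now: datetime) -> bool:
    """True if the step started but the expected event is late."""
    start = find_event(events, started)
    if start is None or find_event(events, expected) is not None:
        return False
    return now - start.timestamp > deadline


nine_am = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
completed = [
    Event("DATA VERIFICATION STARTED", "wf-1", nine_am),
    Event("DATA VERIFIED", "wf-1", nine_am + timedelta(minutes=12)),
]
stalled = [Event("DATA VERIFICATION STARTED", "wf-2", nine_am)]

print(step_duration(completed, "DATA VERIFICATION STARTED", "DATA VERIFIED"))
print(is_overdue(stalled, "DATA VERIFICATION STARTED", "DATA VERIFIED",
                 deadline=timedelta(minutes=30), now=nine_am + timedelta(hours=1)))
```

In practice the same comparison could run as a scheduled query over stored events rather than over an in-memory list.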
Adopting Event-Driven Architecture
An example workflow from the content supply chain is presented in figure 1, annotated with some of the events that occur as it executes. These follow naturally from the steps in the workflow: there is a single event to indicate that a step has begun, and separate events for the possible outcomes.
Using events from the business domain to influence a system’s architecture yields a number of benefits. By delimiting the actions of different components, events can help enforce the Single Responsibility Principle and ensure that domain boundaries are reflected appropriately within the system.
Reusability, too, is an important property of supply chain software: workflows can be modelled as different wirings of components, and being able to share steps between them is useful. Decoupling the steps in a workflow helps achieve this; in architectural terms, we design components that react to events rather than exchange commands or rely on a central brain to orchestrate control flow.
In refactoring parts of our system to this style, we’ve found that the risk of breaking existing functionality can be reduced by working in two stages: first extending components so that they publish events alongside their existing behaviour, then introducing consumers to replace the direct interactions later. This approach also makes data for reporting and analytics available sooner.
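To make the contrast with commands concrete, here is a small sketch in Python. The in-memory `EventBus` is a toy stand-in for a message broker, and the component and event names are hypothetical. Neither component calls the other; each only reacts to an event and, where relevant, publishes its own.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Toy in-memory stand-in for a message broker (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name].append(handler)

    def publish(self, event_name: str, payload: dict) -> None:
        # The publisher does not know who, if anyone, is listening.
        for handler in list(self._handlers[event_name]):
            handler(payload)


bus = EventBus()
log: List[str] = []


def verify(payload: dict) -> None:
    """Hypothetical verification step: reacts to content arriving."""
    log.append(f"verifying {payload['asset_id']}")
    bus.publish("DATA VERIFIED", payload)


def transcode(payload: dict) -> None:
    """Hypothetical transcoding step: reacts to verified content."""
    log.append(f"transcoding {payload['asset_id']}")


# A workflow is just a particular wiring of components to events.
bus.subscribe("CONTENT RECEIVED", verify)
bus.subscribe("DATA VERIFIED", transcode)

bus.publish("CONTENT RECEIVED", {"asset_id": "asset-1"})
print(log)
```

A different workflow could reuse `transcode` simply by subscribing it to a different event, without changing the component itself.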
Identifying and Modelling Events
Identifying domain events may seem like a trivial exercise from the previous example, but given their key role, it is a process worth paying attention to. We’ve recently begun using Event Storming workshops to identify important events across our workflows. Getting input from Domain Experts is essential here and helps establish a common language between product teams and their stakeholders. Since so much waste in Software Engineering arises from ambiguity in some form or another, the value in designing systems from a verified understanding cannot be overstated.
Two further principles for modelling events are worth mentioning at this point:
1) Name Events Carefully
Events in software reflect something that has happened, so they should always be named in the past tense. This may sound pernickety, but consider these possible event names:
- DATA VERIFIED
- DATA VERIFICATION
From the first name, we can tell that the verification step has completed, and could use the timestamp of this event to infer how long that step took. With the second, it is not clear what state the verification is in, and the system becomes more difficult to reason about as a result. It is also worth considering how consumers will decide what to do when they receive a particular event. Descriptive names can save consumers having to pull extra information out of a message payload or make API calls to determine what has happened.
2) Simplify Event Consumers With Generic Metadata
In an event-driven system, the publisher of an event shouldn’t need to be aware of the components reacting to it. If events are sent via messaging, the structure of those messages should be predictable so that components can handle new events as they are introduced. This also makes it easier to query the events from a data store, using them to build up a picture of the system’s behaviour. Of course, some details may be specific to particular event types, but if generic fields (event name, timestamp, etc.) are used consistently, consumers and queries can rely on them whatever the event type. Depending on the underlying technology, encoding this information in message headers might help reduce the impact on existing consumers.
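One way to picture this is an envelope that wraps every event in the same generic fields. The sketch below is an assumption about what such an envelope might look like: only the event name and timestamp come from the text above, while `event_id`, `source` and the `payload` key are hypothetical additions.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def make_event(name: str, source: str, payload: dict,
               occurred_at: Optional[datetime] = None) -> dict:
    """Wrap event-specific details in a generic, predictable envelope."""
    occurred_at = occurred_at or datetime.now(timezone.utc)
    return {
        "event_id": str(uuid.uuid4()),         # hypothetical field
        "event_name": name,
        "timestamp": occurred_at.isoformat(),
        "source": source,                      # hypothetical field
        "payload": payload,                    # event-specific details live here
    }


def describe(message: str) -> str:
    """A generic consumer: reads only the common fields, never the payload."""
    event = json.loads(message)
    return f"{event['timestamp']} {event['source']} {event['event_name']}"


message = json.dumps(make_event(
    "DATA VERIFIED",
    source="verifier",
    payload={"asset_id": "asset-1", "checksum_ok": True},
    occurred_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
))
print(describe(message))
```

Because `describe` never looks inside `payload`, it keeps working when a new event type with a different payload is introduced.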
An Event Collector Service
In this example, Service B reacts to events published by Service A. Routing logic in the message broker ensures that these events are also forwarded to the Event Collector service. Using a common format for event metadata means that the Event Collector can serialise these messages and upload them to different data stores for further analysis.
As well as supporting AMQP (via RabbitMQ), the Event Collector also has an HTTPS endpoint so that third-party systems can publish events via a webhook. Likewise, a Lambda function forwards events from an SQS queue, providing an interface for serverless workflows.
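The collector's core loop can be sketched in a few lines. This is not the actual Event Collector: the required field names, the `EventCollector` class and the list-backed sinks are assumptions chosen to show how a common metadata format lets one component serialise any event and fan it out to several stores.

```python
import json
from typing import Callable, Iterable, List

# Assumed common metadata; the real required fields are not given in this post.
REQUIRED_FIELDS = ("event_name", "timestamp")

Sink = Callable[[str], None]


class EventCollector:
    """Serialises events with valid metadata and hands them to every sink."""

    def __init__(self, sinks: Iterable[Sink]) -> None:
        self._sinks = list(sinks)
        self.rejected: List[dict] = []

    def collect(self, message: dict) -> bool:
        missing = [field for field in REQUIRED_FIELDS if field not in message]
        if missing:
            # Keep malformed messages aside rather than silently dropping them.
            self.rejected.append(message)
            return False
        line = json.dumps(message, sort_keys=True)  # one JSON document per event
        for sink in self._sinks:
            sink(line)
        return True


# Two in-memory lists stand in for real data stores.
warehouse: List[str] = []
search_index: List[str] = []
collector = EventCollector([warehouse.append, search_index.append])

collector.collect({"event_name": "DATA VERIFIED",
                   "timestamp": "2024-01-01T12:00:00+00:00",
                   "payload": {"asset_id": "asset-1"}})
collector.collect({"payload": {"asset_id": "asset-2"}})  # no metadata: rejected

print(len(warehouse), len(search_index), len(collector.rejected))
```

Each sink is just a callable that accepts a serialised event, so adding another data store means adding another callable rather than changing the collector.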
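A forwarding function of this kind might look like the sketch below. The `Records` and `body` keys are the standard shape of the event that AWS Lambda receives from an SQS trigger; everything else is an assumption, including the `COLLECTOR_URL` variable, the placeholder address and the choice to forward over HTTPS rather than AMQP.

```python
import json
import os
import urllib.request
from typing import Callable

# Hypothetical endpoint; the real collector address is not given in this post.
COLLECTOR_URL = os.environ.get("COLLECTOR_URL", "https://collector.example/events")


def post_to_collector(event: dict) -> None:
    """Send one event to the collector's HTTPS endpoint as JSON."""
    request = urllib.request.Request(
        COLLECTOR_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        response.read()


def forward_records(sqs_event: dict,
                    send: Callable[[dict], None] = post_to_collector) -> int:
    """Forward every SQS record body (assumed to be a JSON event) via `send`."""
    forwarded = 0
    for record in sqs_event.get("Records", []):
        send(json.loads(record["body"]))
        forwarded += 1
    return forwarded


def handler(event: dict, context: object) -> dict:
    """Lambda entry point for an SQS trigger."""
    return {"forwarded": forward_records(event)}
```

Passing `send` as a parameter keeps the parsing logic testable without a network call: a test can supply `sent.append` in place of `post_to_collector`.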
Supporting A Smarter Business
The log of events generated every time a workflow executes provides users with a single source of truth for the supply chain’s behaviour. We’ve found that transforming this data with SQL views in BigQuery is an effective way to expose it to different stakeholders, and the ability to join it to other sources is a useful feature of this approach. For example, metrics from the event log can be enriched with catalogue data to provide insight into how long content for a particular production takes to process.
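In BigQuery this enrichment would be written as a SQL join; the Python below expresses the same logic only to illustrate the idea. The field names, the asset-to-production mapping and the figures are invented sample data.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Invented sample data: processing times derived from the event log...
processing_minutes = [
    {"asset_id": "a1", "minutes": 30.0},
    {"asset_id": "a2", "minutes": 50.0},
    {"asset_id": "a3", "minutes": 20.0},
    {"asset_id": "a4", "minutes": 99.0},  # not present in the catalogue
]
# ...and a catalogue mapping each asset to its production.
catalogue = {"a1": "Production X", "a2": "Production X", "a3": "Production Y"}


def average_minutes_by_production(metrics: List[dict],
                                  catalogue: Dict[str, str]) -> Dict[str, float]:
    """Join metrics to the catalogue on asset_id, then average per production."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for row in metrics:
        production = catalogue.get(row["asset_id"])
        if production is not None:  # inner join: skip assets with no catalogue entry
            grouped[production].append(row["minutes"])
    return {production: mean(values) for production, values in grouped.items()}


print(average_minutes_by_production(processing_minutes, catalogue))
```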
Adopting an event-based approach for our content ingest workflows proved its worth recently during a critical project to bring a number of processing steps in-house. Splunk dashboards were used to rapidly prototype a user interface for tracking content coming into ITV. These enabled our operations teams to diagnose issues with missing content quickly and make better predictions about when content would be available.
Product Owners have also found this data valuable. Measuring the impact of new features and prioritising future work both become easier when the relevant metrics are readily available.
Perhaps the greatest value in having collected this data, however, is that it is now accessible and intelligible to other parts of the business. Breaking down silos of knowledge is a challenge for any organisation, and if taking an event-driven approach can help achieve this, then it’s surely worth considering.
Dashboards and alerts are useful tools, but the goal is to improve the supply and distribution of our content. While equipping people with metrics has proved valuable, people are not the only consumers that can benefit from this information.
Feeding analysis back into the system itself presents an opportunity for components to adapt automatically to changes across the supply chain: heuristic algorithms might infer whether a piece of content is likely to arrive on time, and different parts of the system could be configured to scale up in response to this information. The work described in this post only facilitates a more scientific approach to the business domain; connecting the insights, taking informed action and then iterating on this is where the value really emerges.