Event-sourcing at Nordstrom: Part 2

Rob Gruhl
8 min readMay 28, 2019

In part one, I shared how at Nordstrom we’ve been exploring and implementing event-sourcing as an architectural pattern. In part two we’ll be sharing some of the common producer patterns we’ve seen.

You can think of event producers as the story-tellers of your business: They make decisions about or observe activity in the business, and notify the event stream about what just happened. Some event producers first consume other events, make decisions, and then publish those decisions: You could call these contributing consumers. An event-sourced system is only as good as the quality of the events produced and the reliability of the producers, so spending effort on stand-alone producers decoupled from the rest of the system is time well spent.

At Nordstrom, our events are Avro-serialized, which provides compression, documentation, standardization, and some forward and backward compatibility across schema revisions. Each event shares a common header, and complies with an eventing standard. These events are written to a shared, multi-tenant Kafka cluster, and the event schema is registered with a central schema registry. In our standard, events are notifications of things that occurred in the past — as opposed to commands to take action on in the future. So for example, “add to cart” is not event; instead, it’s “added to cart”.

Nine event producer architectural patterns

Within an existing system, there are lots of places we can find events to produce, and lots of mechanisms to choose from. Here are some of the common patterns we’ve seen and implemented at Nordstrom. Hopefully these examples inspire discussion and innovation in your teams!

1. Produce directly to the ledger at the moment of the event

In a greenfield project where you’re using event sourcing as a first class architectural pattern, this may be the right pattern. In your system, at the moment the event state is observed or decided, an event object is produced to the event stream. In some systems a single component — like a serverless function — might be nothing more than an event producer, with the results of that event being handled elsewhere by one or more separate event consumers. The benefit of this approach is that the producer team can focus on reliably producing events with strong guarantees, instead of implementing other logic. In this pattern, every effort should be made to ensure that events are delivered with an at-least-once guarantee to the event stream, even under a variety of conditions.

2. Transform an existing stream

You may have an existing stream that’s low quality — perhaps it contains a lot of implementation specific technical context, duplicate data, or command syntax. The source stream may also be a non-durable queue, or have poor fan-out or back-pressure issues, or any other number of issues. Using one of a variety of stream processing approaches, you can transform this original “legacy” stream into a high-quality ledger source.

3. Write-through a database and use change data capture

This write-through producer pattern can be a very practical approach for existing systems that want to handle transactional sets or have strong read-after-write requirements. In these systems, interactions with a database are detected by consuming the change data capture stream from the database, then transformed into events on the stream. Some cloud providers have database solutions especially designed for this pattern, like a NoSQL database with a stream you can subscribe serverless functions to. Having a real-time, low latency stream of changes may even be a deciding factor when you select a database. For example, as of this writing, Google Cloud Spanner has some truly incredible capabilities, but doesn’t support a good solution for streaming egress of ordered events.

This pattern can also be used as a filter or capacitor for noisy, partial, and/or poor quality events. The individual noisy events drive new row creation or updates into rows on the table, possibly with conditional writes to remove duplicates. Then the change stream of the table can be read, and if complete, transformed into high quality events and published onto the event stream.

4. Poll an existing request/response service for changes and publish

Many existing synchronous request/response services support getting recent changes. A simple way to turn this into an event stream is with a constantly-polling producer. The polling system may also need to do additional work to serialize and schematize the data. In our experience, it’s important to test the staleness as well as completeness of data coming off of these service APIs, especially when evaluating a third party service provider’s brand-new APIs. Some systems use batching or scaling systems behind the scenes that can introduce unacceptable staleness to the data. A simple synthetic test suite should be able to easily measure these issues under a variety of conditions.

5. Subscribe to a web socket or pub/sub

Many systems have pub/sub message buses or WebSockets that support subscribing to changes. As many pub/sub systems don’t provide a durability guarantee, care must be given to understanding how events will be recovered if the subscribe and transform component becomes temporarily unavailable. One solution here is to request that the source system re-drive the data through the pub/sub layer.

6. Streamify batches of data

Before the relatively recent trend of real-time data streams as the first source of truth, many systems relied on batches of data delivered in regular batches. In an event sourced system, these batches can be immediately turned into streams of events as soon as they’re received. However, care should be taken to try and preserve ordering of the data where possible, if the batches are not well-ordered.

One benefit of this pattern is that with a scalable consumer system, along with the natural decoupling offered by a stream transport like Kafka or Kinesis, it is often perfectly fine to have hourly or daily massive pulses of events. Additionally, this opens the opportunity for the producer system to evolve over time without changing the consumers, perhaps moving to 10 or 15-minute batches, or even to a relatively real-time stream as its next refactor goal.

7. Use event triggers in your serverless ecosystems

Serverless compute offerings rely on an ecosystem of event triggers throughout their cloud provider. Serverless functions are automatically invoked when any of a variety of events occur, like an object created in a bucket, or a message published to Pub Sub. This serverless event trigger ecosystem can be leveraged to cause events to be published to the stream every time something occurs in the world. In many cases, a serverless event producer function is invoked, receives the payload, transforms it, and publishes it to the event stream.

8. Watch for event-worthy actions in react/redux

Redux image from https://itnext.io/integrating-semantic-ui-modal-with-redux-4df36abb755c

When talking to different engineers from different backgrounds, the idea of an event-sourced architecture might be completely foreign or totally comfortable to them. Among those I’ve spoken with that are most comfortable with the idea are front-end engineers familiar with frameworks like React-Redux. In these front-end systems actions are published as they occur (button clicked, image swiped) and a variety of components on the page consume and respond to those actions in different ways.

Leveraging these type of front-end action/event ecosystems in the browser, we can create components on the page or client app responsible for listening for specific actions and creating and broadcasting well-formed events. Whether the event is serialized into its final format on the client or passed to the cloud in some generic format is an important decision.

9. Replicated table from an occasionally offline client

In cases where the event producer is only occasionally connected, being able to first reliably write events to a local store which is later synchronized to the cloud may be a great option. Examples of these cases include IoT devices, a mobile phone, or a client in an unreliable network environment, or a device that moves through environments with incomplete WiFi coverage.

Many different options exist for this pattern. As of this writing, we’re exploring several of them in proof-of-concept projects. This is an exciting pattern which may provide exceptional data fidelity guarantees and prove very robust in challenging environments.

Next up? Event Producer Tips & Tricks.

In the next part of this series, we’ll take a deeper look at some tips, tricks, and common challenges we’ve observed with event producers including how to respect and protect the privacy of our customers.

To learn more about what we’re doing with event-sourced architectures at Nordstrom, check out this article by A Cloud Guru on event-sourcing at Nordstrom; this one on our award-winning open sourced serverless event-source project “Hello, Retail!”; and Rob Gruhl’s talk at the Emit conference.

To learn more about event-sourced architecture in general, I’d recommend Martin Fowler’s talk, Martin Kleppmann’s short ebook and full book, Greg Young’s unfinished ebook, and Jay Krepp’s blogs.

If you’re currently implementing something using event-sourcing, or just have thoughts, questions, or idea, please leave a comment — We’d love to chat!