A Reactive, Event-Sourced Inventory Solution Using AWS Serverless

Achieve incredibly low latencies at any scale with properly architected AWS serverless offerings

The Agile Monkeys

Published in The Agile Monkeys’ Journey

Dec 28, 2018 (13 min read)

Inventory, Event Sourcing, and Serverless

First, a bit of background on the touchpoints of this article.

Serverless

Serverless is both an industry buzzword and a category of hosted solutions that you’re likely already using if you are hosting in a public cloud. Serverless architecture is a design pattern where applications are made up of third-party services that eliminate the need for infrastructure management. It encompasses both Platform as a Service offerings (sometimes called BaaS, or Backend as a Service) and Function as a Service products like Lambda. AWS is the furthest along of any of the cloud providers in serverless technologies and offers a full complement of solutions, including:

API Gateway (Proxy)
S3 (Storage)
Cognito (Authentication)
Lambda (Compute)
DynamoDB (Data Persistence)
SNS (Messaging)
Kinesis (Streaming)
AppSync (BaaS)
Step Functions (Workflows)

Again, it’s likely that you’re familiar with at least some of these technologies and it’s also likely that you never classified them as serverless… but that’s exactly what they are!

Inventory

One of the most important aspects of any e-commerce store is proper management of inventory. There isn’t a worse customer experience than allowing a user to add an item to their cart, check out, and pay, and then having to tell them that the item, in fact, isn’t really available.

Event Sourcing

Event sourcing is a way of storing data where all changes to the system are captured as a series of immutable events. Nothing is ever modified or deleted in our source of truth. Martin Fowler has an excellent blog post on this pattern (Martin Fowler has an excellent blog post on most things).
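To make the pattern concrete, here is a minimal Python sketch of an event-sourced inventory item (an illustration, not code from the original system). The event kinds `StockAdded`, `Reserved`, and `ReservationExpired` are hypothetical names. The point is that the available quantity is never stored and updated in place; it is derived by folding over an append-only list of immutable events.

```python
from dataclasses import dataclass
from functools import reduce

# Hypothetical event type: every change to an item's stock is one immutable record.
@dataclass(frozen=True)
class InventoryEvent:
    item_id: str
    kind: str       # e.g. "StockAdded", "Reserved", "ReservationExpired"
    quantity: int

# How each (hypothetical) event kind changes the available quantity.
DELTA_SIGN = {"StockAdded": 1, "Reserved": -1, "ReservationExpired": 1}

def apply_event(available: int, event: InventoryEvent) -> int:
    """Fold one event into the running available quantity."""
    return available + DELTA_SIGN[event.kind] * event.quantity

def current_available(events: list[InventoryEvent]) -> int:
    """Derive current state by replaying the whole event stream from zero."""
    return reduce(apply_event, events, 0)

stream = [
    InventoryEvent("item-123", "StockAdded", 10),
    InventoryEvent("item-123", "Reserved", 2),
    InventoryEvent("item-123", "ReservationExpired", 1),
]
print(current_available(stream))  # 9
```

Nothing in `stream` is ever modified; a correction is simply another event appended to the end.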

E-Commerce Inventory Management as a Service… such an easy solve

Luckily, this is a very easy problem to solve. You simply keep a table in your database called “inventory”. When a user tries to add an item to their cart, you make sure that the inventory for that item is non-zero. When they check out, you decrement the inventory by one. The next user who comes to the site triggers the same check, and if the previous user bought the last one, the item shows as sold out. Something like this:
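A minimal sketch of this naive approach in Python, with an in-memory dictionary standing in for the `inventory` table (the function names are illustrative, not from the original system):

```python
# Hypothetical stand-in for the "inventory" table: item id -> units on hand.
inventory = {"game-console": 1}

def can_add_to_cart(item_id: str) -> bool:
    """Naive check: allow the add only if stock for the item is non-zero."""
    return inventory.get(item_id, 0) > 0

def checkout(item_id: str) -> None:
    """Naive checkout: decrement the stock by one."""
    inventory[item_id] -= 1

# Shoppers arriving strictly one after another are handled correctly.
print(can_add_to_cart("game-console"))  # True
checkout("game-console")
print(can_add_to_cart("game-console"))  # False: the item now shows as sold out
```

It works only because each shopper finishes before the next one starts.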

Problem solved!

But wait.

The above works great when our shoppers are shopping serially. But that isn’t how shoppers shop :(. In a more realistic scenario we have the following:

So here both users add the item to their cart and attempt to check out. Then, depending on our implementation, we either have a race condition where the first checkout wins, or both users are allowed to check out and one of them subsequently has their order cancelled. Not good.
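The interleaving can be reproduced deterministically, without threads, by ordering the calls by hand. This sketch uses the same kind of hypothetical in-memory table as a naive implementation would and models the "first checkout wins" outcome: both shoppers pass the cart check, but only one can actually buy.

```python
# Hypothetical in-memory "inventory" table with a single unit left.
inventory = {"game-console": 1}

def can_add_to_cart(item_id: str) -> bool:
    return inventory.get(item_id, 0) > 0

def checkout(item_id: str) -> bool:
    """Naive checkout: re-check, then decrement. Returns False if nothing is left."""
    if inventory[item_id] <= 0:
        return False
    inventory[item_id] -= 1
    return True

# Interleaved shoppers: both cart checks happen before either checkout.
user1_added = can_add_to_cart("game-console")  # True
user2_added = can_add_to_cart("game-console")  # True: stock has not changed yet
user1_paid = checkout("game-console")          # True: the first checkout wins
user2_paid = checkout("game-console")          # False: user 2 finds out too late
print(user1_added, user2_added, user1_paid, user2_paid)  # True True True False
```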

Okay, but we can solve this! So instead of naively assuming we will have the inventory we originally retrieved indefinitely, we can “reserve” the inventory when it’s added to our cart. This involves some extra calls and extra processing and looks something like this:

Okay, so now User 1 reserves inventory when it’s added to their cart, and User 2 is properly told there’s none available. That’s great from a functional requirements perspective, but now we’re running into some technical considerations. If we rely on the database to make these operations transactional, then we need a strongly consistent datasource with the ability to lock rows in a table. This is fine for small loads, but if we have a hot item in our store that a bunch of users are trying to buy simultaneously, or a lot of traffic in general, then we’re going to start running into performance issues. Imagine it’s Cyber Monday and we have a great price on the latest video game system. We may get a hundred or even a thousand simultaneous attempts to reserve and purchase the same item. If we’re strongly consistent across our datasource, every one of those attempts has to wait its turn for a lock on the same row, and it’s going to be impossible to handle that kind of traffic (believe me… I’ve seen it!).

Not such a simple problem… so what can we do to solve it!?

One approach is to design an event-sourced solution to the problem. The idea here is that everything besides the reservation itself will be handled asynchronously as a series of messages that are either commands or events. Additionally, we won’t worry about calculating the final state of our inventory as we process these messages; we’ll just record them as they come in. We’ll go into all the details and considerations below, but here’s a diagram of our application:

Okay, there’s a lot here to unpack… so let’s go from top left to bottom right.

Modifying our inventory manually or when new shipments arrive

Our e-commerce store has a warehouse somewhere (either ours or someone else’s), and as stock shows up in this warehouse we need to adjust our inventory. We also may do an audit from time to time or have returns, etc., and based on that an admin may want to adjust the inventory manually through a UI. Here is the first instance where we hit on two themes that we’ll return to again. These two themes explain the design choices we’ve made for our system and are important to understand immediately:

1. The most important function in our store is selling things to as many customers as possible.

2. Life isn’t strongly consistent, so we don’t need to be either.

Let’s look at these two statements one at a time. The first tells us that since the most important thing we can do is successfully sell things to as many customers as possible, everything else is LESS important and should not get in the way of selling things to customers.

The second can best be illustrated with an example. Let’s say you’re at a coffee shop and you buy a coffee. This is a fancy coffee shop, the kind I like (you should like it too… good coffee is one of life’s pleasures). The way these places typically work is you place your order, they charge your credit card, and then they make your coffee. But let’s say that after you place your order and pay they suddenly run out of milk! Now, this hardly ever happens. I’m not sure it ever has to me. But it could.

And how could this happen? Because the coffee shop isn’t strongly consistent. They don’t go and make your coffee and hand it to you at the moment you pay. Why? That wouldn’t be efficient! So why are developers like us expected to make our e-commerce sites strongly consistent!? Are we better than the coffee shop? NO! So given that we care less about recording admin updates and warehouse inventory additions than we do about selling stuff and given that we don’t need to be strongly consistent, we’ve designed our flow for this operation as follows:

Admin/warehouse worker enters change into UI.
UI calls back-end and requests a change to be made.
Back-end persists command to change inventory to a durable queue with appropriate semantics (at least once in our case).
Back-end responds to UI and confirms request has been received.
UI tells user that request has been received.
Another process picks up the message off of our queue.
It processes the request and writes an event to the event stream for that particular item indicating a change to the inventory for that item.
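The flow above can be sketched in a few lines of Python. Here `collections.deque` stands in for the durable, at-least-once queue and a dict of lists stands in for the per-item event streams; the command and event shapes and the `command_id` de-duplication are assumptions for illustration. Because at-least-once delivery means a command can arrive twice, the consumer remembers which commands it has already applied.

```python
from collections import deque

command_queue = deque()      # stand-in for a durable at-least-once queue
event_streams = {}           # item id -> list of events (the event store)
processed_commands = set()   # ids of commands already applied (hypothetical)

def request_inventory_change(command_id: str, item_id: str, delta: int) -> str:
    """Back-end handler: persist the command and acknowledge immediately."""
    command_queue.append({"command_id": command_id, "item_id": item_id, "delta": delta})
    return "received"        # the UI shows this before the change is applied

def process_next_command() -> None:
    """Separate consumer: pick up one command and turn it into an event."""
    command = command_queue.popleft()
    if command["command_id"] in processed_commands:
        return               # redelivered duplicate; already applied
    event = {"type": "InventoryAdjusted", "delta": command["delta"]}
    event_streams.setdefault(command["item_id"], []).append(event)
    processed_commands.add(command["command_id"])

print(request_inventory_change("cmd-1", "item-123", 50))  # received
command_queue.append(dict(command_queue[0]))  # simulate a redelivery of cmd-1
process_next_command()
process_next_command()
print(event_streams["item-123"])  # [{'type': 'InventoryAdjusted', 'delta': 50}]
```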

So what are the implications here? First, these updates don’t slow down our customers at checkout. Second, the changes aren’t applied synchronously, which is exactly what makes the first point possible. Now is a good time to discuss SLAs and latency. If it took a day to process these inventory change commands and write inventory events, we’d probably run into issues: we’d likely sell items to our customers that we no longer had in inventory (or tell them that things were sold out when they weren’t). That wouldn’t be acceptable, and the system architect (yours truly) would be fired. So when we design a system like this, we need to understand the latencies our business can tolerate and establish SLAs based on them. Then we need to look at the latencies of the underlying technologies and those incurred by the design, and check whether everything lines up to meet these SLAs. In our case, the latencies should be on the order of tens of milliseconds, which shouldn’t have a negative impact on our business.

Reservations

Now we get into the heart of our system… our reservations. These are promises to our customers that we have an item to sell them and as such we need to ensure that we are very careful with them. Given that, this is the one area of the system where we synchronously process requests. The process is as follows:

A user attempts to add an item to their cart.
The UI calls the back end to perform the add operation.
The back end pulls the events for this item and calculates the current level of inventory.
If there is enough inventory to fulfill the requested “add to cart” it synchronously records an event that decrements the inventory as a reservation and returns success.
Otherwise, it returns failure.
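The synchronous path above might be sketched like this. The in-memory `events` dict and the event shapes are hypothetical stand-ins for the real event store. On its own this read-check-append is still racy if two requests interleave, which is the question the post turns to next.

```python
# Hypothetical in-memory event store: item id -> list of events.
events = {"item-123": [{"type": "StockAdded", "qty": 3}]}

# Effect of each (hypothetical) event type on available inventory.
SIGN = {"StockAdded": 1, "Reserved": -1}

def available(item_id: str) -> int:
    """Replay the item's events to compute the current inventory level."""
    return sum(SIGN[e["type"]] * e["qty"] for e in events.get(item_id, []))

def add_to_cart(item_id: str, user_id: str, qty: int = 1) -> bool:
    """Reserve synchronously: record a Reserved event only if enough stock is left."""
    if available(item_id) < qty:
        return False
    events.setdefault(item_id, []).append({"type": "Reserved", "qty": qty, "user": user_id})
    return True

print(add_to_cart("item-123", "user-1", 2))  # True
print(add_to_cart("item-123", "user-2", 2))  # False: only 1 left
print(available("item-123"))                 # 1
```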

So that’s it! BUT how do we ensure that two users don’t reserve the same inventory at the same time, as happened in our naive implementation? We use optimistic concurrency! The basic idea is that we don’t lock our database. Instead, when we attempt to write an event, we check that the last event we read is still the last event persisted. For example:

1. We read the events for item #123.
2. We get back 10 events, so the last event is #10.
3. We create event #11 and attempt to write it to our datasource.
4. If the last event is still #10, we succeed.
5. If not, we fail and try again.
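A minimal in-memory sketch of these steps. The hypothetical `EventStore.append` takes an `expected_version` (the number of events the caller read) and rejects the write if anything was appended since; the caller then re-reads and retries. The retry limit is an assumption. The `threading.Lock` only makes the store's own compare-and-append atomic, standing in for a conditional write in a real datastore; callers never hold it while they compute.

```python
import threading

class ConcurrencyError(Exception):
    pass

class EventStore:
    def __init__(self):
        self._streams = {}
        self._guard = threading.Lock()  # makes check-and-append atomic, like a conditional write

    def read(self, item_id):
        with self._guard:
            return list(self._streams.get(item_id, []))

    def append(self, item_id, event, expected_version):
        with self._guard:
            stream = self._streams.setdefault(item_id, [])
            if len(stream) != expected_version:
                raise ConcurrencyError(f"expected version {expected_version}, found {len(stream)}")
            stream.append(event)

def reserve(store, item_id, qty, max_attempts=5):
    for _ in range(max_attempts):
        events = store.read(item_id)                 # step 1: read the events
        version = len(events)                        # step 2: note the last event number
        if sum(e["delta"] for e in events) < qty:
            return False                             # not enough stock
        try:
            store.append(item_id, {"delta": -qty}, version)  # steps 3-4
            return True
        except ConcurrencyError:
            continue                                 # step 5: someone wrote first; retry
    raise RuntimeError("too much contention")

store = EventStore()
store.append("item-123", {"delta": 1}, expected_version=0)
stale_version = len(store.read("item-123"))          # two requests read version 1
store.append("item-123", {"delta": -1}, stale_version)  # the first writer succeeds
try:
    store.append("item-123", {"delta": -1}, stale_version)  # the second is rejected
except ConcurrencyError as err:
    print(err)  # expected version 1, found 2
print(reserve(store, "item-123", 1))  # False: nothing left after re-reading
```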

Depending on the datasource and the design of our persistence, there are many ways to implement this. But the important thing is to make sure we don’t suffer from too much contention, because every collision that lands us in step #5 above forces us to retrieve all of the data and try again. And I know what you’re saying… but this is exactly what would happen on Cyber Monday! We’re screwed. But the beauty here is that with the technologies, scaling, and optimizations we’ll describe later and the design we’ve illustrated above, we can perform this operation in 10–20ms. At 1000 orders per minute for the same item, requests arrive roughly every 60ms on average while each attempt takes only 10–20ms, so most attempts don’t overlap, and a collision costs just one more 10–20ms attempt. So even at that rate we can handle the concurrency with minimal contention and added latency.

Checkout

With our design, checkout is simple! On successful checkout we simply write a command to our queue to make the reservation for this user permanent. Once it’s processed our reservation becomes an inventory decrement… simple!
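A sketch of the checkout side under stated assumptions: a `deque` stands in for the queue, a list for the item's event stream, and the command and event names (`ConfirmReservation`, `ReservationConfirmed`) are hypothetical. The web request only enqueues; a consumer later appends the event that turns the hold into a permanent decrement.

```python
from collections import deque

queue = deque()  # stand-in for the durable command queue
event_stream = [
    {"type": "StockAdded", "qty": 5},
    {"type": "Reserved", "qty": 1, "reservation_id": "r-1"},
]

def on_checkout_success(reservation_id: str) -> None:
    """Checkout request handler: enqueue a command; don't touch the event stream."""
    queue.append({"type": "ConfirmReservation", "reservation_id": reservation_id})

def consume_one() -> None:
    """Consumer: record that the reservation is now permanent, so the units it
    holds are not returned to stock when the reservation timer fires."""
    command = queue.popleft()
    event_stream.append({"type": "ReservationConfirmed", "reservation_id": command["reservation_id"]})

def is_confirmed(reservation_id: str) -> bool:
    return any(
        e["type"] == "ReservationConfirmed" and e.get("reservation_id") == reservation_id
        for e in event_stream
    )

on_checkout_success("r-1")
consume_one()
print(is_confirmed("r-1"))  # True
```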

Abandoned Carts

Abandoned carts are also quite easy. Now is a good time to explain another part of our architecture: publishing changes to our inventory to other parts of the system. When a reservation is made, we persist it to our datastore. Our datastore, in turn, publishes a message to a message bus broadcasting the change. Other processes consume and react to these messages in various ways. When reservation events are broadcast, a process consumes them and creates a timer that holds each reservation for a fixed period of time. The exact mechanism is an implementation detail outside the scope of this post. Suffice it to say that when the timer goes off, the reservation is considered expired. At that point, if checkout hasn’t occurred, an event is written to the event stream negating this reservation and incrementing the available inventory accordingly.
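The expiry check could look like this sketch. As in the post, the timer itself is left out; `expire_if_unpaid` is a hypothetical handler that would run when a reservation's timer fires, and the event names are assumptions. It appends a compensating event rather than deleting the reservation, so the stream stays immutable.

```python
# Hypothetical event kinds and their effect on available stock.
SIGN = {"StockAdded": 1, "Reserved": -1, "ReservationExpired": 1, "ReservationConfirmed": 0}

def available(stream: list[dict]) -> int:
    return sum(SIGN[e["type"]] * e.get("qty", 0) for e in stream)

def expire_if_unpaid(stream: list[dict], reservation_id: str) -> bool:
    """Called when a reservation's timer fires. Returns True if stock was released."""
    reservation = next(
        e for e in stream
        if e["type"] == "Reserved" and e.get("reservation_id") == reservation_id
    )
    settled = any(
        e["type"] in ("ReservationConfirmed", "ReservationExpired")
        and e.get("reservation_id") == reservation_id
        for e in stream
    )
    if settled:
        return False  # already checked out (or already expired): nothing to undo
    # Append a compensating event instead of deleting the reservation.
    stream.append({"type": "ReservationExpired", "reservation_id": reservation_id, "qty": reservation["qty"]})
    return True

stream = [
    {"type": "StockAdded", "qty": 2},
    {"type": "Reserved", "reservation_id": "r-1", "qty": 1},
    {"type": "Reserved", "reservation_id": "r-2", "qty": 1},
    {"type": "ReservationConfirmed", "reservation_id": "r-2"},
]
print(available(stream))                # 0
print(expire_if_unpaid(stream, "r-1"))  # True: never checked out
print(expire_if_unpaid(stream, "r-2"))  # False: already checked out
print(available(stream))                # 1
```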

Search Results and Product Detail Pages

This is a pretty important one but is also incredibly simple. Similar to the abandoned cart process, there is a consumer of all inventory events that reacts to changes. This consumer projects the changes to inventory data to another datastore that is read-optimized. Again, it isn’t strongly consistent with the state of the world (or even our source of truth), but it is eventually consistent, and the latencies are low enough that the user experience should still be almost perfect on this front. When a user searches for items or looks at the details of a particular item, it is this source of data that is queried for inventory details. This removes the load from our source of truth and allows us to achieve the latencies we want for our reservations.
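A sketch of such a projection consumer. A dict stands in for the read-optimized store (which store to use is an assumption), and each change event is assumed to carry the item id, a per-item sequence number, and a signed quantity delta, a hypothetical shape. Tracking the last applied sequence number per item lets the consumer ignore redelivered messages.

```python
read_model = {}  # item id -> {"available": int, "seq": last applied sequence number}

def project(event: dict) -> None:
    """Apply one change event from the stream to the read-optimized view."""
    view = read_model.setdefault(event["item_id"], {"available": 0, "seq": 0})
    # Assumes per-item events arrive in order (partitioning by item provides this),
    # so anything at or below the last applied seq is a redelivery.
    if event["seq"] <= view["seq"]:
        return
    view["available"] += event["delta"]
    view["seq"] = event["seq"]

def product_page_stock(item_id: str) -> int:
    """What search results and product pages query; never touches the event store."""
    return read_model.get(item_id, {"available": 0})["available"]

for e in [
    {"item_id": "item-123", "seq": 1, "delta": 10},
    {"item_id": "item-123", "seq": 2, "delta": -1},
    {"item_id": "item-123", "seq": 2, "delta": -1},  # redelivered duplicate
]:
    project(e)
print(product_page_stock("item-123"))  # 9
```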

Scaling

We’ll get to the rest of the pieces of the diagram below, since they are optimizations. But first there is an important detail we need to focus on: how we scale our system as we add users and items to our store. The short answer is that we partition horizontally everywhere by a partition key, in this case a hashed item identifier. Using this, we can scale out:

Message bus topics and queues- Partitioning our topics and queues allows us to keep all of the messages for a given item in the same partition while still processing the topic or queue as a whole in parallel.
Message consumer pools- This builds on the above and allows us to consume messages in parallel. So if we split a message topic into 64 partitions, then we can have 64 single-threaded consumers processing messages in parallel without having to worry about contention, since all of the messages for a given item will be processed serially.
Message consumer thread pools- Using various strategies that amount to grouping by our item identifier, we can parallelize processing of messages across multiple threads within a given consumer and still avoid contention.
Datastores- By sharding our datastore on the partition key, we can keep adding datastore instances as load grows and still ensure that all of the data for a given item can be located and retrieved from a single instance. Again, the implementation is outside the scope of this article, but depending on the datastore this may require code in the application or may come out of the box.
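A sketch of partitioning by a hashed item identifier. Python's built-in `hash()` is randomized per process for strings, so the sketch uses MD5 for a stable result. Kinesis also hashes partition keys with MD5, but it maps them onto per-shard hash-key ranges rather than taking a modulo, so treat `partition_for` as a simplification. The partition count of 64 is the example from the list above, and `group_by_item` illustrates the thread-pool strategy: groups for different items can run in parallel while each item's messages keep their order.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 64  # example number from the text

def partition_for(item_id: str, partitions: int = NUM_PARTITIONS) -> int:
    """Map an item id to a partition with a stable hash (modulo is a simplification)."""
    digest = hashlib.md5(item_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % partitions

def group_by_item(batch: list[dict]) -> dict[str, list[dict]]:
    """Split a batch into per-item groups; each group can go to its own worker
    thread, and messages within a group stay in their original order."""
    groups = defaultdict(list)
    for message in batch:
        groups[message["item_id"]].append(message)
    return dict(groups)

batch = [
    {"item_id": "item-1", "seq": 1},
    {"item_id": "item-2", "seq": 1},
    {"item_id": "item-1", "seq": 2},
]
print(partition_for("item-1") == partition_for("item-1"))  # True: same item, same partition
print([m["seq"] for m in group_by_item(batch)["item-1"]])   # [1, 2]
```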

Optimizations

Snapshotting

Rather than reading all of the events and folding them into the current state of an item every time we want to make a reservation, we can periodically snapshot the events. In the diagram above, you can see that this is done by another consumer listening to change events emitted from our datastore. When a new event is emitted, the previous snapshot is retrieved, this event is folded on top, and the new snapshot is written back to the datastore. Doing this asynchronously, rather than as part of writing each event, allows us to write those events more quickly. And having these snapshots allows us to make reservations more quickly, since we need only find the last snapshot and apply any subsequent events on top of it rather than starting from the beginning of the event stream every time. Typically, when making reservations we’ll assume that the latest snapshot is, in fact, the current source of truth and let optimistic concurrency take care of the few cases where this isn’t so. Again, this is all in the interest of making the vast majority of our reservations as quickly as possible.
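A sketch of snapshotting under simplifying assumptions: each event is a hypothetical `{"delta": n}` dict, the stream is an in-memory list indexed by event number, and a `Snapshot` records the folded value plus how many events it covers. The version that `current_state` returns is what a reservation would then use as its expected last event for the optimistic-concurrency check.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Snapshot:
    available: int
    version: int  # number of events already folded into this snapshot

def fold(available: int, events: list[dict]) -> int:
    for event in events:
        available += event["delta"]
    return available

def take_snapshot(previous: Snapshot, new_events: list[dict]) -> Snapshot:
    """Async consumer: fold newly emitted events on top of the previous snapshot."""
    return Snapshot(fold(previous.available, new_events), previous.version + len(new_events))

def current_state(snapshot: Snapshot, stream: list[dict]) -> tuple[int, int]:
    """Reservation path: start from the snapshot and replay only the events after it."""
    tail = stream[snapshot.version:]
    return fold(snapshot.available, tail), snapshot.version + len(tail)

stream = [{"delta": 100}, {"delta": -1}, {"delta": -2}]
snap = take_snapshot(Snapshot(0, 0), stream[:2])  # snapshot covers the first two events
stream.append({"delta": -1})                      # written after the snapshot was taken
print(current_state(snap, stream))                # (96, 4)
```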

Cold Storage

Storing all of our events for all time in a highly performant datastore will get very expensive very quickly (and slow down our retrievals and, consequently, our all-important reservations). So the solution here is to TTL them in our primary datastore and write them to cheaper, slower cold storage for permanent retention. Again, we consume event changes from the primary datastore and write the data to our cold storage asynchronously.
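A sketch of both halves, with assumptions flagged. DynamoDB's TTL feature deletes items whose designated TTL attribute holds a Unix epoch time in seconds that has passed; the attribute name `expires_at` and the 30-day retention are hypothetical. The archiver's change records are simplified to an `eventName` field (DynamoDB Streams uses `INSERT`, `MODIFY`, and `REMOVE`) plus a plain `event` dict in place of the real `dynamodb.NewImage` attribute-value format, and a list stands in for Glacier.

```python
import time
from typing import Optional

RETENTION_SECONDS = 30 * 24 * 3600  # hypothetical 30-day retention in the hot table

def with_ttl(event: dict, now: Optional[float] = None) -> dict:
    """Return a copy of the event stamped with an expiry time in Unix epoch seconds."""
    now = time.time() if now is None else now
    return {**event, "expires_at": int(now) + RETENTION_SECONDS}

cold_storage: list[dict] = []  # stand-in for Glacier

def archive_consumer(change_records: list[dict]) -> None:
    """Copy each newly written event to cold storage as the change stream emits it."""
    for record in change_records:
        # Archive on INSERT; the REMOVE emitted when TTL later deletes the
        # item needs no action because the event is already archived.
        if record["eventName"] == "INSERT":
            cold_storage.append(record["event"])

event = with_ttl({"item_id": "item-123", "delta": -1}, now=1_000_000)
print(event["expires_at"])  # 3592000
archive_consumer([{"eventName": "INSERT", "event": event}, {"eventName": "REMOVE", "event": event}])
print(len(cold_storage))  # 1
```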

High Inventory Items

This is a very obvious optimization that requires very little technical work. But it’s still worth mentioning here. The idea is that if we have some large number (like 100) of an item in inventory then we don’t need to bother checking if any have been reserved in the last few seconds. That’s because it’s highly unlikely that they’ve all been reserved. There are more sophisticated, probabilistic algorithms we could use here taking into account previous purchasing history, etc. But you get the basic idea.
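The high-inventory shortcut could be sketched as below. The threshold of 100 is the example number from the text, and the two callbacks (`fetch_recent_events`, `record_reservation`) are hypothetical hooks into the event store. Only the read of recent events is skipped; the reservation itself is still recorded.

```python
from typing import Callable

HIGH_STOCK_THRESHOLD = 100  # example number from the text

def reserve_with_fast_path(
    item_id: str,
    qty: int,
    snapshot_available: int,
    fetch_recent_events: Callable[[str], list],
    record_reservation: Callable[[str, int], None],
    threshold: int = HIGH_STOCK_THRESHOLD,
) -> bool:
    if snapshot_available >= threshold:
        # Plenty of stock: skip reading recent events and trust the snapshot.
        record_reservation(item_id, qty)
        return True
    # Stock is scarce: fold in recent events before deciding.
    available = snapshot_available + sum(e["delta"] for e in fetch_recent_events(item_id))
    if available < qty:
        return False
    record_reservation(item_id, qty)
    return True

fetch_calls = []
def fetch(item_id):
    fetch_calls.append(item_id)
    return [{"delta": -3}]

reservations = []
print(reserve_with_fast_path("console", 1, 500, fetch, lambda i, q: reservations.append((i, q))))  # True
print(fetch_calls)  # []
print(reserve_with_fast_path("console", 1, 3, fetch, lambda i, q: reservations.append((i, q))))  # False
print(fetch_calls)  # ['console']
```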

Technologies

Here’s where we finally mention our current favorite serverless technologies in AWS. These technologies dovetail very nicely with the components in our system and greatly reduce the development time and infrastructure requirements of our system. They include:

AppSync- An incredibly simple-to-set-up PaaS GraphQL API that will handle our reservation requests and read access to our inventory.
DynamoDB- A fully hosted, dynamically scalable, replicated, and partitioned data store.
Bonus- DynamoDB’s conditional writes give us an easy way to do optimistic concurrency.
DynamoDB Streams- How we’ll publish our change events when persisting to DynamoDB.
Kinesis- Our partitionable, fully hosted messaging solution.
Lambda- Our FaaS message consumer solution. We just write the consumption code. AWS gives us hosting, connectivity to Kinesis, dynamic scaling, logging, alerting, etc.
Glacier- Our cold storage.
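One way a DynamoDB conditional write can implement the optimistic-concurrency check: store each event as its own item keyed by item id (partition key) and event sequence number (sort key), and write with `ConditionExpression="attribute_not_exists(seq)"` so the put fails if another writer already claimed that sequence number. The table layout and attribute names are assumptions. `table` is expected to be a boto3 `Table` resource (from `boto3.resource("dynamodb").Table(...)`), and the sketch reads the `ConditionalCheckFailedException` code from the exception's `response` so it doesn't need to import botocore.

```python
def append_event(table, item_id: str, expected_seq: int, event: dict) -> bool:
    """Write event number expected_seq + 1; return False if another writer got there first."""
    try:
        table.put_item(
            Item={**event, "item_id": item_id, "seq": expected_seq + 1},
            # Evaluated against any existing item with the same (item_id, seq) key:
            # if one exists, the write is rejected instead of overwriting it.
            ConditionExpression="attribute_not_exists(seq)",
        )
        return True
    except Exception as err:  # boto3 raises botocore.exceptions.ClientError here
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False  # lost the race: re-read the item's events and retry
        raise
```

A reservation would read the latest sequence number for the item, compute the new state, call `append_event`, and on `False` re-read and try again, exactly as in the optimistic-concurrency steps described earlier.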

Security, Monitoring/Visualization, and Toolkits- Third-Party Solutions to fill in the gaps

AWS continues to expand its offerings in the serverless space, but there are still gaps we prefer to fill with third-party solutions. A few of the best at the moment:

Serverless Framework- Open-source CLI for building and deploying serverless applications.
Epsagon- Distributed tracing that helps you monitor and troubleshoot your serverless application.
Stackery- Drag and drop serverless toolkit.
Puresec- End-to-end security for serverless applications.

Ta Daaaa!

And there you have it. Inventory as a service that’s dynamic, reactive, scalable, and hosted. What’s not to love!?