Adobe I/O Events enables developers to create applications that subscribe to event notifications from Adobe's products and services. Today, a developer can subscribe to notifications from services such as Adobe XD, Adobe Analytics Triggers, Adobe Campaign, and many more.
Our developers have built integrations that allow comment notifications on an XD prototype to appear right in Slack. Meanwhile, our enterprise customers are updating their CRMs with data from their latest email campaigns in near real-time. The possibilities are truly limitless, all thanks to the power of Adobe I/O Events.
Today, I/O Events offers client applications the ability not just to receive events via a webhook push, but also to pull events via the Journaling API. This pull-based API is built around the concept of a Journal, which is quite similar to a singly-linked list. A client application reads events by fetching a batch, obtaining a pointer to the next position, and then fetching the next batch, so it can only traverse the list of events in a single direction. Furthermore, newer events are only ever appended at the end of the list. Hence, the Journaling API can easily be thought of as a linked list, just a distributed one.
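To make the traversal concrete, here is a minimal sketch of such a pull-based consumer. The response shape (an `events` array and a `next` pointer in the body) and the `process` callback are assumptions for illustration, not the exact API contract:

```python
import time

import requests


def process(event: dict) -> None:
    """Hypothetical application callback."""
    print(event)


def consume_journal(start_url: str, token: str) -> None:
    """Walk the journal like a singly-linked list: fetch a batch of
    events, then follow the pointer to the next position."""
    url = start_url
    while url:
        resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
        resp.raise_for_status()
        page = resp.json()
        for event in page.get("events", []):
            process(event)
        if not page.get("events"):  # empty batch: we are at the tail
            time.sleep(5)           # back off before polling again
        url = page.get("next")      # the pointer to the next node
```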
What is a distributed linked list?
A distributed linked list, as I define it, is like any other linked list: each node in the list contains data and a pointer to the next node. However, the nodes are stored across multiple machines, both for durability and to allow reading at scale. Likewise, multiple compute instances write concurrently to a single list, both for resiliency and to allow writing at scale.
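One minimal way to picture it: the "next" pointer becomes a storage address rather than an in-memory reference. The types below are purely illustrative:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NodeRef:
    """A pointer that can cross machine boundaries: a storage
    location rather than an in-memory address."""
    store: str  # which store/replica holds the node
    key: str    # the object key or row ID within that store


@dataclass
class JournalNode:
    events: List[dict]       # the node's data: a batch of events
    next: Optional[NodeRef]  # None only at the current tail
```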
While the Journaling API’s functionality was seemingly simple, building Journals and operating them at a scale of several million events per hour with more than a few thousand event subscriptions proved to be a challenging task, as we learned over a year and a half. During this time, we encountered multiple scalability and cost issues which ultimately caused us to re-implement the system twice after the initial implementation.
Our very first implementation of a Journal was simplistic: we stored all our events in a document database (Azure Cosmos DB), ordered them by an auto-increment ID, and partitioned the data by the subscription ID. Writing an event was a simple database insert; reading events required a lookup followed by an ORDER BY query.
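In Cosmos DB SQL-API terms, the two operations looked roughly like the sketch below. The container and field names (`events`, `subscriptionId`, `seq`) are made up for illustration:

```python
# Writing an event: a single document insert into the
# partition for its subscription.
event_doc = {
    "id": "evt-001",
    "subscriptionId": "sub-42",  # partition key
    "seq": 1001,                 # logical auto-increment ordering ID
    "payload": {"type": "comment.created", "data": "..."},  # stored inline
}
# container.create_item(event_doc)

# Reading events from a position: a lookup plus an ORDER BY
# within the subscription's partition.
READ_QUERY = """
SELECT * FROM events e
WHERE e.subscriptionId = @sid AND e.seq > @pos
ORDER BY e.seq ASC
"""
# container.query_items(READ_QUERY, parameters=[
#     {"name": "@sid", "value": "sub-42"},
#     {"name": "@pos", "value": 1000},
# ])
```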
Luckily for us, we did not have any real usage of the system at that time, because once we started testing it under load, we discovered that not only was reading events extremely expensive, we also could not support more than a dozen clients reading events in parallel.
Before we could begin to resolve these issues, we were hit by an outage: a single event subscription published so many events in a day that it filled up the 10 GB physical partition limit on Azure Cosmos DB.
Fearing another outage, we made our second implementation a fast follow on the first one, but we had learned a few things from that initial implementation:
- Storing the event payload in the database was a bad idea. Not only did we have the physical partition size limit to worry about, but the size of a single event also ranged anywhere from 1 KB to 100 KB, and reading several hundred KB of data out of the database per client request was *very* expensive.
- Subscription ID was, again, not a great choice of partitioning key: it made hot partitions very easy to create, making an already expensive system even more inefficient to run.
- We started to RTFM. The partition size limitation had hit us out of nowhere. Beyond that, Cosmos DB's pricing is based on reserved capacity in terms of a strange unit, "Request Units/second" (RU/s). We had no way of telling how much we would need, much less of provisioning it beforehand (the back-of-envelope sketch after this list gives a sense of the scale).
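As an illustration of the read-cost math, assuming hypothetical but representative numbers and the common Cosmos DB rule of thumb that one RU reads roughly 1 KB:

```python
# Illustrative numbers only: why per-request read costs exploded.
avg_event_kb = 50      # events ranged from ~1 KB to ~100 KB
events_per_read = 100  # a single client read request
ru_per_kb_read = 1     # rough rule of thumb: ~1 RU reads ~1 KB

ru_per_request = avg_event_kb * events_per_read * ru_per_kb_read
print(ru_per_request)       # ~5,000 RU for one read request
print(ru_per_request * 12)  # ~60,000 RU/s if a dozen clients poll every second
```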
Our second implementation was very similar to the first one; only the event payloads were moved out to object storage (AWS S3). This time around we did read the documentation, and we knew that unless we batched our writes, the storage would cost us a lot of money, so we batched the event writes together.
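In sketch form, the write path looked roughly like this. The names and shapes are illustrative, and the key assumption, which mattered later, is that events were batched in arrival order regardless of subscription:

```python
import json
import uuid


class WriteBatcher:
    """Second version (flawed): batch events across all subscriptions
    in arrival order, so one S3 PUT covers many events. Each database
    row stores only a pointer (object key + offset) to its payload."""

    def __init__(self, s3_client, container, bucket: str, max_batch: int = 100):
        self.s3 = s3_client         # e.g. boto3.client("s3")
        self.container = container  # Cosmos DB container of pointer rows
        self.bucket = bucket
        self.max_batch = max_batch
        self.pending = []           # (subscription_id, event) tuples

    def write(self, subscription_id: str, event: dict) -> None:
        self.pending.append((subscription_id, event))
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        key = f"batches/{uuid.uuid4()}.json"
        payloads = [event for _, event in self.pending]
        # One PUT for the whole batch instead of one per event.
        self.s3.put_object(Bucket=self.bucket, Key=key,
                           Body=json.dumps(payloads))
        for offset, (sub_id, _) in enumerate(self.pending):
            # One pointer row per event: a later read of 100 events
            # can therefore touch up to 100 different objects.
            self.container.create_item({
                "id": str(uuid.uuid4()),
                "subscriptionId": sub_id,  # partition key
                "s3Key": key,
                "offset": offset,
            })
        self.pending = []
```

Batching this way made writes cheap, but it scattered any one subscription's events across many objects, which is exactly what hurt us next.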
However, the system was still very expensive to run. Even though we were batching the event writes, our read costs had simply shifted from the database to object storage. And although scalability had improved, the database continued to be a chokepoint for reading events.
We realized two things:
- While moving the event payloads to object storage was a step in the right direction, the way we were batching our event writes was not working out. To serve a single read request, we had to do a row lookup, then run an ORDER BY query, and then finally fetch up to 100 S3 objects. This was by no means cheap, and to fix it we needed to batch in a way that facilitated event reads (see the sketch after this list).
- We also realized that databases were not the right fit for our use case: we only inserted records, never updated or deleted them, and older records were garbage collected based on a TTL policy. Moreover, even after moving the event payloads out, our database continued to be a chokepoint while reading events. We needed to find a way to eliminate any database interaction from the critical path of reading events.
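For contrast, here is what a read path can look like once batches are aligned with reads: the events a client fetches together are stored together in one object. The layout (one object per subscription and batch number, numbered sequentially) is an illustrative assumption, not our final design:

```python
import json


def read_events(s3, bucket: str, subscription_id: str, batch_no: int):
    """Read path once batches are aligned with reads: serving a request
    is one GET with no database lookup. Assumes one object per
    (subscription, batch number) with sequential numbering, so the
    next pointer is derivable for free."""
    key = f"{subscription_id}/{batch_no}.json"
    obj = s3.get_object(Bucket=bucket, Key=key)
    events = json.loads(obj["Body"].read())
    return events, batch_no + 1  # the pointer to the next node
```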
On the bright side, not all of the decisions we had made were bad ones. Firstly, we had steered clear of ordering the events by timestamp and had instead used a logical auto-increment ID; with multiple concurrent writers, timestamps can collide or go backwards under clock skew, whereas a logical ID gives an unambiguous order. Secondly, we had bet our system on a managed database service, so in the worst case we were always able to throw money at our scalability problems to buy us some time.
The second part of this blog post will cover how we arrived at our current system based on all the learnings so far. We will also talk about how our system requirements informed the underlying technology we chose, and how, in turn, our technology choices shaped our system design.