Distributed event log with CouchDB
Multi-datacenter deployments, event sourcing and CouchDB
A quick disclaimer to start
I’m quite new to event sourcing and this post is in no way intended to be authoritative. It’s rather a snapshot of my current work and thoughts about this highly interesting topic. I may be plain wrong on certain aspects (hopefully not all of them), so please feel free to reach out and discuss!
The context
As part of my consulting activities, I have the great opportunity to work for a French start-up building a mobile, location-based social network. I’m in charge of the back-end built on top of Windows Azure, with a first version already up and running.
Now as the team is preparing the public launch of the app, I’m doing preparatory work on the next iteration. Among the requirements that will have to be taken into account is the ability to deploy the back-end across multiple data-centers to bring it closer to worldwide users.
You may guess that one of the biggest challenges in multi-DC deployments is data management. Keeping your data consistent and synchronised across regions is a tricky business. If you don’t really care about response time and/or availability, you can just plug each region into the same physical database, but that kind of defeats the purpose; you would have a single point of failure for all your endpoints and an obvious performance bottleneck. It makes much more sense to rely on replicated data stores and frankly, the challenge is way more interesting!
So what we want to achieve is some kind of replication that eventually synchronises our ever-changing data model across all regions. An interesting approach to replicate state changes is to dispatch the incremental changes (or deltas) that are applied to each data store, and have other stores apply the same changes to their model.
Event sourcing to the rescue
I will not get into the details of what event sourcing is and what it is good for here; in short:
- when updating your data model, you don’t overwrite state but only log the change you intend to apply, in the form of events like userCreated, userUpdated, etc. (a minimal event sketch follows this list)
- some process listens for new events arriving in that log and incrementally applies the changes to a data repository
- this data repository could be anything from a Redis cache to a Neo4j graph database, as long as it suits well the kind of queries that the system may issue in order to retrieve data
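To make this more concrete, here is a minimal sketch (in TypeScript) of what an event document could look like. The shape, the field names and the example values are my own assumptions, not a prescribed schema:

```typescript
// A minimal, hypothetical event document for the log. Field names are
// illustrative; the only real requirement is that the event captures the
// intended change, not the resulting state.
import { randomUUID } from "node:crypto";

interface EventDocument {
  _id: string;                      // document id, e.g. a UUID
  type: string;                     // the kind of change, e.g. "userCreated"
  occurredAt: string;               // ISO 8601 timestamp of the change
  payload: Record<string, unknown>; // the change itself
}

const event: EventDocument = {
  _id: randomUUID(),
  type: "userCreated",
  occurredAt: new Date().toISOString(),
  payload: { userId: "42", email: "jane@example.com" },
};
```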
Processing incoming log events should obviously happen in an asynchronous way; you don’t want to block the clients until their writes end up in the data repository. The “downside” of such a two-level system is that your data layer is eventually consistent, which can be totally acceptable in certain business contexts but impractical in others. More on that later.
Personally, I think that event sourcing brings an impressive level of flexibility and really transforms the way you consider data management in your systems. By storing your data updates in their most canonical form, you have the freedom to rebuild your data repository in any shape / schema / model that may become relevant down the road.
It’s also interesting to view the event log as your system’s spinal data source, from which you can feed a collection of repositories like a relational database, a search engine, a cache, etc. But that’s beyond the scope of this article.
Replicating event logs across regions
Once our data management relies on an event log, we can consider spreading our back-end system across different regions by replicating each region’s log to the other ones. The goal is to make sure that each region’s data updates are eventually written to every other region, so that all events eventually appear in every log.
Each region implements the event sourcing principle described previously, and replication is handled at the event log level. In such a setup, regions don’t even need to be aware of each other!
CouchDB as a replicated event log
Those of you who don’t know CouchDB should really take some time to try it out. It is far from being the most popular NoSQL engine out there, most probably because it handles secondary indices through a JavaScript map/reduce pipeline, which may seem off-putting to many developers… The truth is, this mechanism is incredibly powerful and brings a lot of flexibility. CouchDB is now 10 years old and arguably one of the most solid NoSQL products currently on the market.
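For those who have never seen it, here is a tiny sketch of how that indexing works: a view is a plain JavaScript map function stored in a design document, which you PUT to the database over HTTP. The database name, view name and document shape below are assumptions for illustration:

```typescript
// A design document holding one view that indexes user documents by email.
// The map function itself is plain JavaScript, stored as a string.
const designDoc = {
  _id: "_design/users",
  views: {
    byEmail: {
      map: `function (doc) {
        if (doc.type === "user" && doc.email) {
          emit(doc.email, null); // key: email; value not needed here
        }
      }`,
    },
  },
};

await fetch("http://localhost:5984/mydb/_design/users", {
  method: "PUT",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(designDoc),
});
```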
Some of CouchDB’s features make it particularly suitable to support our event log:
- It exposes a changes API that enables clients to subscribe to any change happening in the database. Even cooler, we can request to have these changes pushed to the client so we don’t have to poll the server (see the sketch right after this list).
- It has a powerful built-in replication engine that lets you replicate databases within the same server or across different servers. This is a super reliable feature that can run continuously and, particularly interesting for us, can handle master-master replication (meaning documents can be written to any of the databases taking part in the replication scheme, not just a single master).
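To give an idea of what consuming the changes feed looks like, here is a hedged sketch that subscribes to the continuous feed over plain HTTP (the server URL and the `eventlog` database name are assumptions). CouchDB streams one JSON object per line, so we just split the response stream on newlines:

```typescript
// Subscribe to CouchDB's _changes feed in continuous mode and log each
// incoming event document. Blank heartbeat lines are skipped.
const res = await fetch(
  "http://localhost:5984/eventlog/_changes?feed=continuous&include_docs=true&since=now"
);
const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  let newline: number;
  while ((newline = buffer.indexOf("\n")) >= 0) {
    const line = buffer.slice(0, newline).trim();
    buffer = buffer.slice(newline + 1);
    if (line) {
      const change = JSON.parse(line); // { seq, id, changes, doc }
      console.log("new event:", change.doc);
    }
  }
}
```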
You may think that master-master replication sounds nice but what about conflicts? Even though CouchDB does handle conflicting writes by signalling them to the clients, we won’t need it because we will only append new documents to our event log, never update or delete them.
So setting up your distributed back-end with CouchDB as the event log is pretty straightforward:
- Focus on how each region will run locally: route your data updates to CouchDB, have a worker process subscribe to CouchDB’s changes API and inject data updates into the data repository of your choice.
- Duplicate this setup to a second region and configure two-way replication between the CouchDB databases that host the event log (sketched below). Each region’s worker process will then get notified when new events get appended to the log, either locally or from a different region.
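For the replication part, a minimal sketch could be to write one replication document per direction into each server’s `_replicator` database, which makes the setup master-master. Hostnames and the `eventlog` database name are placeholders, and credentials are omitted:

```typescript
// Configure continuous one-way replication by creating a document in the
// _replicator database of the server that should run the replication.
async function replicate(server: string, source: string, target: string) {
  await fetch(`${server}/_replicator`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      source,
      target,
      continuous: true, // keep replicating as new events are appended
    }),
  });
}

const eu = "http://eu.example.com:5984/eventlog";
const us = "http://us.example.com:5984/eventlog";

// One replication document per direction gives us two-way (master-master)
// replication between the two regions.
await replicate("http://eu.example.com:5984", eu, us);
await replicate("http://us.example.com:5984", us, eu);
```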
But hey, that’s how distributed databases already work!
Indeed, you may be aware that databases that handle replication implement this in a way that is very similar to what I’ve been describing so far: they log data changes and it’s these changes, not data state, that are being replicated and applied across nodes. You may then wonder why we would bother reinventing the wheel here…
In some cases (maybe even in most of them), you would be fine relying on the native replication offered by products like MongoDB or Neo4j. But handling this at your application’s level offers the following advantages:
- Your system may rely on polyglot persistence and store relational data in a SQL database, documents in a search engine for full-text or geospatial queries, some entities in a cache for quick retrieval… If so, using your own central event log makes it way easier to update all those repositories upon data updates.
- Most NoSQL products offer master-slave replication, which may not be suitable in a multi-DC setup if you expect high write availability (because you would always need to route your writes to the master)
What about Kafka?
Apache Kafka is a log system whose purpose is precisely to act as a distributed event log platform. I’ve never tried it myself but I have read excellent feedback from people using it at scale.
So why CouchDB and not Kafka? Well, Kafka is written in Scala and Java, which is quite far from my technical background, and spinning up a Kafka setup (which relies on ZooKeeper for clustering) can seem daunting for the .NET guy that I am. Also, my recent good experience with CouchDB and my feeling that it would nicely fit the bill motivated me to take that path.
Gotchas
This is not a magic recipe, far from it. Although it fits the particular project I’m working on (that social network platform) very well, it does not suit all kinds of back-end systems that you want to spread across remote regions. Below is a list of things to be aware of before giving such an approach a try.
It’s (very) eventually consistent
Eventual consistency is something that we have to handle with systems that we expect to be both distributed and highly available, and this approach is no exception. Some time can and will pass between an event being appended to the log and being applied to your data repository.
So how do you deal with situations where pre-conditions have to be verified before validating writes? What if you have to enforce unique email addresses among your users, and a conflicting registration event is still waiting in the log, and therefore invisible to the data repository, when you’re processing a new registration?
That’s a very current topic in the field of distributed computing and there is obviously no single answer. The approach that I find most sensible is to “embrace eventual consistency”: let this kind of thing happen and implement recovery actions when it does. Blocking your system on preliminary checks to prevent issues that are unlikely to happen often is not the best way to increase availability at scale. If you detect that a user with the same email address already exists in your data repository when injecting a registration event, maybe you can just merge both accounts into one. Or maybe you can request some user action by automatically sending them an email. It really depends on your business rules here.
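As an illustration, here is a hedged sketch of such a recovery action when injecting a registration event. The in-memory map stands in for your real data repository, and the merge logic is deliberately left as a stub:

```typescript
// Instead of blocking writes on a uniqueness check, detect the duplicate
// when the event reaches the repository and run a compensating action.
type User = { id: string; email: string };
const usersById = new Map<string, User>();

function findUserByEmail(email: string): User | undefined {
  for (const user of usersById.values()) {
    if (user.email === email) return user;
  }
  return undefined;
}

function onUserRegistered(event: { userId: string; email: string }) {
  const existing = findUserByEmail(event.email);
  if (existing && existing.id !== event.userId) {
    // Two registrations with the same email slipped through. The right
    // recovery depends on your business rules: merge the accounts, or
    // automatically email the user and ask them what to do.
    console.log(`duplicate email, merging ${event.userId} into ${existing.id}`);
  } else {
    usersById.set(event.userId, { id: event.userId, email: event.email });
  }
}
```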
You can’t control the ordering of log events
Another (and maybe trickier) thing to consider is that the order in which events show up in each region’s log is not only nondeterministic, but will also vary between regions. Here again, this is a constraint that not all business contexts can cope with; if yours can, you have to make sure that the way you inject log events into your data repository handles this nondeterminism gracefully.
Surprisingly, handling this apparent mess is actually very doable in my case. Let’s take a practical example from that social network project, with a friend sharing a post with you. What would happen if you add a comment to that post and, at approximately the same moment and on the other side of the globe, your friend deletes his post? Because of the asynchronous log replication between regions, these two events (commentAdded and postDeleted) could show up in the log in any order. The trick is to swallow any inconsistency and make sure your data remains consistent after each repository update. So you would implement these two events as follows (a code sketch follows below):
- on commentAdded: add comment to post if post exists
- on postDeleted: delete post and all related comments
Whatever the order in which these two events appear, you are certain that your data will be eventually consistent after both have been processed: there will be no post and no related comment.
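Here is a minimal in-memory sketch of these two handlers, written so that applying the events in either order converges to the same state (the data structures are stand-ins for a real repository):

```typescript
// Order-independent handlers: whichever event arrives first, the final
// state after both have been processed is "no post, no comments".
const posts = new Map<string, { comments: string[] }>();

function onCommentAdded(postId: string, comment: string) {
  const post = posts.get(postId);
  if (post) post.comments.push(comment); // add only if the post still exists
  // if the post is already gone, silently drop the comment: no orphan data
}

function onPostDeleted(postId: string) {
  posts.delete(postId); // removes the post and all its comments at once
}

// Events arriving in the "wrong" order still converge:
posts.set("p1", { comments: [] });
onPostDeleted("p1");
onCommentAdded("p1", "nice post!"); // dropped, p1 no longer exists
console.log(posts.has("p1"));       // false, and no dangling comment
```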
Going down that road, you will realise that the impact of such an approach is that you will, most of the time, respond OK to your clients’ write requests. Trying to add a comment to a post that does not exist? OK. That post most likely existed recently, so it’s probably fine that your client thinks it has successfully written its comment. It will figure out that both the post and the comment have disappeared the next time it refreshes; the important thing is that its view is always consistent.
Edit (Feb 2017): if you’re interested in the (formal and theoretical) principles that can be applied to make eventually consistent streams of events converge the same way, check this video from Martin Kleppmann.
Following up
This article is very high-level and I will likely post follow-ups as I progress on the implementation of this system. Please feel free to reach out for any question, remark or suggestion.
Some pointers to related stuff I’m reading these days:
- Jay Kreps’ seminal post about logs
- Martin Kleppmann’s “Designing Data-Intensive Applications” book
- Confluent’s blog