Scalable CouchDB Replication and Change Listening with Spiegel

Geoff Cox
Jan 8, 2018 · 4 min read

Replication Challenges in CouchDB

Scalable Replication

The _replicator database in CouchDB is a powerful tool, but in many cases it does not scale well. Consider the example where we have users posting blog entries. Let’s assume that we want to use PouchDB to sync data between the client and CouchDB. Let’s also assume a design of a DB per user and an all_blog_posts database that stores the blog posts from all the users. Having a database per user allows us to restrict access to the user databases so that only the owner of a post can edit it. In this design, we’d want to replicate all our user databases to the all_blog_posts database.

At first glance, the obvious choice would be to use the _replicator database to perform these replications, but the big gotcha is that each continuous replication via the _replicator database requires a dedicated database connection. Therefore, if we had 10,000 users, we would need 10,000 concurrent database connections for these replications, even though at any given time there may be at most 100 users making changes to their posts. With Spiegel, we can prevent this greedy use of resources by only replicating databases when a change occurs.
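To make the cost concrete, here is a minimal sketch of the replication documents this design would put in the _replicator database. The server URL and the user_<n> naming scheme are assumptions for illustration; the source, target, and continuous fields are the standard ones CouchDB expects in a replication document.

```typescript
// Core fields of a document in CouchDB's _replicator database.
interface ReplicatorDoc {
  _id: string;
  source: string;
  target: string;
  continuous: boolean;
}

// Hypothetical server URL, for illustration only.
const COUCH_URL = "http://localhost:5984";

// Build the replication doc for one user database (e.g. "user_42").
function continuousReplicationDoc(userDb: string): ReplicatorDoc {
  return {
    _id: `${userDb}_to_all_blog_posts`,
    source: `${COUCH_URL}/${userDb}`,
    target: `${COUCH_URL}/all_blog_posts`,
    continuous: true, // keeps running even when the user is idle
  };
}

// One continuous replication per user database.
const replicatorDocs: ReplicatorDoc[] = [];
for (let i = 0; i < 10000; i++) {
  replicatorDocs.push(continuousReplicationDoc(`user_${i}`));
}

console.log(replicatorDocs.length); // 10000
```

Every one of these documents describes a replication that stays active, whether or not its user is currently writing anything.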

Real-Time Replication Between Clusters

While CouchDB 2 has built-in clustering, one limitation is that this clustering isn’t designed to be used across regions or data centers. Spiegel tracks changes in real time and only schedules replications for databases that have changed. You can therefore use Spiegel to efficiently keep clusters located in different regions of the world in sync.

Scalable Change Listening

Let’s assume that we have some routine that we want to run whenever there are changes, e.g. we want to calculate metrics using a series of views and then store these metrics in a database doc for quick retrieval later. We’d need to write a lot of boilerplate code to listen to _changes feeds for many databases, handle fault tolerance, and support true scalability. Instead, we can define a custom REST API endpoint that calculates these metrics and then a Spiegel on_change rule that will call this endpoint whenever there are applicable changes.
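The idea of a rule that maps changed databases to an endpoint can be sketched as follows. The OnChangeRule shape, its field names, and the example URL are hypothetical and chosen only for illustration. They are not Spiegel's actual schema, so check the Spiegel documentation for the real rule format.

```typescript
// Hypothetical rule shape for illustration; not Spiegel's actual schema.
interface OnChangeRule {
  dbNamePattern: RegExp; // which databases the rule applies to
  url: string; // REST endpoint to call when a matching database changes
}

const rules: OnChangeRule[] = [
  // Hypothetical endpoint that recalculates metrics for a user database.
  { dbNamePattern: /^user_/, url: "https://api.example.com/recalculate-metrics" },
];

// Return the endpoints that should be called for a change in dbName.
function endpointsFor(dbName: string, allRules: OnChangeRule[]): string[] {
  return allRules
    .filter((rule) => rule.dbNamePattern.test(dbName))
    .map((rule) => rule.url);
}

console.log(endpointsFor("user_42", rules).length); // 1
console.log(endpointsFor("all_blog_posts", rules).length); // 0
```

With rules expressed as data, the metrics code only has to exist behind the endpoint. The listening, matching, and retrying live in one shared place.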

How Spiegel Scales

Spiegel consists of three types of processes: the update-listener, the change-listener, and the replicator. The update-listener listens to the _global_changes database and then schedules on_change rules and replications accordingly. The change-listener runs on_change rules for all matching changes. The replicator performs replications.
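As a rough sketch of the update-listener's job, the snippet below reduces a batch of _global_changes document IDs to the distinct databases that need attention. It assumes the IDs have the form "updated:<db name>", which is how I understand CouchDB 2 records database updates, so verify this against your CouchDB version. The function names are hypothetical and this is not Spiegel's actual implementation.

```typescript
// Extract the database name from a _global_changes document ID such as
// "updated:user_42". Returns null for IDs that are not update events.
function updatedDbName(globalChangeId: string): string | null {
  const prefix = "updated:";
  return globalChangeId.indexOf(prefix) === 0
    ? globalChangeId.slice(prefix.length)
    : null;
}

// Collapse a batch of change IDs into the distinct databases to process,
// so that repeated updates to one database schedule it only once.
function dbsToProcess(globalChangeIds: string[]): string[] {
  const dirty: string[] = [];
  for (const id of globalChangeIds) {
    const dbName = updatedDbName(id);
    if (dbName !== null && dirty.indexOf(dbName) === -1) {
      dirty.push(dbName);
    }
  }
  return dirty;
}

const batch = ["updated:user_1", "updated:user_2", "updated:user_1", "created:user_3"];
console.log(dbsToProcess(batch).join(", ")); // user_1, user_2
```

Only databases that actually changed end up in the list, which is why 100 active users out of 10,000 lead to roughly 100 scheduled replications rather than 10,000 open connections.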

Diagram of Spiegel’s update-listener, change-listener, and replicator processes.
Geoff Cox presents “Scalable CouchDB Replication and Change Listening with Spiegel” at Offline Camp Oregon, November 2017

About the Author

Geoff Cox is the Co-Founder of Quizster, a photo-based submission and feedback system. Quizster uses a full stack of JS and runs CouchDB and PouchDB at the data layer.


Offline Camp

Building the Offline First community, one campfire at a time.

Thanks to Teri Chadbourne and Bradley Holt.

Geoff Cox

A coder with a passion for JS, GraphQL, CouchDB, React and Docker
