Getting Started With Spiegel: Scalable Replication and Change Listening for CouchDB

Geoff Cox
Jan 15, 2018

In Scalable CouchDB Replication and Change Listening with Spiegel, I summarized how you can use Spiegel to scale your replication and change listening. Be sure to read that introduction before exploring this follow-up article.

In this post we are going to use a basic example to step through how you would actually use Spiegel. This tutorial assumes that you have some familiarity with CouchDB and it will only focus on the replication and change-listening elements of these examples.

Example: A Blogging App

Consider an example where we have users posting blog entries. Let’s assume that we want to use PouchDB to sync data between the client and CouchDB. To streamline this syncing we’ll assume a design of a database per user where each user’s database is named user_<username>. We can then use PouchDB to sync with the user’s DB so that the user can create and edit blog posts via their own DB. Moreover, having a DB per user will allow us to restrict access to the user databases so that only the owner of a post can edit her or his posts.
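
As a minimal sketch of the naming convention above, the following helpers build the database name and URL for a user. The function names and the base URL are assumptions made for illustration and are not part of Spiegel or PouchDB.

```javascript
// Hypothetical helpers; the names and the base URL are illustrative assumptions.
const COUCH_URL = 'http://yourcouchdb.com:5984'

// Build the per-user database name following the user_<username> convention.
function userDbName(username) {
  return `user_${username}`
}

// Build the full URL of the user's database, e.g. for use as a sync target.
function userDbUrl(username, baseUrl = COUCH_URL) {
  return `${baseUrl}/${encodeURIComponent(userDbName(username))}`
}

console.log(userDbUrl('09e05a1a-3de7-4ba3-a503-868064d84309'))
```

On the client, a URL like this could be passed to PouchDB, e.g. localDb.sync(userDbUrl(username), { live: true, retry: true }), to keep the local and remote databases in sync.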

We’ll also want an all_blog_posts database that stores the blog posts from all the users so that we can access a list of the blog posts by issuing a query against a single database. In this design, we’ll want to replicate all the docs in the user DBs to the all_blog_posts DB. In addition, we’ll assume that all_blog_posts will store some basic stats such as the total number of blog posts.

Setup

We’re going to use Docker Swarm to run Spiegel as it can be used to automatically scale our processes and swarm nodes. Moreover, Docker will automatically restart a process if it exits because of a fatal error. Alternatively, you could also run Spiegel via Kubernetes or just plain ol’ Node.js.

Create a Custom API

Spiegel does the hard work of scaling the change listening, and on our end we need to create a REST API that will be used to calculate these stats. Spiegel offloads this processing to a REST API so that you can cleanly separate your business logic from Spiegel. You can use any technology, e.g. Node.js, Python, Ruby, etc… to create this API server. Let’s assume that we have a calc-stats endpoint and that we enforce basic authentication with user and secret.

To illustrate this, here is an example using Node.js and Koa:

const Koa = require('koa')
const route = require('koa-route')
const auth = require('koa-basic-auth')

const app = new Koa()

// Require basic authentication on every route
app.use(auth({ name: 'user', pass: 'secret' }))

app.use(
  route.put('/calc-stats', ctx => {
    // TODO: code to calculate stats
    ctx.body = 'Success'
  })
)

app.listen(3000)

Let’s assume that this route can now be accessed via http://user:secret@yourapi.com/calc-stats
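
The stats calculation itself is left as a TODO in the route. One possible sketch, assuming the stats live in a doc with an _id of stats and that every other non-design doc in all_blog_posts is a blog post, is shown below. The doc shapes, the total_posts field, and the function name are assumptions for illustration only.

```javascript
// Hypothetical sketch of the stats calculation; the doc shapes are assumptions.
const STATS_DOC_ID = 'stats' // assumed _id of the doc that holds the stats

// Count the blog posts among the docs of all_blog_posts, skipping design
// docs and the stats doc itself.
function calcStats(docs) {
  const posts = docs.filter(
    doc => !doc._id.startsWith('_design/') && doc._id !== STATS_DOC_ID
  )
  return { _id: STATS_DOC_ID, total_posts: posts.length }
}

const stats = calcStats([
  { _id: '_design/views' },
  { _id: 'stats', total_posts: 1 },
  { _id: 'post-1', title: 'Hello' },
  { _id: 'post-2', title: 'World' }
])
console.log(stats) // { _id: 'stats', total_posts: 2 }
```

In a real route you would fetch the docs from CouchDB and write the resulting stats doc back. For a large database, a view with a reduce function would probably be a better fit than loading every doc.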

Install Docker Swarm

See the official Docker documentation or Installing Docker Swarm on Ubuntu.

Create a Passwords File for Your Change Listeners

Spiegel uses password files to store the passwords used in your API calls so that the passwords are left out of the database docs. Create change-listener-passwords.json and fill in the details for your particular API:

{
"yourapi.com": {
"user": "secret"
}
}
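
To make the idea concrete, here is a rough sketch of how a password could be looked up by hostname and username and then added to a URL. This only illustrates the concept: Spiegel’s internal implementation may differ, and injectPassword is a hypothetical name.

```javascript
// Illustrative only: Spiegel's internal implementation may differ.
// Look up the password for the URL's hostname and username and return the
// URL with the password added. URLs without a matching entry are returned as-is.
function injectPassword(rawUrl, passwords) {
  const url = new URL(rawUrl)
  const forHost = passwords[url.hostname] || {}
  const password = forHost[url.username]
  if (password === undefined) return rawUrl
  url.password = password
  return url.href
}

const passwords = { 'yourapi.com': { user: 'secret' } }
console.log(injectPassword('http://user@yourapi.com/calc-stats', passwords))
```

The benefit of this design is that docs stored in the database only ever contain the username, while the secret stays on the machines running Spiegel.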

Create a Passwords File for Your Replications

Create replicator-passwords.json and fill in the details for your CouchDB setup. The user must be a CouchDB admin.

{
"yourcouchdb.com": {
"user": "password"
}
}

Install Spiegel

$ docker run -it \
-e TYPE='install' \
-e URL='http://user:password@yourcouchdb.com:5984' \
redgeoff/spiegel

When Spiegel is installed, a new database called spiegel is created and a design doc named sieve is added to _global_changes.

Create the Update Listener Service

The update-listener listens to _global_changes and then schedules on_change rules and replications accordingly. It is a good idea to run two update-listeners so that your update-listeners are redundant. The update-listeners are very lightweight and offload the change listening and replication to the change-listeners and replicators, so there isn’t much benefit in running more than two update-listeners.

$ docker service create \
--name update-listener \
--detach=true \
--replicas 2 \
-e TYPE='update-listener' \
-e URL='http://user:password@yourcouchdb.com:5984' \
redgeoff/spiegel

Create the Change Listener Service

The change-listener runs on_change rules for all matching changes. If you need to listen to more changes or respond to these changes faster, add a change-listener.

$ docker service create \
--name change-listener \
--detach=true \
--replicas 2 \
-e TYPE='change-listener' \
-e URL='http://user:password@yourcouchdb.com:5984' \
--mount type=bind,source=change-listener-passwords.json,destination=/usr/src/app/passwords.json \
-e PASSWORDS_FILE=/usr/src/app/passwords.json \
redgeoff/spiegel

Create the Replicator Service

The replicator process performs the actual replications. If you need to perform more replications or replicate faster, add a replicator.

$ docker service create \
--name replicator \
--detach=true \
--replicas 2 \
-e TYPE='replicator' \
-e URL='http://user:password@yourcouchdb.com:5984' \
--mount type=bind,source=replicator-passwords.json,destination=/usr/src/app/passwords.json \
-e PASSWORDS_FILE=/usr/src/app/passwords.json \
redgeoff/spiegel

Create an on_change Doc

An on_change doc defines a rule that executes a REST API call when the regular expressions match. For example, if we want any change in our all_blog_posts to result in a call to calc-stats, we could use:

{
  _id: 'all_blog_posts_on_change',
  type: 'on_change',
  db_name: '^all_blog_posts$',
  url: 'http://user@yourapi.com/calc-stats',
  params: {
    change: '$change'
  },
  method: 'PUT',
  debounce: true
}

Notes:

  1. $change will be replaced with the change doc as retrieved from the CouchDB _changes feed.
  2. The debounce option is used to ignore duplicate API requests. In a more sophisticated setup you may want to use a messaging queue like Kafka to better debounce duplicate requests.
  3. Passwords are automatically injected via the passwords file.
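
The db_name field is a regular expression, so the anchors ^ and $ matter. The following sketch shows how a database name could be tested against a set of rules. It is a simplified illustration rather than Spiegel’s actual matching code, and the second rule is hypothetical.

```javascript
// Illustrative only: Spiegel's real matching logic may differ.
const exampleRules = [
  { _id: 'all_blog_posts_on_change', db_name: '^all_blog_posts$' },
  { _id: 'user_dbs_on_change', db_name: '^user_.*$' } // hypothetical second rule
]

// Return the rules whose db_name regular expression matches the database name.
function matchingRules(dbName, candidates) {
  return candidates.filter(rule => new RegExp(rule.db_name).test(dbName))
}

console.log(matchingRules('all_blog_posts', exampleRules).map(r => r._id))
console.log(matchingRules('all_blog_posts_backup', exampleRules).map(r => r._id))
```

Because the first pattern is anchored at both ends, all_blog_posts_backup does not match it. Without the anchors, any database name containing all_blog_posts would trigger the rule.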

Create a replicator Doc

Spiegel replicator docs are almost identical to the docs you would normally put in CouchDB’s native _replicator database. Spiegel replicator docs, however, are located in the spiegel database. In our example, we’ll want to create one for each of our users so that a particular user’s blog posts are replicated to the all_blog_posts database. For the sake of this tutorial you can use Fauxton at http://yourcouchdb.com:5984/_utils to create these docs in the spiegel database. For example, let’s assume that you have a user with a username of 09e05a1a-3de7-4ba3-a503-868064d84309 and a corresponding user_09e05a1a-3de7-4ba3-a503-868064d84309 database.

{
  type: 'replicator',
  source: 'http://user@yourcouchdb.com:5984/user_09e05a1a-3de7-4ba3-a503-868064d84309',
  target: 'http://user@yourcouchdb.com:5984/all_blog_posts'
}

Notes:

  1. As desired, the above replicator will execute a replication only when there is a change to the user_09e05a1a-3de7-4ba3-a503-868064d84309 database.
  2. Passwords are automatically injected via the passwords file.
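
Since every user needs a doc of the same shape, it can help to generate it. Here is a small sketch, where buildReplicatorDoc and the default URL are assumptions made for illustration:

```javascript
// Hypothetical helper; the name and the default URL are illustrative assumptions.
const COUCH_URL_WITH_USER = 'http://user@yourcouchdb.com:5984'

// Build a Spiegel replicator doc that replicates a user's database to
// all_blog_posts. The password is left out as it comes from the passwords file.
function buildReplicatorDoc(username, couchUrl = COUCH_URL_WITH_USER) {
  return {
    type: 'replicator',
    source: `${couchUrl}/user_${username}`,
    target: `${couchUrl}/all_blog_posts`
  }
}

console.log(buildReplicatorDoc('09e05a1a-3de7-4ba3-a503-868064d84309'))
```

The resulting object can then be saved to the spiegel database, for example via Fauxton or an HTTP POST to that database.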

Automatically Creating replicator Docs

In a real-world system, you’ll want your software to create these replicator docs (and probably a user DB) whenever a user is registered. You can use CouchDB’s couchperuser feature to automatically create a DB for the new user, or for a more flexible option, you can use another on_change doc. For example, the following on_change rule will be run whenever a user is created or edited:

{
  _id: 'user_created_on_change',
  type: 'on_change',
  db_name: '^_users$',
  url: 'http://user@yourapi.com/after-user-created',
  params: {
    change: '$change'
  },
  method: 'PUT',
  block: true
}

Note: You’ll need to define a new API route for after-user-created that will create the user DB and corresponding replicator if the user doesn’t already exist.
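
The core of such a route could look something like the sketch below. It assumes that the change reaches your route as an object with the id and deleted fields found in the CouchDB _changes feed, and it relies on CouchDB’s convention that user docs have an _id of org.couchdb.user:<username>. The function names are hypothetical, and the actual creation of the database and doc is left out.

```javascript
// Hypothetical helpers for an after-user-created route; names are illustrative.
const USER_ID_PREFIX = 'org.couchdb.user:' // CouchDB's _id prefix for user docs

// Return the username for a user change, or null when the change is a
// deletion or refers to something other than a user doc (e.g. a design doc).
function usernameFromChange(change) {
  if (change.deleted || !change.id.startsWith(USER_ID_PREFIX)) return null
  return change.id.slice(USER_ID_PREFIX.length)
}

// Work out which database and replicator doc the route would need to create.
function planForChange(change, couchUrl = 'http://user@yourcouchdb.com:5984') {
  const username = usernameFromChange(change)
  if (username === null) return null
  const dbName = `user_${username}`
  return {
    dbName,
    replicatorDoc: {
      type: 'replicator',
      source: `${couchUrl}/${dbName}`,
      target: `${couchUrl}/all_blog_posts`
    }
  }
}

console.log(planForChange({ id: 'org.couchdb.user:alice', changes: [] }))
```

Because the rule fires on edits as well as on creation, the route should check whether the database and replicator doc already exist before creating them.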

Troubleshooting

Something not working? Try looking at the logs:

$ sudo npm install -g bunyan
$ docker ps # to view list of containers
$ docker logs -f <container> | bunyan

Notes:

  1. To keep things simple and avoid problems with self-signed certs, etc… the preceding examples do not use secure connections. In a production environment you will definitely want to use HTTPS.
  2. For extra security, use Docker Secrets to encrypt the URL parameter.

Let’s Wrap This Up

Scaling replication and change listening for CouchDB can be a bit tricky, but Spiegel makes it a lot easier.

At some point in the future, I hope to write a series of posts that will take you through the process of creating an entire app with CouchDB and PouchDB. Stay tuned!

About the Author

Geoff Cox is the creator of MSON, a new declarative programming language that will allow anyone to develop software visually. He loves taking on ambitious, yet wife-maddening, projects like creating a database and distributed data syncing system. You can read more of his posts at redgeoff.com or reach him @CoxGeoffrey or at github.

Offline Camp