Diff your databases with couchdiff

Command-line tool for comparing two Apache CouchDB or Cloudant databases

The Unix diff command-line utility has been around since the 1970s. It compares two text files line by line and tells you the differences between them.

Hello, diff!

Suppose two files, a.txt and b.txt, contain versions of the same poem. I can find the differences between them with the command:

$ diff a.txt b.txt
2c2
< T.S. Eliot 1888-1965
---
> T.S. Eliot
6a7
> Sprouting despondently at area gates.
13,14d13
<
< From Prufrock, and other observations (The Egoist, Ltd, 1917)

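Lines starting with < mean “this line needs removing,” and lines starting with > mean “this line needs adding.” The other lines say where in the files the changes apply: a line range in the first file, a letter for the action (c for change, a for add, d for delete), and the matching line range in the second file, while --- separates the old and new text of a change. This machine-readable data allows one file to be “patched” to match another.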
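To make the command syntax concrete, here is a small Python sketch that splits a normal-format command such as 2c2, 6a7 or 13,14d13 into its parts. The parse_command helper and Command type are hypothetical names for illustration; they are not part of diff or couchdiff.

```python
import re
from typing import NamedTuple, Tuple

# LEFT-RANGE, action letter, RIGHT-RANGE; a range is "N" or "N,M"
COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")
ACTIONS = {"a": "add", "c": "change", "d": "delete"}


class Command(NamedTuple):
    left: Tuple[int, int]   # range in the first file ("add": the line after which text goes)
    action: str             # "add", "change" or "delete"
    right: Tuple[int, int]  # range in the second file ("delete": where removed text would have been)


def parse_command(text: str) -> Command:
    """Split a normal-format diff command such as '13,14d13' into its parts."""
    match = COMMAND.match(text.strip())
    if match is None:
        raise ValueError(f"not a diff command: {text!r}")
    l_start, l_end, letter, r_start, r_end = match.groups()
    return Command(
        left=(int(l_start), int(l_end or l_start)),
        action=ACTIONS[letter],
        right=(int(r_start), int(r_end or r_start)),
    )


for cmd in ("2c2", "6a7", "13,14d13"):
    print(cmd, "->", parse_command(cmd))
```

Read this way, 6a7 says “after line 6 of a.txt, add line 7 of b.txt,” and 13,14d13 says “delete lines 13 to 14 of a.txt.”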

We can also output the same data in a so-called “unified” format by passing a -u parameter:

$ diff -u a.txt b.txt
--- a.txt 2017-04-25 14:03:12.000000000 +0100
+++ b.txt 2017-04-25 13:56:20.000000000 +0100
@@ -1,14 +1,13 @@
 Morning at the Window
-T.S. Eliot 1888-1965
+T.S. Eliot
 
 They are rattling breakfast plates in basement kitchens,
 And along the trampled edges of the street
 I am aware of the damp souls of housemaids
+Sprouting despondently at area gates.
 
 The brown waves of fog toss up to me
 Twisted faces from the bottom of the street,
 And tear from a passer-by with muddy skirts
 An aimless smile that hovers in the air
 And vanishes along the level of the roofs.
-
-From Prufrock, and other observations (The Egoist, Ltd, 1917)

This format uses - and + in place of < and >, and surrounds each change with unchanged lines of context (three by default), making it a little easier to digest.
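Python’s difflib module can produce the same unified format, which is a handy way to see where the ---/+++ headers, the @@ hunk header and the +/- markers come from. A minimal sketch using only the first four lines of each version of the poem; because no file dates are passed, the headers omit the timestamps that diff prints.

```python
import difflib
from typing import List


def unified(a_lines: List[str], b_lines: List[str]) -> List[str]:
    """Unified diff of two lists of lines, labelled like `diff -u a.txt b.txt`."""
    return list(difflib.unified_diff(
        a_lines, b_lines,
        fromfile="a.txt", tofile="b.txt",
        n=3,          # lines of context, the same default as diff -u
        lineterm="",  # the input lines carry no trailing newlines
    ))


# The first four lines of each version of the poem
a = ["Morning at the Window", "T.S. Eliot 1888-1965", "",
     "They are rattling breakfast plates in basement kitchens,"]
b = ["Morning at the Window", "T.S. Eliot", "",
     "They are rattling breakfast plates in basement kitchens,"]

for line in unified(a, b):
    print(line)
```

This prints a single @@ -1,4 +1,4 @@ hunk in which the author line is removed and re-added without the dates, with the surrounding lines kept as context.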

Diffing a database

Let’s say we have two databases instead of two text files — in this case two Apache CouchDB™ or Cloudant databases. How can we tell if documents in each database are identical, and if they’re not, which ones differ?

I’ve written a command-line tool to do just that: couchdiff. It is installed using the npm command:

npm install -g couchdiff

You can then use couchdiff like diff, except that it expects two URLs instead of two file paths. For example:

$ couchdiff http://localhost:5984/mydb1 http://localhost:5984/mydb2
spooling changes...
sorting...
calculating difference...
2c2
< 1000543/1-3256046064953e2f0fdb376211fe78ab
---
> 1000543/2-7d93e4800a6479d8045d192577cff4f7

In this case, the two databases are identical except for document id 1000543, which is at a later revision in the second database.
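Because couchdiff’s output is ordinary diff output, it is easy to post-process. A minimal Python sketch, assuming the default output shown above (one id/rev entry per document, no --conflicts): the hypothetical differing_docs helper collects the revision each database holds for every document that differs, and generation reads the number before the dash in a CouchDB revision token, which is 1 when a document is created and goes up by one with each update.

```python
from typing import Dict, Iterable, Tuple


def parse_entry(line: str) -> Tuple[str, str, str]:
    """Split a '< id/rev' or '> id/rev' line into (side, doc_id, rev)."""
    side, entry = line[0], line[2:].strip()
    # Split on the LAST slash: ids such as "_design/app" contain slashes,
    # revision tokens ("N-hash") do not
    doc_id, rev = entry.rsplit("/", 1)
    return side, doc_id, rev


def differing_docs(diff_output: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Map each differing document id to its revision in each database.

    Assumes one revision per document per side (i.e. no --conflicts).
    A document present in only one database gets only one key.
    """
    docs: Dict[str, Dict[str, str]] = {}
    for line in diff_output:
        if line.startswith(("< ", "> ")):
            side, doc_id, rev = parse_entry(line)
            docs.setdefault(doc_id, {})["first" if side == "<" else "second"] = rev
    return docs


def generation(rev: str) -> int:
    """The number before the dash in a revision token, e.g. 2 for '2-7d93...'."""
    return int(rev.split("-", 1)[0])


sample = """2c2
< 1000543/1-3256046064953e2f0fdb376211fe78ab
---
> 1000543/2-7d93e4800a6479d8045d192577cff4f7""".splitlines()

for doc_id, revs in differing_docs(sample).items():
    print(doc_id, {side: generation(rev) for side, rev in revs.items()})
```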

The URLs can point to local CouchDB databases, remote Cloudant databases, or one of each.

Like diff, couchdiff also accepts a -u parameter to output the data in unified format.

How does couchdiff work?

Here is the basic order of operations of the couchdiff utility:

  1. It fetches the changes feed of each database and writes each document’s id and revision token to a temporary file, one file per database.
  2. The temporary files are sorted using the sort command-line tool. The changes feed doesn’t list documents in id order, and the order can differ between the two databases, so sorting puts both files in “id order” and lets diff line up the entries for the same document.
  3. The two files are compared using the diff utility, whose output is what you see. If the databases are identical, there is no output.
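The three steps above can be sketched in Python. This is a simplified illustration under stated assumptions, not couchdiff’s own implementation: it assumes databases readable without credentials, fetches the standard _changes endpoint in a single request with no paging or retries, and takes the first revision listed for each document (with the default feed style that is the winning revision). Like couchdiff, it shells out to sort and diff; the function names are hypothetical.

```python
import json
import os
import subprocess
import sys
import tempfile
import urllib.request
from typing import Iterable, List


def fetch_changes(db_url: str) -> dict:
    """Step 1a: read one database's whole changes feed in a single request."""
    # Assumption: no credentials needed and the feed fits in memory
    with urllib.request.urlopen(db_url.rstrip("/") + "/_changes") as resp:
        return json.load(resp)


def changes_to_lines(changes: dict) -> List[str]:
    """Step 1b: one 'id/rev' line per document, using the winning revision."""
    return [f"{row['id']}/{row['changes'][0]['rev']}" for row in changes["results"]]


def write_temp(lines: Iterable[str]) -> str:
    """Write lines to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.writelines(line + "\n" for line in lines)
        return f.name


def sort_and_diff(lines_a: Iterable[str], lines_b: Iterable[str]) -> str:
    """Steps 2 and 3: sort both temporary files in place, then diff them."""
    env = dict(os.environ, LC_ALL="C")  # identical collation for both sorts
    paths = [write_temp(lines_a), write_temp(lines_b)]
    try:
        for path in paths:
            subprocess.run(["sort", "-o", path, path], check=True, env=env)
        # diff exits with 1 when the files differ and 2 on trouble
        result = subprocess.run(["diff", *paths], capture_output=True, text=True, env=env)
        if result.returncode > 1:
            raise RuntimeError(result.stderr)
        return result.stdout
    finally:
        for path in paths:
            os.remove(path)


def couchdiff_sketch(url_a: str, url_b: str) -> str:
    return sort_and_diff(changes_to_lines(fetch_changes(url_a)),
                         changes_to_lines(fetch_changes(url_b)))


if __name__ == "__main__" and len(sys.argv) == 3:
    print(couchdiff_sketch(sys.argv[1], sys.argv[2]), end="")
```

Saved as, say, couchdiff_sketch.py, it runs as python couchdiff_sketch.py http://localhost:5984/mydb1 http://localhost:5984/mydb2. Forcing LC_ALL=C keeps the two sorts consistent regardless of the user’s locale.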

Both sort and diff must be installed on your machine for this to work; macOS and most Linux distributions include them by default.
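A quick way to confirm both tools are on your PATH is shutil.which from Python’s standard library. A minimal sketch; missing_tools is a hypothetical helper name.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str] = ("sort", "diff")) -> List[str]:
    """Return the names of any tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("missing required tools:", ", ".join(missing))
else:
    print("sort and diff are both available")
```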

What about conflicts?

Conflicted documents are ignored by couchdiff by default. Adding the --conflicts parameter brings them into play: both databases are then compared, including any variance in their conflicted revisions.
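How couchdiff gathers conflicted revisions isn’t covered here, so the following is an assumption-labelled sketch of one way to do it, not necessarily couchdiff’s approach. CouchDB’s changes feed accepts style=all_docs, which lists every leaf revision of a document (conflicts and deleted former conflicts included) instead of only the winning one. Writing one id/rev line per leaf revision lets the same sort-and-diff approach spot a conflict that exists in only one database. The function names are hypothetical.

```python
import json
import urllib.request
from typing import List


def fetch_all_leaves(db_url: str) -> dict:
    """Changes feed listing every leaf revision (winner plus conflicts)."""
    # Assumption: an unauthenticated database; no paging or retries
    url = db_url.rstrip("/") + "/_changes?style=all_docs"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def leaf_lines(changes: dict) -> List[str]:
    """One 'id/rev' line per leaf revision, ready to be sorted and diffed."""
    return [
        f"{row['id']}/{change['rev']}"
        for row in changes["results"]
        for change in row["changes"]
    ]


# A document that is conflicted in this database: two leaf revisions
example = {"results": [
    {"seq": "5", "id": "doc1", "changes": [{"rev": "2-aaa"}, {"rev": "2-bbb"}]},
]}
print(sorted(leaf_lines(example)))
```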

What about attachments?

The couchdiff tool doesn’t examine the contents of binary attachments, but each document body contains a digest of every attachment, so a differing attachment gives the document a differing revision token and couchdiff still detects it.
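For a feel of what such a digest is, here is a minimal Python sketch of an attachment stub. CouchDB reports attachment digests as md5- followed by the base64-encoded MD5 hash; attachment_digest and stub are hypothetical helpers, and the sketch assumes the hash covers the attachment’s raw bytes (how CouchDB treats attachments it stores compressed is not modelled).

```python
import base64
import hashlib


def attachment_digest(data: bytes) -> str:
    """Digest string in the style CouchDB uses for attachment stubs."""
    return "md5-" + base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def stub(content_type: str, data: bytes) -> dict:
    # Hypothetical helper: the attachment metadata a document body carries
    return {"content_type": content_type, "length": len(data),
            "digest": attachment_digest(data), "stub": True}


v1 = stub("image/png", b"first version of the image")
v2 = stub("image/png", b"second version of the image")
print(v1["digest"], v2["digest"])
print(v1["digest"] != v2["digest"])  # different bytes, different digest
```

Two versions of an attachment that differ by even one byte produce different digests, so the document bodies that carry them differ as well.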

Other command-line tools

If you need to access your CouchDB or Cloudant database from the command line, there are other tools you can use:

  • couchimport — import data to your JSON document store from CSV/TSV files and vice versa
  • couchshell — interact with your databases as if they were a file system
  • couchbackup — back up your database to a text file and restore it just as easily

I hope these tools are useful. If you have any feedback — especially on couchdiff — please let me know in the comments below, or create a GitHub issue. Pull requests, as always, are quite welcome.
