The diff command-line utility has been around since the 1970s. It compares two text files line by line and reports the differences between them.
$ diff a.txt b.txt
< T.S. Eliot 1888-1965
> T.S. Eliot
> Sprouting despondently at area gates.
< From Prufrock, and other observations (The Egoist, Ltd, 1917)
Lines starting with < mean “this line needs removing,” and lines starting with > mean “this line needs adding.” The other lines specify where in the file the changes would need to be made. This machine-readable data allows one file to be “patched” to match another.
We can also output the same data in a so-called “unified” format by passing the -u flag:
$ diff -u a.txt b.txt
--- a.txt 2017-04-25 14:03:12.000000000 +0100
+++ b.txt 2017-04-25 13:56:20.000000000 +0100
@@ -1,14 +1,13 @@
Morning at the Window
-T.S. Eliot 1888-1965
They are rattling breakfast plates in basement kitchens,
And along the trampled edges of the street
I am aware of the damp souls of housemaids
+Sprouting despondently at area gates.
The brown waves of fog toss up to me
Twisted faces from the bottom of the street,
And tear from a passer-by with muddy skirts
An aimless smile that hovers in the air
And vanishes along the level of the roofs.
-From Prufrock, and other observations (The Egoist, Ltd, 1917)
This format uses + in place of > and - in place of <, and provides a few lines of context around each change, making it a little easier to digest.
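To make the unified format concrete, here is a minimal sketch using Python's standard difflib module rather than the diff utility itself. The sample lines and the helper name unified are illustrative assumptions, not part of the original example files.

```python
import difflib

def unified(old_lines, new_lines, old_name="a.txt", new_name="b.txt"):
    """Return the unified diff of two lists of lines as a list of strings."""
    return list(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_name, tofile=new_name)
    )

# Hypothetical sample data: the second version drops one line.
old = [
    "Morning at the Window\n",
    "T.S. Eliot 1888-1965\n",
    "They are rattling breakfast plates in basement kitchens,\n",
]
new = [
    "Morning at the Window\n",
    "They are rattling breakfast plates in basement kitchens,\n",
]

# Removed lines are prefixed with "-", added lines with "+",
# and unchanged context lines with a single space.
print("".join(unified(old, new)), end="")
```

Two identical inputs produce an empty diff, which is the same behaviour the diff utility shows when files match.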
Diffing a database
Let’s say we have two databases instead of two text files — in this case two Apache CouchDB™ or Cloudant databases. How can we tell if documents in each database are identical, and if they’re not, which ones differ?
I’ve written a command-line tool to do just that: couchdiff. It is installed using npm:
$ npm install -g couchdiff
You can then use couchdiff just like diff, except that it expects two URLs instead of two file paths. For example:
$ couchdiff http://localhost:5984/mydb1 http://localhost:5984/mydb2
In this case, the two databases are identical except for document id 1000543, which is at a later revision in the second database.
The URLs can point to local CouchDB databases or to remote Cloudant databases, or both:
couchdiff also accepts a -u parameter to output the data in unified format.
How does couchdiff work?
Here is the basic order of operations of the couchdiff tool:
- It gets the changes feed for each of the databases and writes each document id and revision token to a temporary file, one file per database.
- The temporary files are sorted using the sort command-line tool. This ensures that both files are in “id order.”
- The two files are diffed using the diff utility, and its output is what you see. If the databases are identical, there will be no output.
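The steps above can be sketched in a few lines of Python. This is not couchdiff's actual code: it assumes the changes feeds have already been fetched and parsed from JSON, uses Python's sorted in place of the sort tool, and uses difflib in place of the diff utility. The function names and sample feeds are hypothetical.

```python
import difflib

def id_rev_lines(changes):
    """Flatten a parsed _changes response into sorted 'id rev' lines."""
    lines = [
        f"{row['id']} {row['changes'][0]['rev']}\n"
        for row in changes["results"]
    ]
    return sorted(lines)  # stands in for the sort command-line tool

def diff_databases(changes_a, changes_b):
    """Diff two changes feeds; an empty list means no differences were found."""
    return list(
        difflib.unified_diff(
            id_rev_lines(changes_a),
            id_rev_lines(changes_b),
            fromfile="db1",
            tofile="db2",
        )
    )

# Hypothetical feeds: doc-a is at a later revision in the second database.
feed_1 = {"results": [
    {"id": "doc-b", "changes": [{"rev": "1-abc"}]},
    {"id": "doc-a", "changes": [{"rev": "2-def"}]},
]}
feed_2 = {"results": [
    {"id": "doc-a", "changes": [{"rev": "3-xyz"}]},
    {"id": "doc-b", "changes": [{"rev": "1-abc"}]},
]}

print("".join(diff_databases(feed_1, feed_2)), end="")
```

Sorting matters because a changes feed is ordered by update sequence, not by document id, so two databases holding the same documents can list them in different orders.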
You need both sort and diff installed on your machine for this to work. macOS and most Linux distributions have them pre-installed.
What about conflicts?
Conflicted documents are ignored by couchdiff by default, but adding the --conflicts parameter brings them into play. Here’s how that looks:
Both databases will be compared, including any variance in conflicted revisions.
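One way to picture a conflict-aware comparison: CouchDB's changes feed accepts a style=all_docs parameter, in which case each entry's changes array lists every leaf revision, conflicts included. Whether couchdiff works this way internally is an assumption. The sketch below only shows how listing all leaf revisions exposes a difference that the winning revision alone would hide. The function name and sample feeds are hypothetical.

```python
def leaf_rev_lines(changes):
    """Build sorted 'id rev rev ...' lines listing every leaf revision."""
    lines = []
    for row in changes["results"]:
        # Sort the revisions so the comparison does not depend on feed order.
        revs = sorted(change["rev"] for change in row["changes"])
        lines.append(" ".join([row["id"]] + revs))
    return sorted(lines)

# Hypothetical feeds fetched with style=all_docs: both databases share the
# revision 2-aaa, but the second also holds a conflicting revision 2-bbb.
feed_1 = {"results": [{"id": "doc-a", "changes": [{"rev": "2-aaa"}]}]}
feed_2 = {"results": [
    {"id": "doc-a", "changes": [{"rev": "2-aaa"}, {"rev": "2-bbb"}]},
]}

for line in leaf_rev_lines(feed_1) + leaf_rev_lines(feed_2):
    print(line)
```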
What about attachments?
The couchdiff tool doesn't examine the bodies of binary attachments explicitly, but since the document bodies contain a digest of each attachment, it can still detect differences in attachments.
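As an illustration of why digests are enough, here is a small sketch that compares the attachment stubs found under the _attachments key of two already-fetched document bodies. It is not taken from couchdiff. The helper names, sample documents, and digest values are made up for the example.

```python
def attachment_digests(doc):
    """Map attachment name to digest, read from a document's _attachments stubs."""
    return {
        name: meta["digest"]
        for name, meta in doc.get("_attachments", {}).items()
    }

def changed_attachments(doc_a, doc_b):
    """Return sorted names of attachments that differ or exist on one side only."""
    a, b = attachment_digests(doc_a), attachment_digests(doc_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))

# Hypothetical documents: logo.png has different content in each database.
doc_1 = {"_id": "doc-a", "_attachments": {
    "logo.png": {"content_type": "image/png", "digest": "md5-AAAA", "stub": True},
    "notes.txt": {"content_type": "text/plain", "digest": "md5-CCCC", "stub": True},
}}
doc_2 = {"_id": "doc-a", "_attachments": {
    "logo.png": {"content_type": "image/png", "digest": "md5-BBBB", "stub": True},
    "notes.txt": {"content_type": "text/plain", "digest": "md5-CCCC", "stub": True},
}}

print(changed_attachments(doc_1, doc_2))
```

No attachment bytes are downloaded here. Only the small metadata stubs are compared.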
Other command-line tools
If you need to access your CouchDB or Cloudant database from the command-line, then there are other tools you can use:
- couchimport — import data to your JSON document store from CSV/TSV files and vice versa
- couchshell — interact with your databases as if they were a file system
- couchbackup — backup your database to a text file and restore just as easily
I hope these tools are useful. If you have any feedback, especially on couchdiff, please let me know in the comments below, or create a GitHub issue. Pull requests are always welcome.