Importing data to CouchDB with CouchImport
Life beyond the spreadsheet for CSVs
Data often comes in simple formats. The CSV format (Comma-Separated Values) is a very common one, especially for relatively small data sets, and even for large data sets having millions of rows. It’s an ideal format in many ways, easily understood by machines, and you can even open it with a standard desktop spreadsheet tool such as Excel. That said, sometimes I want to do more interesting things than I can with Excel, and for that, I’ll want a proper database!
I’ve been working with some of the open data sets that my team maintains. I’m a frequent and enthusiastic user of CouchDB and its IBM-hosted version Cloudant. These databases are often where I like to land data. Handily, my colleague Glynn Bird has already created just the tool I need: CouchImport, a command-line tool for importing CSV data to CouchDB.
Performing the import
The couchimport
tool is available from npm. Since I will run it as a command in my terminal, I'll install it globally:
npm install -g couchimport
Test that the tool is working by running the following command. It should output a version number:
couchimport --version
With CouchImport in place, you’ll need to get data and configure the tool to know what to do with it. For this example, I’m going to use the movies database that you can find here: https://ibm-watson-data-lab.github.io/open-data/#movies
I’d like this data imported into my local CouchDB instance and stored there in a database very creatively named movies
. So first, create that local database:
curl -X PUT http://localhost:5984/movies
You can also use the web dashboard to create a new database in CouchDB/Cloudant, but either way make sure the target database exists and that you know the following:
- the URL of the CouchDB server, including port number and security credentials if required (for my locally-running CouchDB:
http://localhost:5984
) - the name of the database to import into (I just created
movies
) - the character that separates your data; the movies database is actually TSV (Tab-Separated Values) so I’ll need
\t
which is the default value
We can supply these arguments—and many others, check out the project page—either as environment variables, or as command-line switches. I’ll use the command-line approach since I only want to run this command once. To pass in my movies.tsv
file, I simply use cat
and then direct the output into my couchimport
command. If that sounds complicated, here's an example to clarify:
cat movies.tsv | couchimport --url http://localhost:5984 --db movies
The delimiter is the default \t
or "tab" character. If you're using a CSV file, then it should be set to ,
(by appending the --delimiter ‘,’
flag to the command above) so that the utility can separate the data appropriately. Check your database, and you should see that your data has imported correctly:
You can now go ahead and use CouchDB to explore your data set. For more information about the power of CouchDB, try the views documentation.
A two-way street
To make the exact opposite manoeuvre—in other words, take flat-shaped data from your CouchDB database and turn it into a CSV—the same module also includes a couchexport
command. Enjoy!
If you enjoyed this article, please ♡ it to recommend it to other Medium readers.