Importing data to CouchDB with CouchImport

Life beyond the spreadsheet for CSVs

--

Data often comes in simple formats. The CSV format (Comma-Separated Values) is a very common one, especially for relatively small data sets, and even for large data sets having millions of rows. It’s an ideal format in many ways, easily understood by machines, and you can even open it with a standard desktop spreadsheet tool such as Excel. That said, sometimes I want to do more interesting things than I can with Excel, and for that, I’ll want a proper database!

I’ve been working with some of the open data sets that my team maintains. I’m a frequent and enthusiastic user of CouchDB and its IBM-hosted version Cloudant. These databases are often where I like to land data. Handily, my colleague Glynn Bird has already created just the tool I need: CouchImport, a command-line tool for importing CSV data to CouchDB.

Put your CSV or TSV data to work in a CouchDB-style database using CouchImport.

Performing the import

The couchimport tool is available from npm. Since I will run it as a command in my terminal, I'll install it globally:

Test that the tool is working by running the following command. It should output a version number:

With CouchImport in place, you’ll need to get data and configure the tool to know what to do with it. For this example, I’m going to use the movies database that you can find here: https://ibm-watson-data-lab.github.io/open-data/#movies

I’d like this data imported into my local CouchDB instance and stored there in a database very creatively named movies. So first, create that local database:

You can also use the web dashboard to create a new database in CouchDB/Cloudant, but either way make sure the target database exists and that you know the following:

  • the URL of the CouchDB server, including port number and security credentials if required (for my locally-running CouchDB: http://localhost:5984)
  • the name of the database to import into (I just created movies)
  • the character that separates your data; the movies database is actually TSV (Tab-Separated Values) so I’ll need \t which is the default value

We can supply these arguments—and many others, check out the project page—either as environment variables, or as command-line switches. I’ll use the command-line approach since I only want to run this command once. To pass in my movies.tsv file, I simply use cat and then direct the output into my couchimport command. If that sounds complicated, here's an example to clarify:

The delimiter is the default \t or "tab" character. If you're using a CSV file, then it should be set to , (by appending the --delimiter ‘,’ flag to the command above) so that the utility can separate the data appropriately. Check your database, and you should see that your data has imported correctly:

My movie data has made the leap from TSV row to JSON document (pictured here in the updated Fauxton dashboard in CouchDB 2.0).

You can now go ahead and use CouchDB to explore your data set. For more information about the power of CouchDB, try the views documentation.

A two-way street

To make the exact opposite manoeuvre—in other words, take flat-shaped data from your CouchDB database and turn it into a CSV—the same module also includes a couchexport command. Enjoy!

If you enjoyed this article, please ♡ it to recommend it to other Medium readers.

--

--

Lorna Mitchell
Center for Open Source Data and AI Technologies

Polyglot programmer, technology addict, open source fanatic and incurable blogger (see http://lornajane.net)