Importing data to CouchDB with CouchImport

Life beyond the spreadsheet for CSVs

Lorna Mitchell
Jul 20, 2017 · 3 min read

Data often comes in simple formats. The CSV format (Comma-Separated Values) is a very common one, especially for relatively small data sets, and even for large data sets having millions of rows. It’s an ideal format in many ways, easily understood by machines, and you can even open it with a standard desktop spreadsheet tool such as Excel. That said, sometimes I want to do more interesting things than I can with Excel, and for that, I’ll want a proper database!

I’ve been working with some of the open data sets that my team maintains. I’m a frequent and enthusiastic user of CouchDB and its IBM-hosted version Cloudant. These databases are often where I like to land data. Handily, my colleague Glynn Bird has already created just the tool I need: CouchImport, a command-line tool for importing CSV data to CouchDB.

Put your CSV or TSV data to work in a CouchDB-style database using CouchImport.

The couchimport tool is available from npm. Since I will run it as a command in my terminal, I'll install it globally:

npm install -g couchimport

Test that the tool is working by running the following command. It should output a version number:

couchimport --version

With CouchImport in place, you’ll need to get data and configure the tool to know what to do with it. For this example, I’m going to use the movies database that you can find here: https://ibm-watson-data-lab.github.io/open-data/#movies

I’d like this data imported into my local CouchDB instance and stored there in a database very creatively named movies. So first, create that local database:

curl -X PUT http://localhost:5984/movies

You can also use the web dashboard to create a new database in CouchDB/Cloudant, but either way make sure the target database exists and that you know the following:

  • the URL of the CouchDB server, including port number and security credentials if required (for my locally-running CouchDB: http://localhost:5984)
  • the name of the database to import into (I just created movies)
  • the character that separates your data; the movies database is actually TSV (Tab-Separated Values) so I’ll need \t which is the default value

We can supply these arguments—and many others, check out the project page—either as environment variables, or as command-line switches. I’ll use the command-line approach since I only want to run this command once. To pass in my movies.tsv file, I simply use cat and then direct the output into my couchimport command. If that sounds complicated, here's an example to clarify:

cat movies.tsv | couchimport --url http://localhost:5984 --db movies

The delimiter is the default \t or "tab" character. If you're using a CSV file, then it should be set to , (by appending the --delimiter ‘,’ flag to the command above) so that the utility can separate the data appropriately. Check your database, and you should see that your data has imported correctly:

My movie data has made the leap from TSV row to JSON document (pictured here in the updated Fauxton dashboard in CouchDB 2.0).

You can now go ahead and use CouchDB to explore your data set. For more information about the power of CouchDB, try the views documentation.

To make the exact opposite manoeuvre—in other words, take flat-shaped data from your CouchDB database and turn it into a CSV—the same module also includes a couchexport command. Enjoy!

If you enjoyed this article, please ♡ it to recommend it to other Medium readers.

IBM CODAIT

Things we made with data at IBM’s Center for Open Source Data and AI Technologies.

)

Thanks to Mike Broberg, Teri Chadbourne, and Raj Singh

Lorna Mitchell

Written by

Developer Advocate at IBM. Technology addict, open source fanatic and incurable blogger (see http://lornajane.net)

IBM CODAIT

Things we made with data at IBM’s Center for Open Source Data and AI Technologies.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade