Couchimport Revisited
Adding one-to-many transforms to my CouchDB command-line tool
The couchimport command-line tool is a popular way of importing structured data (CSV/TSV) from a file, spreadsheet, or database into Apache CouchDB™ or IBM Cloudant. We’ve previously covered the tool here on Medium. In this article, I’ll provide a quick refresher on its functionality, and describe a helpful feature contributed by the community.
First, create your destination database and make a note of the URL (e.g., `https://user:pass@host.cloudant.com`) and your database name (e.g., `mydb`).
Then simply pipe your data file into `couchimport`:
As long as the first line of your input file contains the column headings, you should end up with one JSON document in your database for every remaining line of the file. The headings from the first line become the attribute names of each document.
| name | town | lat | long |
| ------------- |:-------------:| -------:| -------:|
| Bob | London | 51.5072 | -0.1275 |
| Frank | Bolton | 53.5789 | -2.429 |
| Susan | Dover | 51.1295 | 1.3089 |
This tabular data would produce JSON documents like this:
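As a rough sketch of that mapping (not couchimport's actual implementation), the snippet below pairs each column heading with the value in the same position. The helper name `rowsToDocs` is hypothetical, and the naive `split` does not handle quoted fields. It assumes every value arrives as a string, and it omits the `_id` and `_rev` fields that CouchDB adds to stored documents.

```javascript
// Hypothetical helper: turns delimited text into one object per data line,
// using the values on the first line as attribute names.
// Note: a naive split like this does not cope with quoted or escaped fields.
function rowsToDocs(text, delimiter = ',') {
  const [header, ...rows] = text.trim().split('\n');
  const keys = header.split(delimiter).map((k) => k.trim());
  return rows.map((row) => {
    const values = row.split(delimiter).map((v) => v.trim());
    // Pair each column heading with the value in the same position.
    return Object.fromEntries(keys.map((k, i) => [k, values[i]]));
  });
}

const sample = [
  'name,town,lat,long',
  'Bob,London,51.5072,-0.1275',
  'Frank,Bolton,53.5789,-2.429',
].join('\n');

const docs = rowsToDocs(sample);
console.log(JSON.stringify(docs, null, 2));
```

Note that `lat` and `long` come out as strings such as `"51.5072"` rather than numbers.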
Transforms
The `couchimport` tool also allows a transform function to be used to modify the data before it is added to the database. For example, we could create a JavaScript file like this:
And run another import:
Our `mytransform` function is called with every object before it is added to the database, allowing us to:
- Add or remove fields
- Strip whitespace
- Coerce data types (in this example, converting strings to numbers)
- Filter out rows altogether
- Turn the source data into a different form (e.g., GeoJSON)
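To illustrate a couple of these, here is a hedged sketch of a transform that strips whitespace and reshapes each row into a GeoJSON `Feature`. The function name `toGeoJSON` and the choice of `properties` are illustrative assumptions. The field names `name`, `town`, `lat` and `long` come from the sample table.

```javascript
// Hypothetical transform: trims string fields, then reshapes the row
// into a GeoJSON Feature with a Point geometry.
const toGeoJSON = function (doc) {
  // Strip leading and trailing whitespace from every string value.
  for (const key of Object.keys(doc)) {
    if (typeof doc[key] === 'string') {
      doc[key] = doc[key].trim();
    }
  }
  return {
    type: 'Feature',
    geometry: {
      type: 'Point',
      // GeoJSON orders coordinates as [longitude, latitude].
      coordinates: [parseFloat(doc.long), parseFloat(doc.lat)],
    },
    properties: { name: doc.name, town: doc.town },
  };
};

module.exports = toGeoJSON;
```

Because the function returns a brand-new object, only the fields copied into `properties` survive. That is one way of removing unwanted columns.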
One-to-many
The latest version of `couchimport` allows a single row of data to generate multiple documents, if the transform function returns an array of objects.
Let’s say we want to create separate documents for the `person` objects and the `location` objects. We would create a new `transform.js` that returns an array of objects that we want to insert:
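A sketch of what that `transform.js` might look like, under the same module-export assumption as a single-document transform. The `type` field used to tell the two kinds of document apart is an illustrative choice, not something the tool requires.

```javascript
// transform.js
// Splits one input row into a person document and a location document.
const transform = function (doc) {
  const person = {
    type: 'person', // assumed discriminator field
    name: doc.name,
  };
  const location = {
    type: 'location', // assumed discriminator field
    town: doc.town,
    lat: parseFloat(doc.lat),
    long: parseFloat(doc.long),
  };
  // Returning an array requests one document per element.
  return [person, location];
};

module.exports = transform;
```

Nothing in this sketch links a person to their location. If that relationship matters, one option is to add a shared key to both objects.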
And run the import again:
This time, we get two documents for each line of input:
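The effect can be mimicked locally with a few lines of JavaScript. This is only a simulation of the behaviour described, not the tool's own code. `splitRow` is a hypothetical stand-in for the one-to-many transform, and real stored documents would also carry CouchDB's `_id` and `_rev`.

```javascript
// Hypothetical stand-in for a one-to-many transform.
const splitRow = (row) => [
  { type: 'person', name: row.name },
  {
    type: 'location',
    town: row.town,
    lat: parseFloat(row.lat),
    long: parseFloat(row.long),
  },
];

const rows = [
  { name: 'Bob', town: 'London', lat: '51.5072', long: '-0.1275' },
  { name: 'Frank', town: 'Bolton', lat: '53.5789', long: '-2.429' },
];

// Mimic the importer: a transform may return one object or an array of objects,
// and every element of an array becomes its own document.
const written = rows.flatMap((row) => {
  const result = splitRow(row);
  return Array.isArray(result) ? result : [result];
});

console.log(written.length); // 4: two documents per input row
console.log(JSON.stringify(written, null, 2));
```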
Thanks to [Martynus] for this latest change. 😀