Generating sample data for a JSON data store

For me, developing an application with Cloudant/CouchDB as the database starts with data design. Once we have carefully considered how the application's data should be modelled in JSON, we can turn to the querying and indexing it requires:

  • How do my queries perform with 10k, 1m or 10m documents?
  • How long does it take for a new batch of data to be indexed?
  • Is it better to use a MapReduce or Cloudant Search index to solve a particular problem?

App development often starts with a blank database. At this point it's helpful to put the theory to the test with a meaningful amount of data: to A/B test two indexes, benchmark queries, and measure indexing and throughput performance.

To do this we need a source of data. As our application isn’t live yet, we don’t have any real data.

This is where the datamaker tool comes in.

Photo by Kristian Strand on Unsplash

What is datamaker?

datamaker is a command-line tool that generates random data: not just random numbers, but company names, addresses, email addresses, dates and more.

It's a free, open-source tool published on npm (Node.js and npm are required). To install it, run:

install datamaker using npm

Give it a spin by piping in a template string. Placeholders for random data are named tags enclosed in double curly braces:

datamaker replaces placeholder tags in curly braces

If you need more data, use the --iterations/-i flag to specify how many data points to generate:

Use -i to specify the number of iterations
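To make the idea of placeholder substitution and iterations concrete, here is a toy re-implementation in Python. It is not datamaker's code: the tag names `firstname`, `surname` and `digit` and their generators are hypothetical stand-ins, and `render_many` simply mimics the effect of the -i flag by rendering the template once per iteration.

```python
import random
import re

# Hypothetical generators standing in for a few datamaker tags; the real
# tool supports many more tags, with its own implementations.
GENERATORS = {
    "firstname": lambda: random.choice(["Ada", "Grace", "Alan", "Linus"]),
    "surname": lambda: random.choice(["Lovelace", "Hopper", "Turing", "Torvalds"]),
    "digit": lambda: str(random.randint(0, 9)),
}

# Matches a tag name wrapped in double curly braces, e.g. {{firstname}}
TAG = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str) -> str:
    """Replace each {{tag}} with a freshly generated random value.

    An unknown tag name raises KeyError.
    """
    return TAG.sub(lambda m: GENERATORS[m.group(1)](), template)


def render_many(template: str, iterations: int) -> list[str]:
    """Render the template once per iteration, mimicking datamaker's -i flag."""
    return [render(template) for _ in range(iterations)]


if __name__ == "__main__":
    for line in render_many("{{firstname}} {{surname}} scored {{digit}}", 3):
        print(line)
```

Each call to a generator produces a new value, so every rendered line is independent of the others.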

We can use datamaker to generate CSV or XML data, but a Cloudant database needs JSON. The best way to produce it is to create a template containing one of your documents, with placeholder tags marking where the random data should go:

Make a JSON template with placeholder tags

Notice how some of the datamaker tags take parameters: {{float 1 10 1}} means "generate a floating-point number between 1 and 10, with 1 decimal place".
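A parameterised tag needs its arguments split off from its name. The Python sketch below models only the float tag, following the semantics described above (minimum, maximum, decimal places); the regular expression, the `float_tag` function and the `GENERATORS` registry are illustrative assumptions, not datamaker's internals.

```python
import random
import re

# Matches a tag plus optional space-separated arguments, e.g. {{float 1 10 1}}
TAG = re.compile(r"\{\{\s*(\w+)((?:\s+[^\s}]+)*)\s*\}\}")


def float_tag(low: str, high: str, places: str) -> str:
    """Sketch of {{float min max dp}}: a random number between min and max,
    formatted to dp decimal places."""
    value = random.uniform(float(low), float(high))
    return f"{value:.{int(places)}f}"


# Hypothetical registry: only the float tag is modelled here.
GENERATORS = {"float": float_tag}


def render(template: str) -> str:
    """Replace each tag, passing its arguments through to the generator."""
    def replace(match: re.Match) -> str:
        name, args = match.group(1), match.group(2).split()
        return GENERATORS[name](*args)

    return TAG.sub(replace, template)


print(render('{"rating": {{float 1 10 1}}}'))
```

Because the tag is unquoted inside the JSON template, the generated value lands in the document as a number rather than a string.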

We can then pass the path of the file to datamaker with the --template/-t option and specify "json" with the --format/-f flag:

one JSON document per line of output
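The output format here is one compact JSON document per line (often called JSON Lines or jsonl), even though the template itself can span several lines. The Python sketch below shows one way to get that result: render a multi-line template, parse it to check that it is valid JSON, then re-serialise it on a single line. The tag names `company` and `digit` and their generators are hypothetical, and this is not a description of how datamaker implements -f json.

```python
import json
import random
import re

# A multi-line JSON template; the tag names are hypothetical stand-ins.
TEMPLATE = """{
  "name": "{{company}}",
  "rating": {{digit}}
}"""

GENERATORS = {
    "company": lambda: random.choice(["Acme Ltd", "Globex", "Initech"]),
    "digit": lambda: str(random.randint(0, 9)),
}
TAG = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def to_jsonl(template: str, iterations: int) -> str:
    """Render the template repeatedly, emitting one compact JSON document per line."""
    lines = []
    for _ in range(iterations):
        text = TAG.sub(lambda m: GENERATORS[m.group(1)](), template)
        doc = json.loads(text)  # raises ValueError if the rendered text isn't valid JSON
        lines.append(json.dumps(doc, separators=(",", ":")))
    return "\n".join(lines)


print(to_jsonl(TEMPLATE, 3))
```

Parsing each rendered document before printing it catches templates that would produce invalid JSON, such as a string tag left outside quotes.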

The datamaker project supports dozens of tags, including airport codes, URLs, email addresses, prices and currencies; see the project's documentation for the full list.

Importing data into a Cloudant/CouchDB database

A tool to import JSON data into Cloudant already exists: couchimport, which supports the jsonl format (one JSON document per line) out of the box. Simply pipe datamaker's output into couchimport:

Pipe datamaker’s output into couchimport to write data to Cloudant/CouchDB

datamaker's output is written to Cloudant in a series of bulk HTTP API calls. Simple as that!
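For readers curious about what a bulk import involves, here is a minimal Python sketch, assuming CouchDB's standard `_bulk_docs` endpoint (a POST whose body is `{"docs": [...]}`). It groups jsonl lines into batches and builds one request per batch. This is not couchimport's implementation: the batch size of 500, the function names and the example URL `http://localhost:5984/mydb` are illustrative assumptions, and authentication is omitted.

```python
import json
import urllib.request
from typing import Iterable, Iterator

BATCH_SIZE = 500  # assumption: an illustrative batch size, not couchimport's setting


def batches(lines: Iterable[str], size: int = BATCH_SIZE) -> Iterator[list]:
    """Group jsonl lines into lists of parsed documents, `size` at a time."""
    batch = []
    for line in lines:
        if line.strip():  # skip blank lines
            batch.append(json.loads(line))
            if len(batch) == size:
                yield batch
                batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def bulk_request(db_url: str, docs: list) -> urllib.request.Request:
    """Build (but don't send) a POST to the database's _bulk_docs endpoint."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        f"{db_url}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send the batches, one could loop over `batches(sys.stdin)` and pass each `bulk_request("http://localhost:5984/mydb", docs)` to `urllib.request.urlopen`, adding credentials as the server requires. Batching trades a little memory for far fewer HTTP round trips than writing one document per request.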

Glynn Bird

Developer @ IBM.