Generating sample data for a JSON data store

An empty database isn’t much fun. The datamaker tool can generate sample data in many forms from the command line.

For me, application development using Cloudant/CouchDB as the database starts with data design. Having carefully considered how our application’s data should be modelled in JSON, we can turn to the querying and indexing required:

  • How do my queries perform with 10,000, 1 million or 10 million documents?
  • How long does it take for a new batch of data to be indexed?
  • Is it better to use a MapReduce or Cloudant Search index to solve a particular problem?

App development often starts with a blank database. At this point it’s helpful to put the theory to the test with a meaningful amount of data: to A/B test two indexes, benchmark queries, and measure indexing and throughput performance.

To do this we need a source of data. As our application isn’t live yet, we don’t have any real data.

This is where the datamaker tool comes in.

What is datamaker?

datamaker is a command-line tool that can generate random data. Not just random numbers, but company names, addresses, emails, dates etc.

It’s a free, open-source tool published on npm (Node.js and npm are required). To install it, run:

install datamaker using npm

Give it a spin by piping in a template string. Placeholders for random data are signified by named tags encased in double curly braces:

datamaker replaces placeholder tags in curly braces

If you need more data, the --iterations/-i flag is used to specify the number of data points:

Use -i to specify the number of iterations
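To make the idea of placeholder substitution concrete, here is a minimal Python sketch of the general technique. It is not datamaker's implementation: the tag names (uuid, digit, letter) and the function names are invented for illustration, and the iterations argument only mimics what the -i flag is described as doing.

```python
import random
import re
import uuid

# Hypothetical tag generators for illustration only; datamaker's real tags
# are listed in its own documentation.
GENERATORS = {
    "uuid": lambda: str(uuid.uuid4()),
    "digit": lambda: str(random.randint(0, 9)),
    "letter": lambda: random.choice("abcdefghijklmnopqrstuvwxyz"),
}

# Matches a single word wrapped in double curly braces, e.g. {{uuid}}.
TAG_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template):
    """Replace each {{tag}} in the template with a freshly generated value.

    Unknown tags are left untouched so that typos are easy to spot.
    """
    def substitute(match):
        generator = GENERATORS.get(match.group(1))
        return generator() if generator else match.group(0)

    return TAG_PATTERN.sub(substitute, template)


def render_many(template, iterations):
    """Render the same template several times, one result per iteration."""
    return [render(template) for _ in range(iterations)]


if __name__ == "__main__":
    for line in render_many("{{uuid}},{{letter}}{{digit}}", 3):
        print(line)
```

Each call to render produces new random values, so repeating it is all that "more iterations" means in this sketch.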

We can use datamaker to form CSV or XML data, but for a Cloudant database we need JSON. The best way to do this is to create a template containing one of your documents, with placeholder tags marking where the data should go:

Make a JSON template with placeholder tags

Notice how some of the datamaker tags can take parameters: {{float 1 10 1}} means "generate a floating point number between 1 and 10, with 1 decimal place".
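As a rough illustration of how a parameterised tag inside a JSON template might be handled, the Python sketch below splits a tag into a name and its arguments before generating a value. Only {{float 1 10 1}} comes from the text above; the integer tag, the field names and the parsing approach are assumptions made for this example, not a description of datamaker's internals.

```python
import json
import random
import re

# A tag name followed by zero or more whitespace-separated arguments,
# e.g. {{float 1 10 1}}.
TAG_PATTERN = re.compile(r"\{\{\s*(\w+)((?:\s+[^\s{}]+)*)\s*\}\}")


def make_float(low, high, places):
    """Random float between low and high, formatted to a fixed number of decimals."""
    value = random.uniform(float(low), float(high))
    return f"{value:.{int(places)}f}"


def make_integer(low, high):
    """Random integer between low and high inclusive (hypothetical tag)."""
    return str(random.randint(int(low), int(high)))


GENERATORS = {"float": make_float, "integer": make_integer}


def render(template):
    """Replace each parameterised tag with a generated value."""
    def substitute(match):
        name = match.group(1)
        args = match.group(2).split()
        return GENERATORS[name](*args)

    return TAG_PATTERN.sub(substitute, template)


if __name__ == "__main__":
    # A one-document template; the field names are made up for this example.
    template = '{"rating": {{float 1 10 1}}, "stock": {{integer 0 500}}}'
    document = json.loads(render(template))
    print(document)
```

Because the numeric tags sit outside quotes in the template, the rendered text parses as JSON numbers rather than strings.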

We can then pass the path of the file to datamaker with the --template/-t option and specify "json" with the --format/-f flag:

one JSON document per line of output
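The "one JSON document per line" shape is easy to reproduce by hand, which helps when checking what an importer expects. The Python sketch below builds a few made-up documents and serialises each onto its own line; the document fields and function names are illustrative assumptions and have nothing to do with datamaker's own output.

```python
import json
import random


def sample_documents(count, seed=None):
    """Yield simple made-up documents; the fields are purely illustrative."""
    rng = random.Random(seed)
    for i in range(count):
        yield {
            "_id": f"product:{i}",
            "price": round(rng.uniform(1, 10), 1),
            "in_stock": rng.choice([True, False]),
        }


def to_jsonl_lines(documents):
    """Serialise each document as compact JSON on a single line.

    json.dumps without an indent never emits raw newlines, so every
    document is guaranteed to occupy exactly one line.
    """
    for doc in documents:
        yield json.dumps(doc, separators=(",", ":"))


if __name__ == "__main__":
    for line in to_jsonl_lines(sample_documents(3, seed=42)):
        print(line)
```

A consumer can then read the stream line by line and parse each line independently, without loading the whole data set into memory.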

The datamaker project supports dozens of tags, covering airport codes, URLs, email addresses, prices, currencies and more; see the project’s documentation for details.

Importing data into a Cloudant/CouchDB database

A tool to import JSON data into Cloudant already exists: couchimport, which supports the jsonl format (one JSON document per line) out of the box. Simply pipe the output of datamaker into couchimport:

Pipe datamaker’s output into couchimport to write data to Cloudant/CouchDB

couchimport writes datamaker’s output to Cloudant in a series of bulk HTTP API calls. Simple as that!
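For a feel of what "a series of bulk HTTP API calls" involves, here is a hedged Python sketch that groups documents into batches and builds a POST request to CouchDB's _bulk_docs endpoint for each one. This is not couchimport's code: the batch size of 500, the base URL and the database name are assumptions for illustration, and authentication is left out entirely.

```python
import json
import urllib.request
from itertools import islice


def batches(documents, size):
    """Yield lists of at most `size` documents from any iterable."""
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def bulk_request(base_url, database, batch):
    """Build a POST request for the _bulk_docs endpoint.

    The endpoint expects a JSON body of the form {"docs": [...]}.
    """
    body = json.dumps({"docs": batch}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{database}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def import_documents(base_url, database, documents, batch_size=500):
    """Send the documents batch by batch; returns how many were sent.

    The batch size is an assumed value, and no credentials are attached.
    """
    sent = 0
    for batch in batches(documents, batch_size):
        request = bulk_request(base_url, database, batch)
        with urllib.request.urlopen(request) as response:
            response.read()
        sent += len(batch)
    return sent


if __name__ == "__main__":
    # Show only the batching; no network call is made here.
    docs = ({"n": i} for i in range(1201))
    print([len(b) for b in batches(docs, 500)])
```

Batching keeps the number of HTTP round trips low while keeping each request body to a manageable size, which is the usual reason for importing through a bulk endpoint.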

References