Time-sortable document ids in Cloudant

Making a document id that is time-sortable and unique

Every document in a Cloudant database has an _id field, which must be unique within that database. When you create a document without supplying an _id, the database generates one for you:

Create document — database-generated id.
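Here is a minimal sketch of that request in Node.js. It assumes Node.js 18 or later for the global `fetch`. The account URL `https://myaccount.cloudant.com` and the database name `mydb` are placeholders, and authentication is omitted. The body has no `_id`, so the server generates one and returns it in the response.

```javascript
// Placeholder account URL and database name (assumptions); replace with your own.
const ACCOUNT_URL = 'https://myaccount.cloudant.com';
const DB = 'mydb';

// Build a "create document" request whose body has no _id,
// so the database generates one.
function buildCreateRequest(doc) {
  return {
    method: 'POST',
    url: `${ACCOUNT_URL}/${encodeURIComponent(DB)}`,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(doc),
  };
}

// Send it with the global fetch (Node.js 18+). Authentication is omitted.
async function createDocument(doc) {
  const { method, url, headers, body } = buildCreateRequest(doc);
  const res = await fetch(url, { method, headers, body });
  // The JSON response carries the generated id in its "id" field.
  return res.json();
}

const req = buildCreateRequest({ type: 'user', name: 'Glynn' });
console.log(req.method, req.url); // POST https://myaccount.cloudant.com/mydb
```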

Cloudant’s generated _id fields are 32 characters long and made entirely of numerals and lowercase letters. They are unique, or at least have a negligible probability of clashing, by virtue of being a long pseudo-random string of characters.

Alternatively, you may supply your own _id field, which is useful when your app knows something unique about your domain's data:

Create document — user-supplied id.
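The next sketch sends the same kind of request with a user-supplied id, using `PUT /db/{id}`, which creates the document under exactly that id. The `:` separator between type and value is an assumption for illustration, as are the placeholder account URL and database name. The id is URL-encoded because it becomes part of the request path.

```javascript
// Placeholder account URL and database name (assumptions).
const ACCOUNT_URL = 'https://myaccount.cloudant.com';
const DB = 'mydb';

// Combine a document type and a unique value into one _id.
// The ':' separator is an illustrative choice, not a Cloudant requirement.
function makeTypedId(type, uniqueValue) {
  return `${type}:${uniqueValue}`;
}

// PUT /db/{id} creates the document with exactly this _id.
function buildPutRequest(id, doc) {
  return {
    method: 'PUT',
    url: `${ACCOUNT_URL}/${encodeURIComponent(DB)}/${encodeURIComponent(id)}`,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(doc),
  };
}

const id = makeTypedId('user', 'glynn');
const req = buildPutRequest(id, { name: 'Glynn' });
console.log(id);      // user:glynn
console.log(req.url); // https://myaccount.cloudant.com/mydb/user%3Aglynn
```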

In this case I'm using a single portmanteau _id to store both the type of the document ("user") and something unique about the user ("glynn").

Making a sortable _id field

In some applications it would be useful for the _id field to sort into date/time order. The _id is used to build the database's primary index, which serves both to fetch documents by id (GET /db/id) and to select ranges of documents (GET /db/_all_docs). If a database's _id values sorted into time order, I could extract data by time without creating a secondary index. For example, I could fetch the 100 most recently added documents by simply querying the primary index:

Fetch the 100 newest documents.
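The query can be sketched as a URL. The account and database names are placeholders. `descending=true` reads the primary index from its end, `limit=100` stops after 100 rows, and `include_docs=true` returns each document body along with its id. This only yields the newest documents if every `_id` in the database sorts into creation-time order, which is the assumption this article works towards.

```javascript
// Placeholder account URL and database name (assumptions).
const ACCOUNT_URL = 'https://myaccount.cloudant.com';
const DB = 'mydb';

// Build a GET /db/_all_docs URL that reads the primary index backwards,
// i.e. newest first when the _id values sort into time order.
function newestDocsUrl(limit) {
  const url = new URL(`${ACCOUNT_URL}/${encodeURIComponent(DB)}/_all_docs`);
  url.searchParams.set('descending', 'true');
  url.searchParams.set('limit', String(limit));
  url.searchParams.set('include_docs', 'true');
  return url.toString();
}

console.log(newestDocsUrl(100));
// https://myaccount.cloudant.com/mydb/_all_docs?descending=true&limit=100&include_docs=true
```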

All I need is an _id scheme that can generate ids that are both unique in the database and yet sort into date/time order.

One solution is the kuuid Node.js library, which I wrote for exactly this purpose. Simply use kuuid.id() to generate your _id values:

Write a document with a user-supplied _id that is time-sortable and unique.

This creates a document like the following:

The resultant document.

How do sortable ids work?

A kuuid-generated id is 32 characters long, made up of digits and upper- and lowercase letters. It is split into two sections:

the _id contains two sections…
  1. the first eight characters contain the date/time, stored as the number of seconds since 1 January 1970 (the Unix epoch).
  2. the remaining twenty-four characters contain 128 bits of random data.

Both pieces of information are encoded in “base 62” (the digits 0-9 plus the letters A-Z and a-z). Because this character set is case-sensitive, it packs more information into the same number of characters than digits and lowercase letters alone.
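The layout is simple enough to sketch. The JavaScript below is an illustration, not the kuuid source, and several of its choices are assumptions:

- The base-62 alphabet is ordered `0-9A-Za-z`, which is ASCII order. Equal-length encodings therefore compare as strings in the same order as the numbers they encode. CouchDB documents that `_all_docs` orders ids in ASCII order, and Cloudant's primary index is assumed to do the same.
- The seconds are left-padded with `0` to eight characters.
- The 128 random bits come from four random 32-bit integers, each padded to six characters, since 62^6 exceeds 2^32.

The helper names `toBase62`, `pad` and `timeSortableId` are hypothetical.

```javascript
const crypto = require('crypto');

// Base-62 digits in ASCII order, so string order matches numeric order
// for encodings of equal length.
const ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Encode a non-negative integer in base 62.
function toBase62(n) {
  if (n === 0) return '0';
  let out = '';
  while (n > 0) {
    out = ALPHABET[n % 62] + out;
    n = Math.floor(n / 62);
  }
  return out;
}

// Left-pad with '0' (the lowest digit) to a fixed width.
function pad(str, width) {
  return str.padStart(width, '0');
}

// 8 characters of time (whole seconds since 1970-01-01) + 24 random characters.
function timeSortableId(date = new Date()) {
  const seconds = Math.floor(date.getTime() / 1000);
  const timePart = pad(toBase62(seconds), 8);
  // Four random unsigned 32-bit integers = 128 random bits.
  const words = new Uint32Array(4);
  crypto.randomFillSync(words);
  let randomPart = '';
  for (const w of words) {
    randomPart += pad(toBase62(w), 6); // 62^6 > 2^32, so 6 characters always suffice
  }
  return timePart + randomPart;
}

const earlier = timeSortableId(new Date('2018-01-01T00:00:00Z'));
const later = timeSortableId(new Date('2018-01-01T00:00:01Z'));
console.log(earlier.length);  // 32
console.log(earlier < later); // true
```

The padding matters. Without it, a shorter encoding such as `9` would sort after a longer, larger one such as `10`, because strings are compared character by character.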

Two ids generated in the same second will have the same first eight characters, but the chance of the remaining 24 characters clashing is vanishingly small.
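We can put a rough number on how remote a clash is. Assume the 24 random characters carry 128 uniformly random bits. The union (birthday) bound then limits the probability p that any two of n ids generated in the same second share their random part. Taking a deliberately generous n of one million ids in a single second:

```latex
p \;\le\; \binom{n}{2}\,2^{-128} \;=\; \frac{n(n-1)}{2^{129}} \;<\; \frac{n^{2}}{2^{129}},
\qquad
n = 10^{6} \;\Rightarrow\; p < \frac{10^{12}}{6.8 \times 10^{38}} \approx 1.5 \times 10^{-27}
```

Ids generated in different seconds cannot clash at all, because their first eight characters differ.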

The documents are therefore sorted in the database’s primary index in approximate date order, with a precision of one second. Documents created within the same second have no meaningful order relative to each other.

By judicious use of GET /db/_all_docs with the startkey, endkey and descending parameters, the database's primary index can be queried to return ranges of documents in approximate time order. The kuuid library provides a prefix function that calculates the 8-character string corresponding to a user-supplied date or timestamp:

Example queries on the database’s primary index.
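The exact signature of kuuid's prefix function isn't given here, so the sketch below uses a hypothetical `timePrefix(date)`. It reproduces the first eight characters of an id, assuming the zero-padded, ASCII-ordered base-62 encoding (`0-9A-Za-z`) of whole seconds since 1970.

Every id created during second s begins with the prefix for s and is longer than that prefix. So setting `startkey` to the prefix of the start time includes that second, while setting `endkey` to the prefix of the end time excludes it, and the range covers [start, end). CouchDB expects `startkey` and `endkey` as JSON values, hence `JSON.stringify`.

```javascript
// Base-62 digits in ASCII order (assumption matching the id layout described above).
const ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Hypothetical helper: the 8-character, zero-padded base-62 encoding of the
// number of whole seconds since 1970-01-01, i.e. the first 8 characters of an id.
function timePrefix(date) {
  let n = Math.floor(date.getTime() / 1000);
  let out = '';
  do {
    out = ALPHABET[n % 62] + out;
    n = Math.floor(n / 62);
  } while (n > 0);
  return out.padStart(8, '0');
}

// Build a GET /db/_all_docs URL for documents created in [start, end).
// startkey/endkey must be JSON-encoded strings, hence JSON.stringify.
function rangeUrl(dbUrl, start, end) {
  const url = new URL(`${dbUrl}/_all_docs`);
  url.searchParams.set('startkey', JSON.stringify(timePrefix(start)));
  url.searchParams.set('endkey', JSON.stringify(timePrefix(end)));
  url.searchParams.set('include_docs', 'true');
  return url.toString();
}

// Placeholder database URL: all documents created on 1 January 2018 (UTC).
console.log(rangeUrl('https://myaccount.cloudant.com/mydb',
  new Date('2018-01-01T00:00:00Z'), new Date('2018-01-02T00:00:00Z')));
```

To read the same range newest-first, add `descending=true` and swap the two keys.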

This form of querying is a little convoluted, but if your ids are going to be 32-character random strings, it seems useful to make them loosely time-ordered just to be able to quickly establish the documents that were added recently, if nothing else.

Combining a kuuid and a document type

Optionally, we could still keep the convention of storing the document type in the _id field too:

Using a kuuid with the document type prefix.
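Here is a sketch of the type-prefixed scheme, plus a newest-first query for one type. The `:` separator, the helper names and the sample ids are made up for illustration; in practice the 32-character part would come from `kuuid.id()`. With `descending=true`, CouchDB walks the index backwards, so `startkey` is the upper bound. The `\ufff0` suffix is a common CouchDB idiom for "past the end of every key with this prefix".

```javascript
// Illustrative separator between the document type and the time-sortable id.
const SEP = ':';

// Prefix a 32-character time-sortable id with a document type.
function typedId(type, timeSortableId) {
  return `${type}${SEP}${timeSortableId}`;
}

// Query parameters for the newest `limit` documents of one type.
// With descending=true, startkey is the upper bound and endkey the lower bound.
function newestOfTypeParams(type, limit) {
  return new URLSearchParams({
    startkey: JSON.stringify(`${type}${SEP}\ufff0`),
    endkey: JSON.stringify(`${type}${SEP}`),
    descending: 'true',
    limit: String(limit),
    include_docs: 'true',
  });
}

// Made-up 32-character ids (8 time characters + 24 "random" characters).
const olderId = '001q2w3e' + 'a'.repeat(24);
const newerId = '001q2w3f' + 'a'.repeat(24);
const older = typedId('user', olderId);
const newer = typedId('user', newerId);
console.log(older < newer); // true: same type, so the time part decides the order
console.log(newestOfTypeParams('user', 10).toString());
```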

Our _id field is now sorted by document type AND time!

The source code for this id generator is not complicated and can be easily reproduced in other programming languages. The Node.js implementation is published on npm.