Time-sortable document ids in Cloudant
Making a document id that is time-sortable and unique
A Cloudant database document’s
_id field has to be unique. When you create a document and leave the
_id field blank, the database will create one for you:
These generated _id fields are 32 characters long and made entirely of numerals and lowercase letters. They are unique, or at least have a negligible probability of clashing, by virtue of being long pseudo-random strings.
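To give a feel for the shape of such an id, here is a minimal sketch that builds a 32-character string from 128 random bits in Node.js. The randomId helper is hypothetical and is only an illustration; it makes no claim about how the database generates its ids internally.

```javascript
const crypto = require('crypto');

// Hypothetical helper: 16 random bytes (128 bits) rendered as
// 32 lowercase hexadecimal characters.
function randomId() {
  return crypto.randomBytes(16).toString('hex');
}

console.log(randomId());
```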
Alternatively, you may supply your own
_id field, which is useful when your app knows something unique about your domain's data:
In this case I’m using the
_id to store both the type of the document ("user") and something unique about each user I'm storing ("glynn") in the same portmanteau field.
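A small sketch of how such a composite id might be assembled. The makeId helper and the ":" separator are assumptions made for illustration; any separator that cannot appear in the type name would do.

```javascript
// Hypothetical helper: join a document type and a unique key.
// The ':' separator is an assumption, not a Cloudant requirement.
function makeId(type, key, separator = ':') {
  if (type.includes(separator)) {
    throw new Error('type must not contain the separator');
  }
  return `${type}${separator}${key}`;
}

const doc = { _id: makeId('user', 'glynn') };
console.log(doc); // → { _id: 'user:glynn' }
```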
Making a sortable _id field
In some applications it would be useful for the
_id field to sort into date/time order. The
_id is used to create the database's primary index which is used to fetch documents by their id (
GET /db/id) and when selecting ranges of documents (
GET /db/_all_docs). If a database's
_id fields sorted into time order, I could extract data by time without having to create a secondary index. For example, I could fetch the 100 most recently added documents in a database by simply querying the primary index:
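As a sketch, the request path for such a query could be built like this. The recentDocsPath helper and the database name are hypothetical; descending, limit and include_docs are standard _all_docs query parameters.

```javascript
// Hypothetical helper: path for the last `limit` documents in _id order.
function recentDocsPath(db, limit = 100) {
  const params = new URLSearchParams({
    descending: 'true',   // walk the primary index from the highest _id down
    limit: String(limit), // stop after this many rows
    include_docs: 'true'  // return document bodies, not just ids
  });
  return `/${encodeURIComponent(db)}/_all_docs?${params}`;
}

console.log(recentDocsPath('mydb'));
// → /mydb/_all_docs?descending=true&limit=100&include_docs=true
```

This only returns the newest documents if the ids themselves sort into time order.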
All I need is an
_id scheme that can generate ids that are both unique in the database and yet sort into date/time order.
One solution is published in the kuuid Node.js library I wrote for just this purpose. Simply use
kuuid.id() to generate your _id,
which creates a document that looks like this:
How do sortable ids work?
A kuuid-generated id consists of 32 characters made up of numerals and upper-case and lower-case letters. It is split into two sections:
- the first eight characters contain the date/time, stored as the number of seconds since the 1st of January 1970.
- the remaining twenty-four characters are 128 bits of random data.
Both pieces of information are encoded in “base 62”, allowing more information to be packed into the same number of characters by using a case-sensitive character set.
Two ids generated in the same second will have the same first eight characters, but the chances of the remaining 24 characters clashing are vanishingly remote.
The documents will be sorted in the database’s primary index in rough date order, that is with a precision of one second.
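The scheme described above can be sketched in a few lines of Node.js. This is my own illustration of the idea, not kuuid's actual source: the function names are hypothetical, and the alphabet order (digits, then upper case, then lower case) is an assumption chosen so that encoded values compare in the same order as their ASCII characters.

```javascript
const crypto = require('crypto');

// Assumed alphabet, in ASCII order.
const ALPHABET =
  '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Encode a non-negative BigInt in base 62, left-padded with '0' to `width`.
function base62(value, width) {
  let out = '';
  let n = value;
  while (n > 0n) {
    out = ALPHABET[Number(n % 62n)] + out;
    n /= 62n; // BigInt division truncates
  }
  return out.padStart(width, '0');
}

// Eight characters: whole seconds since 1970-01-01T00:00:00Z.
// Dates before 1970 are not handled in this sketch.
function timePrefix(date = new Date()) {
  return base62(BigInt(Math.floor(date.getTime() / 1000)), 8);
}

// Twenty-four characters: 128 random bits. They never need more than
// 22 base-62 characters, so the result is zero-padded to 24.
function randomSuffix() {
  const bits = BigInt('0x' + crypto.randomBytes(16).toString('hex'));
  return base62(bits, 24);
}

function sortableId(date = new Date()) {
  return timePrefix(date) + randomSuffix();
}

console.log(sortableId());
```

Because the timestamp is zero-padded to a fixed width, comparing two ids character by character gives the same result as comparing their timestamps, which is what makes the ids sortable.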
By judicious use of GET /db/_all_docs and parameters such as
descending, the database's primary index can be queried to provide ranges of documents by type in approximate time order. The
kuuid library provides a
prefix function that calculates the 8-character prefix string that corresponds to a user-supplied date or timestamp:
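To illustrate the idea without depending on the library, here is a sketch with a hypothetical prefixFor function standing in for the prefix calculation, plus a rangePath helper that turns two dates into a primary-index query. The alphabet order and both helper names are assumptions; startkey, endkey, descending and limit are standard _all_docs parameters, and key values are sent as JSON strings.

```javascript
// Assumed alphabet, in ASCII order.
const ALPHABET =
  '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Hypothetical stand-in for a prefix function: whole seconds since 1970
// as eight base-62 characters. Dates before 1970 are not handled.
function prefixFor(date) {
  let n = Math.floor(new Date(date).getTime() / 1000);
  if (Number.isNaN(n)) throw new RangeError('invalid date');
  let out = '';
  while (n > 0) {
    out = ALPHABET[n % 62] + out;
    n = Math.floor(n / 62);
  }
  return out.padStart(8, '0');
}

// Path for documents created from `from` up to (but not including) the
// second `to`, newest first. With descending=true the index is walked
// backwards, so startkey is the later bound.
function rangePath(db, from, to, limit = 100) {
  const params = new URLSearchParams({
    descending: 'true',
    startkey: JSON.stringify(prefixFor(to)),
    endkey: JSON.stringify(prefixFor(from)),
    limit: String(limit)
  });
  return `/${encodeURIComponent(db)}/_all_docs?${params}`;
}

console.log(rangePath('mydb', '2018-01-01T00:00:00Z', '2018-01-02T00:00:00Z'));
```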
This form of querying is a little convoluted, but if your ids are going to be 32-character random strings, it seems useful to make them loosely time-ordered just to be able to quickly establish the documents that were added recently, if nothing else.
Combining a kuuid and a document type
Optionally, we could still keep the convention of storing the document type in the
_id field too:
The _id field is now sorted by document type AND time!
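A short sketch of why this works, using made-up ids and hypothetical helpers. The ":" separator and the "\ufff0" high sentinel used for the end of the key range are assumptions (the sentinel is a common CouchDB convention for "everything with this prefix").

```javascript
// Hypothetical helper: prefix a time-sortable id with the document type.
function typedId(type, id) {
  return `${type}:${id}`;
}

// Hypothetical helper: key range covering every document of one type.
function typeRange(type) {
  return { startkey: `${type}:`, endkey: `${type}:\ufff0` };
}

// Made-up ids with shortened time-sortable parts, purely for illustration.
const ids = [
  typedId('user', '00000002x'),
  typedId('order', '00000009x'),
  typedId('user', '00000001x')
];

// Sorting groups the ids by type first, then by time within each type.
console.log(ids.sort());
// → [ 'order:00000009x', 'user:00000001x', 'user:00000002x' ]
```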