Custom Indexers for Cloudant

Using JavaScript and Redis for problems that don’t fit Cloudant’s indexing engines

Glynn Bird
Sep 26, 2017

The data

Let’s say we’re storing documents in Cloudant, each of which represents a single page view on our website.
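As a minimal sketch, such a document might look like the following. The field names `path`, `ip`, and `date`, and all of the values, are assumptions made for illustration rather than a confirmed schema.

```javascript
// Hypothetical page-view document; every field name and value is an assumption.
const pageView = {
  _id: 'a1b2c3',                     // Cloudant document id (made-up value)
  path: '/blog/custom-indexers',     // the page that was viewed
  ip: '203.0.113.42',                // visitor's IP address (documentation range)
  date: '2017-07-14T09:21:00.000Z'   // ISO 8601 timestamp of the view
};
```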

Custom indexes

We are going to build two custom indexes in Redis that would be tricky or inefficient to achieve in Cloudant with its built-in indexers. In general, they are queries that do not lend themselves well to key range lookups or text searches:

  • Find the top ten most-visited pages on our site
  • Find the number of unique IP addresses used to access our site

Top ten pages

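It’s easy to count things in Cloudant: create a MapReduce view with a JavaScript map function and the built-in _count reducer. What is harder is retrieving only the top ten, because view results are ordered by key rather than by the reduced value.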
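A map function for such a counting view could be sketched as follows. The field name `doc.path` is an assumption, and the `emit` stand-in exists only so the snippet runs locally; inside Cloudant, `emit` is supplied by the view engine.

```javascript
// Map function for a Cloudant MapReduce view. Cloudant calls it once per
// document and supplies `emit`; `doc.path` is an assumed field name.
function map(doc) {
  if (doc.path) {
    emit(doc.path, 1);
  }
}

// Local demonstration only: a stand-in `emit` that tallies rows per key,
// roughly what the _count reducer reports when queried with group=true.
const counts = {};
function emit(key, value) {
  counts[key] = (counts[key] || 0) + 1;
}

[{ path: '/' }, { path: '/about' }, { path: '/' }].forEach(map);
console.log(counts); // { '/': 2, '/about': 1 }
```

Queried with `group=true`, a view like this returns one row per path with its total, sorted by path.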

Distinct counts of IP addresses

Counting the number of unique IP addresses visiting our site is an example of the count-distinct problem. An exact count needs memory in proportion to the number of unique items seen, and with over 4 billion possible IPv4 addresses, this operation has the potential to get tricky.
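To illustrate the memory issue, the sketch below first counts distinct values exactly with a JavaScript `Set`, which has to remember every address it has seen. It then shows one possible alternative: Redis's HyperLogLog commands `PFADD` and `PFCOUNT`, which return an approximate count from a small, bounded amount of memory per key. The choice of HyperLogLog, the key name `unique_ips`, and the ioredis-style client methods (`pfadd`, `pfcount`) are assumptions made for this example.

```javascript
// Exact distinct count: the Set stores every unique IP, so memory grows
// linearly with the number of distinct visitors.
function exactDistinct(ips) {
  return new Set(ips).size;
}

// Approximate distinct count using Redis HyperLogLog.
// `redis` is assumed to be an ioredis-style client whose methods return promises.
async function recordIp(redis, ip) {
  await redis.pfadd('unique_ips', ip);
}

async function countUniqueIps(redis) {
  return redis.pfcount('unique_ips');
}

console.log(exactDistinct(['203.0.113.1', '203.0.113.2', '203.0.113.1'])); // 2
```

The trade-off is accuracy: a HyperLogLog count is an estimate with a small error, which is usually acceptable for visitor statistics.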

Streaming the changes

We can write a simple Node.js script that glues together Cloudant and Redis. Here’s what we want it to do:

  • Update the totals in the Redis database for each change

Figure: Basic architecture of a Node.js app connecting Cloudant changes to Redis.
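The glue code might be sketched like this. Everything specific here is an assumption: the field names `doc.path` and `doc.ip`, the key names `leaderboard` and `unique_ips`, the use of a sorted set (`ZINCRBY`) and a HyperLogLog (`PFADD`), and an ioredis-style client. The feed is modelled as a plain Node.js `EventEmitter` that fires `change` events, with stand-ins for the feed and the Redis client so the snippet runs without any services.

```javascript
const { EventEmitter } = require('events');

// Apply one change to both custom indexes.
// Assumptions: `redis` is an ioredis-style client; `doc.path` and `doc.ip`
// are hypothetical field names; the key names are made up for this sketch.
async function handleChange(redis, change) {
  const doc = change.doc;
  if (!doc || change.deleted) return;               // ignore deletions
  await redis.zincrby('leaderboard', 1, doc.path);  // page-view tally
  await redis.pfadd('unique_ips', doc.ip);          // distinct-IP estimate
}

// Subscribe a handler to any emitter that fires 'change' events.
function listen(feed, redis) {
  feed.on('change', (change) => {
    handleChange(redis, change).catch((err) => console.error(err));
  });
}

// Demonstration with stand-ins for the changes feed and the Redis client.
const feed = new EventEmitter();
const fakeRedis = {
  calls: [],
  async zincrby(...args) { this.calls.push(['zincrby', ...args]); },
  async pfadd(...args) { this.calls.push(['pfadd', ...args]); }
};
listen(feed, fakeRedis);
feed.emit('change', { id: 'a1', doc: { path: '/', ip: '203.0.113.1' } });
```

In a real deployment, `feed` would be whatever changes-feed emitter your Cloudant client library provides, requested with the document bodies included, and `fakeRedis` would be a connected Redis client.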

Monthly reporting

An enhancement to this approach is to have a monthly leaderboard and monthly unique IP address counts. We can easily enable this feature by parsing the date string in the Node.js code and writing to Redis keys with the month included (e.g., "leaderboard_2017-07"). On a month boundary, new data will be automatically fed into the next month's key.
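Deriving the month-scoped key can be sketched as below. The helper name `monthlyKey` is hypothetical, and the sketch assumes the document's date string is in a format `Date` can parse, such as ISO 8601, with the month taken in UTC.

```javascript
// Build a month-scoped Redis key such as "leaderboard_2017-07".
// Assumes `dateString` is something `Date` can parse (e.g. ISO 8601);
// the month is taken in UTC.
function monthlyKey(prefix, dateString) {
  const d = new Date(dateString);
  if (Number.isNaN(d.getTime())) {
    throw new Error(`Unparseable date: ${dateString}`);
  }
  return `${prefix}_${d.toISOString().slice(0, 7)}`;
}

console.log(monthlyKey('leaderboard', '2017-07-31T23:59:59Z')); // leaderboard_2017-07
console.log(monthlyKey('leaderboard', '2017-08-01T00:00:00Z')); // leaderboard_2017-08
```

Because the key is computed from each document's own date, no scheduled job is needed to roll over at the end of the month.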

Going serverless

So far we’ve created Node.js processes that listen to the Cloudant changes feed. There’s another way: if we create an OpenWhisk action that processes a single change, then we can trigger it from a Cloudant changes feed automatically. This passes the responsibility for handling the changes feed to OpenWhisk. We only need to worry about the data processing.
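An action of this kind might be sketched as follows. OpenWhisk Node.js actions are functions that receive a `params` object and may return a promise. Everything else is an assumption: the `connect` factory is hypothetical, the client is ioredis-style, the key names are made up, and the sketch assumes the page-view fields `path` and `ip` arrive directly in `params`.

```javascript
// Build an OpenWhisk-style action. `connect` is a hypothetical factory that
// returns a promise of an ioredis-style client; injecting it keeps the
// sketch runnable without a Redis server.
function makeAction(connect) {
  // OpenWhisk invokes the action with a params object and waits for the promise.
  return async function main(params) {
    // Assumption: the triggering change arrives with `path` and `ip` fields.
    const redis = await connect(params);
    try {
      await redis.zincrby('leaderboard', 1, params.path);
      await redis.pfadd('unique_ips', params.ip);
      return { ok: true };
    } finally {
      await redis.quit(); // stateless: disconnect at the end of every invocation
    }
  };
}
```

When deployed, the function returned by `makeAction` would be exposed as the action's `main` entry point, with `connect` reading the Redis credentials from the action's parameters.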

Figure: A similar architecture, but this time with OpenWhisk connecting Cloudant to Redis.

  • https://openwhisk.ng.bluemix.net/api/v1/web/NAMESPACE/leaderboard/getleaderboard.json
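The URL above suggests a web action that returns the leaderboard as JSON. A minimal sketch of the read side follows, assuming the leaderboard lives in a Redis sorted set named `leaderboard` and that `redis` is an ioredis-style client whose `zrevrange` call with `WITHSCORES` resolves to a flat array of members and scores. The function names and the response shape are hypothetical.

```javascript
// Convert the flat [member, score, member, score, ...] reply into objects.
function pairUp(flat) {
  const rows = [];
  for (let i = 0; i < flat.length; i += 2) {
    rows.push({ path: flat[i], views: Number(flat[i + 1]) });
  }
  return rows;
}

// Fetch the ten highest-scoring members, highest first.
async function getLeaderboard(redis, key = 'leaderboard') {
  const flat = await redis.zrevrange(key, 0, 9, 'WITHSCORES');
  return { leaderboard: pairUp(flat) };
}

console.log(pairUp(['/', '42', '/about', '7']));
// [ { path: '/', views: 42 }, { path: '/about', views: 7 } ]
```

Reading the top ten this way asks Redis for exactly ten entries, already ordered by score.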

Serverless vs. App

Which approach is better? The serverless approach leaves us with less infrastructure to worry about, but there are advantages for the app-based deployment in this case.

  • OpenWhisk’s stateless nature means that each invocation of the action must connect to, and then disconnect from, both Cloudant and Redis, whereas the app reuses its connections again and again.
  • With further refinements to the app, we could reduce the writes to Redis by buffering some of them in the app and only writing to Redis periodically (say every 10 seconds). This approach would be impossible to achieve in OpenWhisk.
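The buffering idea in the last point could be sketched as below. The class name `WriteBuffer` is hypothetical, the leaderboard is assumed to be a Redis sorted set, and `redis` is assumed to be an ioredis-style client. Increments accumulate in memory and are written as one `ZINCRBY` per page on each flush.

```javascript
// Accumulate page-view increments in memory and flush them in one batch.
class WriteBuffer {
  constructor() {
    this.pending = new Map(); // path -> views not yet written to Redis
  }

  increment(path) {
    this.pending.set(path, (this.pending.get(path) || 0) + 1);
  }

  // Write the buffered totals; `redis` is an assumed ioredis-style client.
  async flush(redis, key = 'leaderboard') {
    const batch = this.pending;
    this.pending = new Map(); // start a fresh buffer before any await
    for (const [path, views] of batch) {
      await redis.zincrby(key, views, path);
    }
    return batch.size; // number of distinct pages written
  }
}

// Usage in a long-running app (not possible in a stateless action):
// const buffer = new WriteBuffer();
// setInterval(() => buffer.flush(redis).catch(console.error), 10000);
```

The cost of buffering is durability: any increments still held in memory are lost if the process exits before the next flush.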

Custom indexes vs. Built-in indexes

Cloudant already has a number of built-in indexing engines, including MapReduce views, Cloudant Query, Cloudant Search, and Cloudant Geospatial.

Go forth and index

We’ve seen how we can write code to process a Cloudant database’s changes feed, writing running totals and counts to an in-memory store. The technique suits problems that don’t fit Cloudant’s built-in indexing engines. We have also seen how the code can be deployed to the OpenWhisk serverless platform.

IBM CODAIT

Things we made with data at IBM’s Center for Open Source Data and AI Technologies.

Written by Glynn Bird, Developer @ IBM. https://glynnbird.com