Cloudant Fundamentals: Indexing

Primary or secondary? Type json or text? Understanding your options (part 9 of 10)

Glynn Bird
Jul 12, 2018 · 3 min read

In part 7 of this series, we saw a warning in the search results:

“no matching index found, create an index to optimize query time”

This is Cloudant’s polite way of saying that your query is expensive and the database is having to walk through the whole data set to calculate the answer. In small databases this is not a problem, but in a production system, with the data growing all the time, an index is essential.

Photo by NeONBRAND on Unsplash

What is an index?

A database index is just like an index in a book or a biblical concordance. It is a sorted data structure that allows quick access to a portion of the data.

Our data looks like this:

Cloudant automatically creates an index on the field so that it can retrieve data by . If we are going to be making lots of queries on the field, then it might make sense to instruct Cloudant to create a secondary index on that field.

This is achieved by writing to the POST /db/_index endpoint:

In the body of the JSON supplied to the endpoint, we specify the array of fields that are to be indexed. Cloudant then creates the index on disk (ordered by in this case), so that it can be used by future queries.

If we repeat our query, we should see the same results, but without the warning:

Our query is now using the index and you should see a performance improvement, especially with larger data sets.

Index types — json & text

Cloudant Query has two types of index — and .

Indexes of are built using Cloudant's MapReduce technology. This creates single-use indexes in the order of the keys you supply. For example, if you index and then the index will be in / order and useful for queries that involve selectors on those fields in that order.

Indexes of are powered by Cloudant's Search technology. This builds separate "concordances" for the fields you supply, making it more universal for the database at query time. (See the $text operator for matching arbitrary strings against blocks of text in your documents.)

Both index types have their subtleties, so make sure you’ve read the documentation and understand how it works before going into production!

Indexing strategies

The job of an index is to help the database to reduce the volume of data it’s working with to a manageable size. Imagine you have a query like this for a database of movies:

Which field or fields should you create the index on to best serve this query? It depends on whether which is the smaller:

  • the number of movies in that database for a given year
  • the number of movies that star an actor

In this case, I’d be inclined to create an index on because there may be thousands of movies in a year, but an actor might be credited with only a few dozen movies in their entire career. So an index on winnows the database to a smaller data set than an index on .

Our index on reduces a database of tens of thousands of records down to a few dozen. It is then simple for the database to apply the second clause of the to this smaller data set.

Other tips:

  • Use the POST /db/_explain to see how Cloudant picks an index for a specified query.
  • Cloudant allows you to specify the index you want the database to use at query time. This is good practice as you remove any ambiguity in index selection.
  • Consider creating partial Cloudant indicies if you are only ever interested in querying a subset of your database (i.e., you could eliminate preliminary blog posts from your index and only include published posts in a partial index).
  • There’s always Cloudant’s MapReduce, Search and Geospatial indexes for a lower-level querying experience.
  • Not all queries are “Cloudant-shaped”. If you are building leader boards or counting distinct values, you might want to consider rolling your own custom index outside of Cloudant.

Next time

In the final part of this series, we’ll look at performing aggregations using Cloudant.

IBM CODAIT

Things we made with data at IBM’s Center for Open Source Data and AI Technologies.

Glynn Bird

Written by

Developer @ IBM. https://glynnbird.com

IBM CODAIT

Things we made with data at IBM’s Center for Open Source Data and AI Technologies.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade