Basic Indexing in MongoDB

One of the easiest things we can do to speed up performance in MongoDB is to add the necessary indexes to support our queries. Unfortunately indexing is a technique that is often misunderstood.

MongoDB, like a lot of other databases, allows you to create secondary indexes on your collections. In this post we will cover the most basic form of index, the single field index. MongoDB has other types of indexes that you can create as well, such as compound (indexes containing two or more fields), full-text (indexes over strings of text), and geo-spatial (indexes over 2D coordinates, or points on a spherical map). We will talk about these types of indexes in later posts.

What are indexes?

An index in a database is similar to an index at the back of a book. Just like an index in a book helps you to quickly find out where a topic is located in a book, and index in a database provides a way for the database server to efficiently and effectively look up data.

MongoDB stores data for an index in a b-tree structure. B-trees are efficient since you don’t have to look at all the entries to find the value you’re looking for. Without an index, MongoDB would have to look at each and every document in the collection to find the ones that contain the data you’re looking for. This might not be a big deal if your collection has a small number of documents, but the time to search through these documents grows linearly as new documents are added. This means that when you first start putting documents in your collection that is not indexed, the time to query it will be fast, but as you keep adding more and more documents your query time will be slower for each document added.

Indexes allow you to speed up your reads by keeping a smaller subset of the data in the index structure which allows you to read through the data quicker. This also allows you to keep more indexed data in memory as you have less information to store.

Indexes can either be used to find a single value for a given field, a range of values for the field, a variety of different values in a given field, or they can be used return the field in a sorted order. Once the required index entries have been found, MongoDB will follow a pointer back to the location of the document to get the rest of the data it needs to fulfill the query’s request. There are cases when the database server doesn’t go back to the document, but that is only when the index contains all the fields necessary (based off the query match, projection and sort fields). This is called a covering index and we will talk about those in a later post.

Even though indexes are beneficial, when properly used, we do have to take care to not over index our collection as every index we create has an impact upon write operations to the data. This is due to the fact that for every write MongoDB has to update not only the data, but the indexes as well. This means that over-indexing a collection can be as bad as under-indexing it. Over-indexing a collection affects writes, while under-indexing affects reads, and both scenarios can bring an otherwise well running system to its knees.

One other consideration you need to take into account is the field’s selectivity. Index selectivity is a number showing how effective your index is. You can determine this value, but taking taking the total number of documents and dividing it by the number if distinct values in the indexed field(s). The higher this number the more selective your index is.

Now that we’ve discussed some of the theory, let’s take a look at the most basic of indexes, the single field index.

Single field indexes

A single field index exactly what it sounds like. An index that contains just a single field. When you create an index on a single field, the order that values are stored, either ascending or descending, does not matter since MongoDB can read through the index just as effectively in the reverse order as it can in a forward order.

This type if index works exactly like the index at the back of a book. The index has an alphabetical list of words that will reference a page (or pages) that the word is found on.

Let’s say we created an index in collection collName on number in an ascending order with the following command:

db.collName.createIndex({"number": 1})

Note that MongoDB uses 1 for ascending, and -1 for descending order. This is true for both creating indexes and when used for sorting.

As stated earlier, this index can be used for helping return the results of the following types of query:

  • Exact match on a field
db.collName.find({"number": 42})

This will return all documents in collection collName where number has a value of 42.

  • A range of values
db.collName.find({"number": {"$gte": 42}})

This will return all documents in collection collName where number has a value that is greater than or equal to 42.

  • A variety of values
db.collName.find({"number": {"$in": [14, 28, 42, 66]}})

This will return all documents in collection collName where number has a value that is equal to 14, 28, 42 or 66. This is similar to the range of values, but it allows you to pick values that are not contiguous.

  • Sorting
db.collName.find({}).sort({"number": 1})

This will return all documents in collection collName sorted in ascending order on number. Doing this will use the index so MongoDB doesn’t have to sort the document in memory. The great thing here is that MongoDB can use the index for sorting either in the direction the values are stored, or in reverse direction. If the above sort was to be changed to descending order (.sort({“number”: -1}), MongoDB would just start at the last item in the index and work its way to the first number without issue.

The above index could also be used both for matching and sorting.

db.collName.find({"number": {"$gte": 42}}).sort({"number": 1})

Of course this only makes sense to do when you’re doing a range or a variety of values query. An exact match and sort on the same field doesn’t really buy you anything extra as far as the results being returned.

Let’s say we had a collection that tracked different ice cream flavors. One of our documents might look like this:

"flavor": "Mint Chocolate Chip",
"brand": "Baskin Robbins",
"calories": 160,
"TotalFatGrams": 10,
"ServingSize": "1 2.5 ounce scoop"

Now let’s say you wanted to create a single field index on the flavor field, with the values stored in ascending order. You can do this in the mongo shell (the MongoDB drivers will have similar formats) using the following command:

db.icecream.createIndex({"flavor": 1})

In the above example we’ve created an index on the collection named icecream that contains the field flavor. We’ve created this index in an ascending value on the keys. Note that MongoDB use 1 for ascending, and -1 for descending order.

You can also index fields in nested objects or arrays. If you had a collection of store locations with documents that look similar to the following:

"storeId": 2146,
"location": {
"city": "Denver",
"state": "Colorado",
"mainPhone": "720-824-4821"
"departments": ["bakery", "pharmacy", "produce"]

You could create an index on the locations subdocument state field with the following command:

db.stores.createIndex({"location.state": 1})

You can also create an index on the location subdocument itself, but using an index when querying this subdocument gets to be tricky since you have to match all fields in the correct order. This means your command has to be as follows:

db.stores.find({"location": {"city": "Denver", "state": "Colorado", "mainPhone": "720-824-4821"}})

If you put the subdocuments fields in a different order, you will not return a match on the sample record. This is confusing to most people, and I would recommend against indexing the whole subdocument, and would instead advise you to index the different subdocument fields that you might be interested in.

To index an array, you would use the following command:

db.stores.createIndex({"departments": 1})

When you create an index on an array, this is called a multikey index, you will get one entry for every value in the array. In the above example, we would have three index entries all pointing back to the same document. In this case, where we’re creating a single field index, indexing an array field is OK, as long as you remember that your index will have the total number of array items that are present in each document. When creating compound indexes, only one field from each document can be an array value, although the array doesn’t have to be the same field in each document. This is to stop the unbounded proliferation of index entries. If you had two fields in a document that both contained an array with 10 values, you would have 100 index entries for that single document.

One clap, two clap, three clap, forty?

By clapping more or less, you can signal to us which stories really stand out.