Introduction to MongoDB Indexes

Dvir Arad
The Startup
Published in
5 min readMay 26, 2019

What are the MongoDB indexes?
How do they work?

What problem do indexes try to solve?
Answer: Slow queries

We can look at MongoDB collection’s indexes as book indexes.

Collection Scan
If you don’t use an index when we query our collection, the database will have to look at every document. Our collection grows in size. Therefore we will have to search through more and more documents to satisfy our query.
* Order of N operation
* Linear running time

Index Scan
Rather than searching through every document, we can search through the ordered index first. Key-Value pair, where the key is the field's value that we’ve indexed on, and the key's value is the actual document itself.
* _id is automatically indexed

It is possible to have many indexes on the same collection. You might create multiple indexes in different fields if you find other queries for various fields.

MongoDB uses a data structure called a b-tree to store its indexes.

The excellent query performance gain that we get with indexes doesn’t come for free. With each additional index, we decrease our write speed for a collection. If a document were to change or completely removed, one or more of our b-trees might need to be balanced. This means that we need to be careful when creating indexes. We don’t want to have too many unnecessary indexes in a collection because there would be excessive loss in insert, update, and delete performance. It would help if you had a good idea of what indexes are, their pros and cons, and how they work.

Single Field Index

The simplest index that MongoDB
db.<collection>.createIndex({<field>:<direction>})
Key features:
* The keys are from only one field.
* Can find a single value for the indexed field.
* Can find a range of values.
* Can use dot notation to index fields in subdocuments.
* Can be used to find several distinct values in a single query

Example:
db.getCollection(‘people’).find({“ssn”:”829–00–2162"}).explain(“executionStats”)
{“queryPlanner” : {… },
“winningPlan” : {
“stage” : “COLLSCAN”, …},
“executionStats” : {
“nReturned” : 52,
“executionTimeMillis” : 30,
“totalKeysExamined” : 0,
“totalDocsExamined” : 20267,
“nReturned” : 52,

}

We can see that we use here a collection scan(COLLSCAN), we examined 0 index, 20,267 documents and return 52 documents. The server worked about 30 milliseconds.

let’s add Index to our collection
db.people.createIndex( { ssn : 1 } )

run the same query again
db.getCollection(‘people’).find({“ssn”:”829–00–2162"}).explain(“executionStats”)
{“queryPlanner” : {… },
“winningPlan” : {
“stage” : “FETCH”,
“inputStage” : {
“stage” : “IXSCAN”, …},
“executionStats” : {
“executionTimeMillis” : 2,
“totalKeysExamined” : 52,
“totalDocsExamined” : 52,
“nReturned” : 52,

}

Now with Index scan(IXSCAN), we can see that we examined 52 keys and 52 documents and return 52 documents. The server worked about 2 milliseconds.

Sorting with Index

In memory sorting
The documents in our collections are stored on disk in a new order. The server is going to have to read the documents from disk into RAM. In RAM, it will perform some sorting algorithm on them. Depending on how many documents you have, this might take a long time

Moreover, because sorting many memory documents might be an expensive operation, the server will abort sorting in memory when 32 megabytes of memory are used.

Example:
db.getCollection(‘people’).find({}).sort({ssn:1}).explain(“executionStats”)
{“queryPlanner” : {
“winningPlan” : {
“stage” : “SORT”,
“sortPattern” : { “identity_unique_id” : 1.0 },
“inputStage” : {
“stage” : “SORT_KEY_GENERATOR”,
“inputStage” : {
“stage” : “COLLSCAN”,
“direction” : “forward” … },
“executionStats” : {
“nReturned” : 20267,
“executionTimeMillis” : 202,
“totalKeysExamined” : 0,
“totalDocsExamined” : 20267
}

The server run as memory sorting and execut about 202 miliseceond

Index sorting
The keys are ordered according to the field specified during index creation. The server can take advantage of this via sort. The order of the documents returned is guaranteed to be sorted by the index keys. This means that there is no need to reform an explicit sort, as the documents will be fetched from the server in the sorted order

Example:
let’s add Index to our collection
db.people.createIndex( { ssn : 1 } )
db.getCollection(‘people’).find({}).sort({ssn:1}).explain(“executionStats”)
{“queryPlanner” : {
“winningPlan” : {
“stage” : “FETCH”,
“inputStage” : {
“stage” : “IXSCAN”,
“keyPattern” : { “identity_unique_id” : 1 },
“indexName” : “identity_unique_id_1”,
“direction” : “forward” },
“executionStats” : {
“nReturned” : 20267,
“executionTimeMillis” : 35,
“totalKeysExamined” : 20267,
“totalDocsExamined” : 20267,
}

The server run as index sorting and execut about 35 miliseceond

Compound Indexes

A compound index is an index on two or more fields, and it can support queries based on those fields

Now, in MongoDB, indexes are b-trees, and they have an ordering, and that order is flat. So if you have two fields, you might somehow think that your indexes are two-dimensional in some way, but they’re not. They’re one-dimensional. You can think of index keys as an ordered list.

Index prefix
Index prefixes are the beginning subsets of indexed fields. For example, consider the following compound index:
{ “item”: 1, “location”: 1, “stock”: 1 }
The index has the following index prefixes:
* { item: 1 }
* { item: 1, location: 1 }

For a compound index, MongoDB can use the index to support queries on the index prefixes. As such, MongoDB can use the index for queries on the following fields:
The item field, the item field and the location field, the item field, the field, and the stock field.

MongoDB can also use the index to support a query on item and stock fields since the item field corresponds to a prefix. However, the index would not help the query as an index on only item and stock.

However, MongoDB cannot use the index to support queries that include the following fields since, without the item field, none of the listed fields correspond to a prefix index:
The location field, the stock field, or the location and stock fields.

If you have a collection that has both a compound index and an index on its prefix (e.g. { last_name: 1, first_name: 1 } and {last_name: 1 }), if neither index has a sparse or unique constraint, then you can remove the index on the prefix (e.g. {last_name: 1 }). MongoDB will use the compound index in all situations that would have used the prefix index.

Sort order
Indexes store references to fields in either ascending (1) or descending (-1) sort order. For single-field indexes, the sort order of keys doesn’t matter because MongoDB can traverse the index in either direction. However, for compound indexes, sort order can determine whether the index can support a sort operation.

--

--