MongoEngine’s Index Verification

Astitva Shah
Unibuddy Technology Blog
2 min readJun 24, 2021
Photo by Jan Antonin Kolar on Unsplash

At Unibuddy, we have been using mongoengine as an ODM wrapper over pymongo. Recently, while debugging a use case related to index creation for one of our collections, we observed that every time we were inserting a document, mongoengine would first check if all indexes defined in our collection were present in the database and after this verification, fire the actual INSERT command. Consider the following scenario, as an example:

class User(Document):
email = StringField()
name = StringField()
age = IntField()
meta = {
"indexes": [
"email",
"age"
]
}
sam = User.objects.create(email="sam@example.com", name="Sam", age=1)

Assuming that the indexes for User collection already exists before creating user : sam, when the above create statement is executed, the following commands are received on mongo server: (Highlighted important info with bold)

{"order":21,"op":"op_msg","command":"createIndexes","ns":"test1","request_data":{"sections":[{"payload":{"$db":"test1","$readPreference":{"mode":"primary"},"createIndexes":"user","indexes":[{"background":false,"key":{"email":1},"name":"email_1"}],"lsid":{"id":{"$binary":"LPjyRMPhQyebVjhgyGKxHw==","$type":"04"}}},"payloadType":0}]},"connection_num":2,"seen":"2021-04-30T15:27:29.983394+05:30","request_id":1457850878}
{"order":22,"op":"op_msg","command":"reply","reply_data":{"sections":[{"payload":{"note":"all indexes already exist","numIndexesAfter":3,"numIndexesBefore":3,"ok":1.0},"payloadType":0}]},"connection_num":2,"latency_us":223,"seen":"2021-04-30T15:27:29.983617+05:30","request_id":1457850878}
{"order":23,"op":"op_msg","command":"createIndexes","ns":"test1","request_data":{"sections":[{"payload":{"$db":"test1","$readPreference":{"mode":"primary"},"createIndexes":"user","indexes":[{"background":false,"key":{"age":1},"name":"age_1"}],"lsid":{"id":{"$binary":"LPjyRMPhQyebVjhgyGKxHw==","$type":"04"}}},"payloadType":0}]},"connection_num":2,"seen":"2021-04-30T15:27:29.984624+05:30","request_id":1458777923}
{"order":24,"op":"op_msg","command":"reply","reply_data":{"sections":[{"payload":{"note":"all indexes already exist","numIndexesAfter":3,"numIndexesBefore":3,"ok":1.0},"payloadType":0}]},"connection_num":2,"latency_us":154,"seen":"2021-04-30T15:27:29.984778+05:30","request_id":1458777923}
{"order":25,"op":"op_msg","command":"insert","ns":"test1","request_data":{"sections":[{"payload":{"$db":"test1","$readPreference":{"mode":"primary"},"insert":"user","lsid":{"id":{"$binary":"LPjyRMPhQyebVjhgyGKxHw==","$type":"04"}},"ordered":true,"writeConcern":{"w":1}},"payloadType":0},{"payload":{"documents":[{"_id":{"$oid":"608bd4896ee5c54481d3f6d3"},"age":1,"email":"a@example.com","name":"a"}],"identifier":"documents","size":82},"payloadType":1}]},"connection_num":2,"seen":"2021-04-30T15:27:29.985967+05:30","request_id":2007237709}
{"order":26,"op":"op_msg","command":"reply","reply_data":{"sections":[{"payload":{"n":1,"ok":1.0},"payloadType":0}]},"connection_num":2,"latency_us":1192,"seen":"2021-04-30T15:27:29.987159+05:30","request_id":2007237709}

Observation 🤨

As you can see, there are 3 commands received on the server for 1 CREATE call. The first two commands are checking for the existence of indexes: email and age and the third command is the actual insert command.

Inference 🤔

The above observation tells us that if we have N indexes defined for a collection, there would be N+1 commands sent to mongo. This is the default behaviour for any collection defined with mongoengine. Imagine, if we have a highly write-intensive application - we would see some serious impact on the database performance with the way we insert data. It is worth considering whether we even want to check for indexes on every insert operation?

Solution💡

If you are designing a write-intensive application, it would make more sense to think of indexes during database design stage and create them upfront even before deploying your app. With this setup, you can ask mongoengine to not create indexes automatically by specifying auto_create_index as False in the model's meta dictionary, like so:

class User(Document):
email = StringField()
name = StringField()
age = IntField()
meta = {
"auto_create_index": False
"indexes": [
"email",
"age"
]
}

You also have an explicit way of creating indexes from within the app using:

User.create_index(keys, background=False, **kwargs)

This can be used as a one time thing when the app loads and not necessarily on each insert request. 🙂

Conclusion

Indexes are one of the most crucial parts of any database design. As the size of your collections becomes large, indexes come to the rescue for things like efficient queries to the database, in-memory sorting etc. Therefore, understanding the what, when, how of index creation is very important. We’ll talk more in-depth about indexes in a future post.

References:

https://docs.mongoengine.org/apireference.html?highlight=auto_create_index#mongoengine.Document.create_index

https://docs.mongodb.com/v4.0/tutorial/build-indexes-on-replica-sets/

--

--