MongoEngine’s index verification

Astitva Shah
Unibuddy
Published in
2 min readFeb 17, 2022

At Unibuddy, we have been using mongoengine as an ODM wrapper over pymongo. We recently discovered that every time we inserted a document, mongoengine would initially check if all indexes defined in our collection were present in the database and after this verification, fire the actual INSERT command.

Consider the following scenario, as an example:

class User(Document):
email = StringField()
name = StringField()
age = IntField()
meta = {
"indexes": [
"email",
"age"
]
}
sam = User.objects.create(email="sam@example.com", name="Sam", age=1)

Assuming that the indexes for User collection already exist before creating user : sam, when the above create statement is executed, the following commands are received on the mongo server: (Highlighted important info with bold)

{"order":21,"op":"op_msg","command":"createIndexes","ns":"test1","request_data":{"sections":[{"payload":{"$db":"test1","$readPreference":{"mode":"primary"},"createIndexes":"user","indexes":[{"background":false,"key":{"email":1},"name":"email_1"}],"lsid":{"id":{"$binary":"LPjyRMPhQyebVjhgyGKxHw==","$type":"04"}}},"payloadType":0}]},"connection_num":2,"seen":"2021-04-30T15:27:29.983394+05:30","request_id":1457850878}
{"order":22,"op":"op_msg","command":"reply","reply_data":{"sections":[{"payload":{"note":"all indexes already exist","numIndexesAfter":3,"numIndexesBefore":3,"ok":1.0},"payloadType":0}]},"connection_num":2,"latency_us":223,"seen":"2021-04-30T15:27:29.983617+05:30","request_id":1457850878}
{"order":23,"op":"op_msg","command":"createIndexes","ns":"test1","request_data":{"sections":[{"payload":{"$db":"test1","$readPreference":{"mode":"primary"},"createIndexes":"user","indexes":[{"background":false,"key":{"age":1},"name":"age_1"}],"lsid":{"id":{"$binary":"LPjyRMPhQyebVjhgyGKxHw==","$type":"04"}}},"payloadType":0}]},"connection_num":2,"seen":"2021-04-30T15:27:29.984624+05:30","request_id":1458777923}
{"order":24,"op":"op_msg","command":"reply","reply_data":{"sections":[{"payload":{"note":"all indexes already exist","numIndexesAfter":3,"numIndexesBefore":3,"ok":1.0},"payloadType":0}]},"connection_num":2,"latency_us":154,"seen":"2021-04-30T15:27:29.984778+05:30","request_id":1458777923}
{"order":25,"op":"op_msg","command":"insert","ns":"test1","request_data":{"sections":[{"payload":{"$db":"test1","$readPreference":{"mode":"primary"},"insert":"user","lsid":{"id":{"$binary":"LPjyRMPhQyebVjhgyGKxHw==","$type":"04"}},"ordered":true,"writeConcern":{"w":1}},"payloadType":0},{"payload":{"documents":[{"_id":{"$oid":"608bd4896ee5c54481d3f6d3"},"age":1,"email":"a@example.com","name":"a"}],"identifier":"documents","size":82},"payloadType":1}]},"connection_num":2,"seen":"2021-04-30T15:27:29.985967+05:30","request_id":2007237709}
{"order":26,"op":"op_msg","command":"reply","reply_data":{"sections":[{"payload":{"n":1,"ok":1.0},"payloadType":0}]},"connection_num":2,"latency_us":1192,"seen":"2021-04-30T15:27:29.987159+05:30","request_id":2007237709}

Observation 🤨

As you can see, there are three commands received on the server for one CREATE call. The first two commands are checking for the existence of indexes: email and age and the third command is the actual insert command.

Inference 🤔

The above observation tells us that if we have N indexes defined for a collection, there would be N+1 commands sent to mongo. This is the default behaviour for any collection defined with mongoengine. Suppose we had a highly write-intensive application, we would see some serious impact on the database performance with the way we insert data. Would it be worth considering whether we want to check for indexes on every insert operation?

Solution💡

If you are designing a write-intensive application, it would make more sense to think of indexes during the database design stage and create them upfront even before deploying your app. With this setup, you can ask mongoengine to not create indexes automatically by specifying auto_create_index as False in the model's meta dictionary, like so:

class User(Document):
email = StringField()
name = StringField()
age = IntField()
meta = {
"auto_create_index": False
"indexes": [
"email",
"age"
]
}

You also have an explicit way of creating indexes from within the app using:

User.create_index(keys, background=False, **kwargs)

This can be used as a one-time solution when the app loads and not necessarily on each insert request.

Conclusion

Indexes are one of the most crucial parts of any database design. As the size of your collections becomes large, indexes come to the rescue for things like efficient queries to the database, in-memory sorting, etc. Therefore, understanding the what, when, and how of index creation is very important. We’ll delve deeper into the importance of indexes in a future post.

References:

https://docs.mongoengine.org/apireference.html?highlight=auto_create_index#mongoengine.Document.create_index

https://docs.mongodb.com/v4.0/tutorial/build-indexes-on-replica-sets/

--

--