Lessons learned using MongoDB
By Jon Vines, Software Development Team Lead
Welcome to the fourth post in the series discussing some small lessons learned whilst developing on AWS. For the context of the series and what we’re looking to achieve, please read the introductory post An introduction to some small lessons learned developing on AWS.
This post is going to focus on MongoDB. First, we're going to take a quick look at what MongoDB and MongoDB Atlas are. We're then going to look at some of the lessons we've learned whilst developing applications using MongoDB in .NET Core. We learned some of these lessons through having a MongoDB consultant on site, which hugely accelerated our learning and understanding of MongoDB. Whilst there was nothing outrageous in what we were doing, there were enough small things to make a dramatic difference.
What is MongoDB?
MongoDB is a NoSQL database that stores data in JSON-like documents. It provides a very flexible data model, which means fields can vary from document to document and the data structure can change over time. The document model also makes it very easy to map documents to objects in our application code. For us, this made it really convenient to build our application in C# and gave us a very low ramp-up time to get early results.
{
  "name": "Jon Vines",
  "position": "Software Engineer",
  "company": "AO.com"
}
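A document like this maps almost one-to-one onto a plain C# class. The sketch below is illustrative only (the class and property names are assumed, not taken from our code base); the [BsonElement] attributes line the PascalCase properties up with the lowercase element names, and the convention pack shown later in the post achieves the same thing globally.
using MongoDB.Bson.Serialization;
using MongoDB.Bson.Serialization.Attributes;

// Illustrative mapping of the JSON document above onto a C# object.
public class Employee
{
    [BsonElement("name")]
    public string Name { get; set; }

    [BsonElement("position")]
    public string Position { get; set; }

    [BsonElement("company")]
    public string Company { get; set; }
}

// e.g. var employee = BsonSerializer.Deserialize<Employee>(json);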
Further to MongoDB, we decided to use the fully managed Database-as-a-Service offering, MongoDB Atlas. This fits very well with our team philosophy of having as few servers to manage as possible. MongoDB Atlas gave us the ability to build our application and worry less about a lot of the operational concerns that come with managing a database, including setup, patching, monitoring and live backups with point-in-time restore.
Lessons learned
When we first started using MongoDB, we had just moved away from another NoSQL database offering. The main reason we moved to MongoDB was the C# driver and the ease of getting up and running. One of the first lessons we learned was that it is super easy to get data into MongoDB. The challenge that comes with this is that you really have to think about the data models you are pushing into the database. A wrong decision early on can lead to lengthy migrations having to be run in the future.
One pattern to watch for is the unbounded growth of the documents stored in MongoDB. This happens when you continuously add new data to an array within a document, such as new orders placed by a customer. Allowing this to happen can lead to slow queries as the documents get larger. You also run the risk of documents nearing, or exceeding, the 16MB document size limit.
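One common way to avoid this is to reference rather than embed. The sketch below is illustrative only (the class names and fields are assumed): instead of appending every order to an array inside the customer document, each order lives in its own collection and carries a reference back to the customer.
using System.Collections.Generic;
using MongoDB.Bson;

// Embedding an ever-growing array invites unbounded document growth.
public class CustomerWithEmbeddedOrders
{
    public ObjectId Id { get; set; }
    public string Name { get; set; }
    public List<CustomerOrder> Orders { get; set; } // grows with every order placed
}

// Referencing keeps each document a bounded size: orders live in their own
// collection and point back to the customer.
public class Customer
{
    public ObjectId Id { get; set; }
    public string Name { get; set; }
}

public class CustomerOrder
{
    public ObjectId Id { get; set; }
    public ObjectId CustomerId { get; set; } // reference back to the customer
    public decimal Total { get; set; }
}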
It's also important to adhere to naming standards in both the application and the database. We can do this by registering the camel-case convention in our C# start-up code:
var pack = new ConventionPack {
    new CamelCaseElementNameConvention()
};
// Register the pack so it applies to all types (MongoDB.Bson.Serialization.Conventions).
ConventionRegistry.Register("camelCase", pack, type => true);
This allows us to map from PascalCase in our C# objects to camelCase in our MongoDB documents. This is just one example of the conventions you can use with MongoDB; others include IgnoreIfDefaultConvention() and IgnoreExtraElementsConvention().
As a lot of our data was sourced from SQL, we could make use of the SQL id fields for querying. Initially, this was in a separate field and would be along the lines of orderId or addressId. To enable common repositories to access this data, we made use of field-level attributes to specify a common name, in the form [BsonElement("docid")]. We later learned that we could simply have used MongoDB's built-in document _id as the identifier. This would have served two purposes. Firstly, we would be using the built-in _id field within MongoDB, which is already stored under a common name. Secondly, we would reduce the amount of space required for the indexes MongoDB maintains.
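A minimal sketch of the two approaches (the class and property names are illustrative, not our real model):
using MongoDB.Bson.Serialization.Attributes;

// What we did originally: keep the SQL identifier in its own field, mapped to a
// common element name so shared repositories could query it.
public class OrderWithSqlId
{
    [BsonElement("docid")]
    public int OrderId { get; set; }

    public int NumberOfTimesProcessed { get; set; }
}

// What we later realised: the SQL identifier could live in MongoDB's built-in
// _id field instead, avoiding an extra field and an extra index.
public class OrderUsingBuiltInId
{
    [BsonId]
    public int Id { get; set; }

    public int NumberOfTimesProcessed { get; set; }
}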
Lessons in paging data
Paging data in MongoDB using the skip and take commands can perform quite badly, something which is even referenced in the cursor.skip() documentation. Due to the high number of documents we needed to process, we needed to introduce some form of paging. To do this, we introduced a range query which orders the documents by a given id and returns the first fifty results from that point.
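A minimal sketch of that range query using the C# driver (the class, field and method names are assumed): rather than skipping over documents, we remember the last id we saw and ask for the next fifty documents from that point onwards.
using System.Collections.Generic;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class OrderPaging
{
    // Rather than skip/take, remember the last id seen and ask for the next
    // fifty documents from that point onwards.
    public static Task<List<BsonDocument>> GetNextPageAsync(
        IMongoCollection<BsonDocument> orders, long lastSeenId)
    {
        return orders
            .Find(Builders<BsonDocument>.Filter.Gt("Id", lastSeenId))
            .Sort(Builders<BsonDocument>.Sort.Ascending("Id"))
            .Limit(50)
            .ToListAsync();
    }
}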
This helped us to improve our query execution time by a huge amount. Perhaps our biggest lesson in implementing the paging solution was how we go about debugging our query execution plans. This is where good log review came in really handy. By default, MongoDB logs any query with an execution time greater than 100ms. This can be a good indicator of where slow queries are being run. In our case, it all pointed back to the query we were using to page data back from MongoDB.
Nobody really enjoys poring over logs looking for patterns. A really useful way to get a quick visualisation of slow queries is mlogvis from mtools.
Lessons with querying
One of the best features of MongoDB is that it is super quick for both read and write operations. This comes with the caveat that this speed depends on setting your indexes correctly. One early command to get comfortable with is .explain() when constructing your queries. It outputs a lot of useful information, including the number of documents scanned, the execution time, and the indexes considered and rejected when running the query. This can be a great asset when constructing your MongoDB queries early on.
It's worth noting that when you construct your commands in the C# driver, it automatically converts them to aggregation framework queries in the background. This means you may not be using the most efficient query and could be sacrificing speed for code readability. Consider the following example.
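A rough sketch of that kind of LINQ-style query (the POCO, property names and exact element mapping are assumed for illustration; only the shape of the query matters here):
using System.Collections.Generic;
using System.Linq;
using MongoDB.Driver;
using MongoDB.Driver.Linq;

// Illustrative POCO; the property names are assumed.
public class Order
{
    public long Id { get; set; }
    public int NumberOfTimesProcessed { get; set; }
    public long EntityId { get; set; }
}

public static class LinqOrderQueries
{
    // Reads nicely, but the driver translates this into an aggregation
    // pipeline ($match, $sort, $limit) behind the scenes.
    public static List<Order> GetUnprocessedPage(
        IMongoCollection<Order> orders, long lastSeenId)
    {
        return orders.AsQueryable()
            .Where(o => o.NumberOfTimesProcessed < 3 && o.Id >= lastSeenId)
            .OrderBy(o => o.EntityId)
            .Take(50)
            .ToList();
    }
}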
The C# code described above resulted in the following command being invoked against the database:
db.order.explain('executionStats').aggregate([{ $match: { numberOfTimesProcessed: { $lt: 3 }, Id: { $gte: 19020680 } } }, { $sort: { _entityId: 1 } }, { $limit: 50 }], { hint: { Id: 1 } })
Whilst at first glance there doesn't seem to be a lot wrong here, the nature of the query resulted in us scanning over 100,000 documents and a slow query of over 100ms. This happened because the $limit was not applied to the $match stage. The answer was to revert to using the native MongoDB query language instead of relying on LINQ, which allowed us to take control of the query command being executed.
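A minimal sketch of the shape the re-written query took, using the same filter, sort, limit and hint as the command above (the class and method names are assumed):
using System.Collections.Generic;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class NativeOrderQueries
{
    // Build the filter, sort and hint explicitly so we control the exact
    // command that gets executed.
    public static Task<List<BsonDocument>> GetUnprocessedPageAsync(
        IMongoCollection<BsonDocument> orders, long lastSeenId)
    {
        var filter = new BsonDocument
        {
            { "numberOfTimesProcessed", new BsonDocument("$lt", 3) },
            { "Id", new BsonDocument("$gte", lastSeenId) }
        };

        // Mirrors the hint in the command above; assumes a driver version
        // that exposes FindOptions.Hint.
        var options = new FindOptions { Hint = new BsonDocument("Id", 1) };

        return orders
            .Find(filter, options)
            .Sort(new BsonDocument("_entityId", 1))
            .Limit(50)
            .ToListAsync();
    }
}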
This had a dramatic impact on our query performance, with finds coming back between 3–10ms rather than the 100ms or so we were seeing previously.
Lessons learned in monitoring
It would be remiss not to talk about some of the lessons we learned monitoring MongoDB via the Atlas interface. The built-in alerting and monitoring dashboards have been a great asset to the team whilst building out our application, specifically around query optimisation. There are two graphs in particular that can be used to indicate slow query performance: the query executor and query targeting charts.
Both of these graphs show a similar picture. Prior to the 12th of June, there were some interesting behaviours: this was the profile of our queries before optimisation, and it clearly shows how loosely targeted they were. You can use these graphs as a trigger to investigate the logs as described earlier in the post.
On the query executor chart, there can be some valid reasons for it not to be flat. This chart can indicate a problem if the green line is higher than the blue. This means that we cannot satisfy the query using an index and have to bring documents into memory to examine them.
The goal of the second graph is to be as flat as possible. The green line here is showing the ratio of the number of documents scanned against the number of documents returned by queries.
Looking forward
MongoDB has become a fundamental piece in our architecture jigsaw, but we are always looking to improve. One awesome opportunity we can see is utilising the oplog and streaming updates within the MongoDB collections into Kafka using the Debezium MongoDB connector. We’re conscious of not building a monolithic data layer supporting numerous MicroService applications. By publishing data to Kafka, we can build independent, resilient MicroServices with their own view of the data, be that in MongoDB, S3, SQL or anywhere else. This will allow us to scale, create decoupled, autonomous services, and improve resiliency in the architecture.
Conclusions
We’ve been really pleased with our decision to use MongoDB Atlas to provide the materialised view of our data. Utilising Atlas has allowed us to focus on what we’re best at, building products our business can use to deliver value. It’s helped us increase our flow from left to right and deploy to production early and often.
The major takeaway I’d offer from this is to really focus on document format and query optimisation. These two things will go a long way to keeping you a happy MongoDB user and are fundamental in delivering the fast performance MongoDB can offer.