MongoDb vs Mysql — NodeJs paradigm

Dear reader, you might wonder why bother writing another blog post about MongoDb and MySql. The reasoning hides behind me arguing with NodeJs developers about what data storage to use as the primary data storage for node applications. I have seen lots of comments where developers argue about this matter. Some say: “Use MongoDb, it is faster and better suited for NodeJs apps”, others say: “Use relational db, those who use MongoDb just don’t know how to write a couple of joins”. So I decided to take a step forward and just see it for myself.

Note: do not consider this as a full study of both. This article is intended to share my opinion with you and should not be considered as a decision of using one technology over the other.

Testing environment

So for all this testing I used MongoDb:latest docker container same for MySql. Virtual environment with 4G ram. Processor: 2.5 GHz Intel Core i5. For queries I used a built api with HapiJs. As I was trying to replicate the production like use of the technologies. Native drivers where used for both MongoDb and MySql. No caching enabled.

Testing has been performed on three different models.

  1. Mongo-flat — which is just a flat schema. One collection `Users` with address embedded in the collection
Mongo-flat

2. Mongo-relation — relational schema in Mongo. This involves two collections: `Users` and `Address`

Mongo-relation

3. Mysql — relational schema. Two tables: `Users` and `Address`

Mysql

In general, a `User` has a one-to-many relationship with `Address`.

Benchmarks

  1. Insert - for inserting the records I have used `Faker.js`. Insertion was performed using two methods: row by row and in bulk. Bulk insert in chunks of 5000.

As you can see the definite winner is Mongo-flat bulk inserts. That is understandable because MongoDb was designed to be used as high volume inserts. On the other hand we can see that MySql came last.

2. Add an address to the user

In this instance MySql came first.

3. Deleting a record

Another win for `Mongo-Flat` as the deletion occurs on one collection `Users`. As I mentioned earlier MongoDb was designed for high writes and deletes. On the other hand Mysql can do the job for you in deleting the relations if you use foreign key constrains `ON DELETE …`. This becomes handy in case of complicated relationships in your database. I would not recommend deleting the record all together, please use soft-deletes. In other words, you could rely on the database to do the job for you in MySql instance where MongoDb fails to do so. To delete a relation in `Mongo-relation` you need to perform multiple queries, to get the related collection data first and then propagate the deletion through entire relation tree.

4. Get 5000 users at once

For data retrieval MySql showed better performance. Retrieving the user from MySql took 35 ms this is ~ 14% faster than Mongo-relation and ~ 12% faster than Mongo-flat.

5. Count users

MySql performed the worst, where both Mongo-flat and Mongo-relation showed very good result.

6. Get 5000 users with address

Both Mysql and Mongo-flat performed approximately the same, where Mongo-relation was underperforming; due to the fact that to retrieve data in case of Mongo-relation you need to perform two queries: one to retrieve the user, and another to retrieve its addresses. In case of MySql it’s just a `JOIN` and in case of Mongo-flat just returning a flat object.

7. Get users with no Florida Address

Once again Mysql and Mongo-flat performed approximately same. As you can see Mongo-flat speed starts declining comparing to benchmark #6 due to the fact that there is filtering on `<> ‘Florida’ and {$ne: ‘Florida’}`. Mongo-relation is slow again because in order to perform this kind of query we need to use MongoDb aggregation framework which is slow.

8. Get 5000 addresses

MySql is the winner again, but in this case Mongo-flat comes to the finish line last. The reason is again MongoDb `Aggregation framework`; to retrieve data from Mongo-flat you need to aggregate on the embedded address relation.

9. Count addresses

This benchmark was surprising even for me. As you can see Mongo-flat is much slower then others. And that is again because of the aggregation necessary to count the address embedded in the `Users` collection.

10. Update address by where clause

Mongo-flat performed bad again; as the updates are happening on the embedded addresses.

11. Get all Florida addresses

Once again Mongo-flat is last. Again that is due to the Aggregation Framework.

Conclusion

From all the benchmark shown above I can see that MongoDb performs very well on inserts. That is because MongoDb was designed to be capable of writing a lot of data. So I can conclude that MongoDb will be best suited where you would need a lot of writes such as logging or transitional data.

On the other hand, MySql performs much better for the data retrieval. So if you do not have a lot of data insert or your data insert is occasional I would see MySql as your business data storage for scenarios such as reporting, customer management, etc.

I have seen MongoDb being used as a relational database and from benchmarks I can see that it would be wrong to use it that way, just use MySql or any other Sql based databases, they are designed for this job. But if you fall into the trap, I would try to add relation keys on both sides, such as, going back to Mongo-relation User->Address schema I would suggest adding addresses ids to the User collection, as an embed, as it would be easier in some instances to retrieve the `User` based on address id. Mongo-relation allows fast retrieval per collection base but slows dramatically when you start getting relations, because there is no query joining between collections in mongo. In order to get the relations you need to make multiple calls to the database. Also you can speedup querying by retrieving data in bulk and then joining related collections in your application. Be careful with the last approach as it can make your application un-usable, especially if you use single threaded technologies like NodeJs.

Mongo-flat becomes very slow when you use aggregations. In most instances that is the only choice you might have, especially if you trying to retrieve embedded relations. Aggregation framework is not as robust as Sql language so for some queries you would need to make multiple aggregation calls to the database to achieve the final result. All of this becomes very complicated the deeper nesting of you relation goes.

MySql uses Sql language which is very robust and easy to write, allows joining tables, filtering and joining again; it also incapsulates application logic into the database, for example: your table joining is done on the database level instead of application level.

Can MySql be slow? Yes. But that, in my opinion, is because of poor engineered schema. There are a lot of companies have been using MySql as their primary data storage for years and it showed good benchmarks.

Both Mongo and MySql are great technologies. They both serve their purpose. But should we substitute one for another? Absolutely not. As I wrote earlier MongoDb is good for transitional data, logs, notifications, etc. Where MySql is good for your business data storage, reporting, relational data, etc.

I see fails where MongoDb is used as a relational database and that just blows my mind. Again, poor decisions bring you to failure. Don’t use MongoDb for relational data - that is not MongoDb’s purpose.

I can continue describing both MongoDb and MySql but I will stop here and let you make a decision for yourself. I did my study, you do yours. But once again, every technology serves its purpose.

As a funny fact watch this.