NODEJS SERIES

MongoDB/Mongoose Query Optimizations

Chikku George
Globant
Published in
10 min readFeb 7, 2023

--

MongoDB is a document-oriented NoSQL database. Documents in MongoDB replace the rows of traditional relational data models. Developers can work with changing data models due to MongoDB’s flexibility. MongoDB has grown to become one of the most popular NoSQL databases.

Mongoose is an Object Data Modelling (ODM) library used in conjunction with MongoDB. It allows developers to write code in javascript that will then interface with the database, allowing for speedy and simple application development. Like Express provides an abstraction layer over Node, Mongoose does the same for standard MongoDB.

This article will go through how we may improve the performance of our application by implementing certain query optimizations on MongoDB/Mongoose.

Why is Query Optimisation important?

The performance of a REST API is heavily dependent on the performance of database queries. If a query takes a long time to process the data, the API response time increases, affecting the application’s performance and user experience.

Optimizing queries aims to provide fast, effective access to the database. Particularly with complex database query statements, query optimization is crucial. Developers should constantly be aware of how the database processes queries and always think about retrieving the least amount of data possible.

Let’s start the NodeJs server with a certain Mongoose schema before we begin optimizing. The steps are listed below.

  1. Create a package.json file by running the command npm init -y
  2. Install the dependencies required to set up the server, such as express, body-parser, dotenv, mongoose, and autocannon. Autocannon is optional as it is used only for load testing.
npm i --save express body-parser dotenv mongoose

npm i --save-dev autocannon // autocannon is used for load testing purpose

3. Create a .env file that contains environment variables like PORT and DB_CONNECT_URI(MongoDB connection URI)

PORT = 8080
DB_CONNECT_URI = "MongoDB Connection URI"

4. Create a model.js file that should contain your schema definitions.

const mongoose = require("mongoose");

const userSchema = new mongoose.Schema(
{
name: String,
email: String,
isActive: Boolean,
},
{ collection: "users" }
);

const blogSchema = new mongoose.Schema(
{
title: String,
publication: String,
author: String,
},
{ collection: "blogs" }
);

const User = mongoose.model("users", userSchema);
const Blog = mongoose.model("blogs", blogSchema);

module.exports = { User, Blog };

5. Create an index.js file to start the server with connecting database.

require("dotenv").config();
const express = require("express");
const mongoose = require("mongoose");

const { User, Blog } = require("./model");
const app = express();

const { PORT, DB_CONNECT_URI } = process.env;
const port = PORT || 8080;

mongoose
.connect(DB_CONNECT_URI)
.then(console.log("connected"))
.catch(console.log);

app.listen(port, async () => {
console.log(`Server started at port ${port}`);
});

6. The performance of queries with or without optimization can be compared using autocannon. So create an autocannon.js file containing the following snippet that url will carry the endpoint we need to test.

"use strict";
const autocannon = require("autocannon");

function test (url){
const instance = autocannon({
url,
connections: 10,
pipelining: 1,
duration: 10,
});

autocannon.track(instance, { renderProgressBar: false });
}

test(url);

You can refer to the article below if you are unsure how to use Autocannon.

Query Optimisations

Let’s talk about query optimizations that can speed up database access. The optimizations are detailed below.

1. Lean Queries

Mongoose queries, by default, return instances of the Mongoose document class. This document is quite large as it contains getters/setters and validations. Mongoose recommends using the lean operator with all GET requests to improve performance. Adding .lean() at the end of GET queries causes them only to return simple Javascript objects, not Mongoose documents. Simple Javascript objects use less memory. We utilize the lean operator when we don’t need to change the query results. We can avoid using .lean() if we need to modify the query results or use features like save() or getter/setter.

The query’s syntax should be as follows:

await Model.find().lean()

Testing Lean Queries’ Performance

Add two GET methods with and without .lean() queries on the index.js file.

app.get("/getAll", async (req, res) => {
const users = await User.find({});
res.status(200).send(users);
});

app.get("/getAllWithLean", async (req, res) => {
const users = await User.find({}).lean();
res.status(200).send(users);
});

Invoke the test() method in autocannon.js file with the above GET endpoints to obtain the performance result.

The performance result of the query without adding .lean() is given below.

Autocannon Performance Result for GET query without .lean()

The performance result of the query by adding .lean()is given below.

Autocannon Performance Result for GET query with .lean()

If we compare the test results from the two scenarios, using .lean() queries reduces latency. Additionally, the throughput rate is boosted by lean queries because they can handle more requests per second.

2. Projection or using .select()

Mongoose GET queries always return every field in the related document, which can be a substantial number of fields. Mongoose projection enables us to pick only the fields we need. A document may frequently include numerous fields, including arrays and embedded documents, which we occasionally may not need. For instance, if the only fields required are username and email, we may return those two and exclude the other fields. Mongoose offers a .select() operator that allows us to specify the fields to return. This modification may even help to improve the performance of our application.

Testing Projection’s Performance

Add two GET methods with and without .select() in the index.js file.

app.get("/getAll", async (req, res) => {
const users = await User.find({});
res.status(200).send(users);
});

app.get("/getAllWithProjection", async (req, res) => {
const users = await User.find({}).select('name'); // returns only name field
res.status(200).send(users);
});

Invoke the test() method in autocannon.js file with the above GET endpoints to obtain the performance result.

The performance result of the GET query without adding .select() is given below.

Autocannon Performance Result for GET query without .select()

The performance result of the GET query with .select() is given below.

Autocannon Performance Result for GET query with .select()

Looking at the test results, it is clear that .select() reduces average latency to 5ms. Additionally, they handled 55k requests in 11 seconds, which is faster than queries without .select()

3. Parallel DB Operations

Many developers use async/await for asynchronous operations. If we have two database operations independent of each other, we don’t need to wait for the initial one to finish. For instance, if there are two databases save operations, it is a typical mistake to code as follows:

await Model1.save();
await Model2.save();

If the above two operations are self-contained, we don’t need to wait for the first to finish. The two operations can be conducted simultaneously, as shown below:

await Promise.all([Model1.save(), Model2.save()]);

Testing Parallel DB Operations’ Performance

Add two POST methods as below, which shows the above difference.

app.post("/save", async (req, res) => {
const user = new User({
name: "Ana Varela",
email: "ana.varela@gmail.com",
isActive: true,
});
const blog = new Blog({
title: "Mongoose Query Optimizations",
author: "Chikku George",
publication: "Medium",
});
await user.save();
await blog.save();
res.status(201).send({ message: "Data saved successfully.", code: 201 });
});

app.post("/saveParallely", async (req, res) => {
const user = new User({
name: "Ana Varela",
email: "ana.varela@gmail.com",
isActive: true,
});
const blog = new Blog({
title: "Mongoose Query Optimizations",
author: "Chikku George",
publication: "Medium",
});
await Promise.all([user.save(), blog.save()]);
res.status(201).send({ message: "Data saved successfully.", code: 201 });
});

Invoke the test() method in autocannon.js file with the above POST endpoints to obtain the performance result.

The performance result of the POST query without using parallel DB operations is given below.

Autocannon Performance Result for POST without using parallel db operations

The performance result of the POST query using parallel DB operations is given below.

Autocannon Performance Result for POST using parallel db operations

Using the test findings as a comparison, parallel database operations can cut the average latency time to 2.54ms. Additionally, the throughput rate has been raised to 37k requests per 11.01s.

4. Indexing

Indexes are unique data structures that contain the value of a given field or a set of fields and are ordered by their value. With an index, MongoDB can reduce the number of documents to scan and enhance query performance. In the absence of indices, MongoDB reads each document in a collection to identify those that match the query.

MongoDB creates an index on the _id parameter by default. On any field, though, we can specify an index if we choose. Additionally, we can define compound indexes on various fields in our schema. For frequent requests, we must establish an index with the necessary fields.

We can set the index at the field level or schema level.

// Setting index on schema level. 
// Assign either "1"(sort by ascending) or "-1"(sort by descending) to the field.
schema.index({field: 1 or -1})

Testing Index Performance

Add a GET method to fetch the user data by email as follows:

app.get("/getByEmail/:email", async (req, res) => {
const { email } = req.params;
const response = await User.findOne({ email });
res.status(200).send(response);
});

Invoke the test() method in autocannon.js file with the above GET endpoint to obtain the performance result.

The performance result of the GET query without indexing is given below.

Autocannon Performance Result of GET method without Indexing

Add an index on the email field of User schema as follows and invoke the GET method using autocannon.

const mongoose = require("mongoose");

const userSchema = new mongoose.Schema(
{
name: String,
email: {
type: String,
index: 1 //set an index on field level
},
isActive: Boolean,
},
{ collection: "users" }
);

userSchema.index({email: 1}); //set an index on schema level

module.exports = { User, Blog };

The performance result of the GET query with indexing is given below.

Autocannon Performance Result of GET Method with Indexing

The test results show that indexing decreased the latency time. Additionally, indexing could handle 67k requests in 11.01 seconds, which is quicker than without indexing.

5. Aggregate Pipeline Stages

In an aggregate pipeline, the order in which the queries are executed is crucial since it affects how quickly they run.

Consider the below two schemas in the model.js file. movie_id in the “comments” collection is a reference to the “movies” collection.

const commentsSchema = new mongoose.Schema(
{
name: String,
email: String,
movie_id: [{
type: mongoose.Schema.Types.ObjectId,
ref: "movies"
}],
text: String,
date: Date,
},
{ collection: "comments" }
);
const movieSchema = new mongoose.Schema(
{
plot: String,
cast: [String],
title: String,
released: Date,
},
{ collection: "movies" }
);
const Comments = mongoose.model("comments", commentsSchema);
const Movies = mongoose.model("movies", movieSchema);

module.exports = { User, Blog, Comments, Movies };

Let the query is to fetch all the movies commented on by an user with the email “john_bishop@fakegmail.com”. For this, we need to aggregate two collections, “movies” and “comments”, and keep all the movies commented on by “john_bishop@fakegmail.com”.

In this situation, the aggregate pipelines can be ordered in one of two ways:
1. In the first instance, we’ll look up the movies table with the movie_id reference. It will unwind all the movies. Later we will filter with the email “john_bishop@fakegmail.com”.
2. Only “john_bishop@fakegmail.com” emails will be filtered in the second scenario, reducing the number of records to look up in the “movies” collection. Then we’ll look up the “movies” collection and fetch all the movies’ information.

Testing the first type of aggregation

Add a GET method to fetch the data as follows. The order of aggregate pipelines will be $lookup followed by $match

app.get("/comments", async (req, res) => {
try {
const response = await Comments.aggregate([
{
$lookup: {
from: "movies",
localField: "movie_id",
foreignField: "_id",
as: "movie",
},
},
{
$unwind: {
path: "$movie",
},
},
{
$match: {
email: { $eq: "john_bishop@fakegmail.com" },
},
},
]);
res.status(200).send(response);
} catch (err) {
console.log(err);
}
});

Invoke the test() method in autocannon.js file with the above GET endpoint to obtain the performance result.

The performance result of the first aggregate pipeline is given below.

Autocannon Performance Result of the first type of aggregation

Testing the second type of aggregation

Add a GET method to fetch the data as follows. The order of the pipeline is changed in this example. $match will be performed before $lookup.

app.get("/comments-optmization", async (req, res) => {
try {
const response = await Comments.aggregate([
{
$match: {
email: { $eq: "john_bishop@fakegmail.com" },
},
},
{
$lookup: {
from: "movies",
localField: "movie_id",
foreignField: "_id",
as: "movie",
},
},
{
$unwind: {
path: "$movie",
},
},
]);
res.status(200).send(response);
} catch (err) {
console.log(err);
}
});

Invoke the test() method in autocannon.js file with the above GET endpoint to obtain the performance result.

The performance result of the second aggregate pipeline is given below.

Autocannon Performance Result of the first type of aggregation

Considering the latency time and throughput rate of the examples above, the query performance can be enhanced by changing the order of the phases in the aggregate pipeline.

The optimized order of aggregation pipeline stages is given below:

Aggregation Pipeline Stages Optimisation (Credits: Several9s)

Summary

When it comes to running database queries, performance is always important. Even in today’s modern big data world, achieving high performance is a major difficulty. The options for mongoose query optimization were merely covered in this article. Utilize them to achieve the desired effects more quickly.

--

--

Chikku George
Globant

Software Engineer | ReactJS | NodeJS | Blockchain Enthusiast