Monalize — a MongoDB tool for scanning for performance issues

Serhii Kuyanov
MongoDB Cowboys
5 min read · Oct 19, 2020


When MongoDB is not used correctly, we often see the server CPU running at 80% or even higher. This is not very pleasant, and most of the time people do not understand what the problem is and simply scale the server vertically.

Sometimes it is worth looking at the problem from a different angle. In this article we explain how to find the root cause of such issues, and how to do it in an automated way.

The inspiration for this post was the article ‘Troubleshooting MongoDB 100% CPU load and slow queries’.

While reading that post, the following question popped up: why don’t we simplify and automate all of these steps? What if we have several different clients with different Mongo applications and databases? Shouldn’t we just run a ready-made tool and get the root cause? So we decided to create a very simple and small tool that extracts the information we need quickly and without any environment setup. And since I had just started learning Go, I decided to write the tool in it.

“Simplicity is prerequisite for reliability” — Edsger Wybe Dijkstra

To be clear, there are many applications for MongoDB scanning and analysis. They need to be installed and understood, but we wanted to make something as simple as possible that still gives us all the necessary information.

“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.” — Charles Antony Richard Hoare

We called this tool Monalize.

All we need to do is download the app. Additionally, you can add it to the PATH.

The following command can be used to download the Monalize tool:

MONALIZE=$(curl --silent "https://api.github.com/repos/MongoDB-Cowboys/Monalize/releases/latest" | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
wget https://github.com/MongoDB-Cowboys/Monalize/releases/download/$MONALIZE/monalize

Monalize is distributed as a single binary, so installation is just a matter of moving it to a directory included in your system’s PATH.

Don’t forget to make it executable:

chmod +x monalize
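
For example, assuming /usr/local/bin is already on your PATH (any other directory on your PATH works equally well):

sudo mv monalize /usr/local/bin/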

After that, we can run it.

If you run the tool without flags, it will scan all the databases at "mongodb://localhost:27017" and iterate over the log at "/var/log/mongodb/mongodb.log".
Things get a lot more interesting when we start using flags; there are not many of them (an example invocation is shown after the list):

1. db_uri — sets the URL used to connect to a remote database, e.g. mongodb://user:passwd@address:port.

2. db_name — pass this flag if you want to scan a specific database only.

3. excel — pass it to produce additional output to an Excel file.

4. logpath — changes the path to the MongoDB log file. Without the container variable, it is applied to the local log file (default "").

5. container — specifies the name of a Docker container. If you leave the logpath variable empty, logs will be read from the container (default "").

6. context_timeout — sets the context timeout (default 10).

7. podman — use the Podman executable to read the custom log file from the container.
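
For illustration, a typical invocation might look like the lines below. The flag names come from the list above, but the values are placeholders and the exact syntax (single vs. double dashes, boolean handling) should be checked against the tool’s help output:

# scan a remote deployment for one database and write the Excel/CSV report
./monalize -db_uri "mongodb://user:passwd@address:port" -db_name exampledb -excel

# read the MongoDB log from a Docker container instead of a local file
./monalize -container my_mongo_container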

Now it's time to explain how it all works. Its functions can be divided into 3 stages.

1. A database scan. It shows the databases, collections, document counts, and the indexes that are in use.

The results of the scan are printed in the terminal, so you can see at a glance whether a database is properly indexed.

Optionally, the results can be saved to a CSV file (results.csv) if the excel argument is provided.
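
If you want to cross-check what the scan reports, roughly the same information can be pulled manually with the legacy mongo shell. This is only a reference check, not how Monalize works internally, and exampledb is just a placeholder database name:

mongo exampledb --quiet --eval '
  db.getCollectionNames().forEach(function(c) {
    var coll = db.getCollection(c);
    // print each collection with its document count and index names
    printjson({
      collection: c,
      documents: coll.count(),
      indexes: coll.getIndexes().map(function(i) { return i.name; })
    });
  })'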

2. Search for slow queries that happened during the last second.

The results are also printed in the terminal.
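
For reference, a similar snapshot can be taken by hand by asking the server for operations that have already been running for at least a second. This is only an illustration of the kind of data involved, not necessarily how Monalize implements the check:

# list in-progress operations that have been running for 1 second or more
mongo --quiet --eval 'printjson(db.currentOp({ "secs_running": { "$gte": 1 } }))'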

3. MongoDB log file analysis.

It selects all queries that were executed without using proper indexes. The results are saved in the colout.txt file.
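
As a quick manual sanity check of the same log (assuming the default log path mentioned above), you can simply grep for collection scans:

grep "planSummary: COLLSCAN" /var/log/mongodb/mongodb.log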

Now we need to learn how to use the collected data.

For example, it is quite common that each collection has only a default index on the "_id" (primary key) field and no other indexes.

Let’s run the tool and check the colout.txt file. If any query stats in the output contain planSummary: COLLSCAN, it means the query performs a full collection scan and does not use any index, which leads to high CPU consumption.

If the MongoDB query planner finds and selects a proper collection index, the planSummary will be IXSCAN, which means an index is used and the data is read from RAM instead of disk, leading to faster execution and much lower CPU usage.

To fix the issue, analyze each query from the colout.txt file and add proper indexes. Once that is done, CPU usage will drop significantly.

Let’s review one of the records from the colout.txt file and decide which indexes to add:

2020-09-09T21:43:29.931+0530 [conn3] query exampledb.orgaddress query: { $and: [ { tenantid: "5ed5eede6911f87ea49a221f" }, { status: "Active" } ], ownerid: { $in: [ "5ed60ea16911f87ea49a741f" ] } } planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:105876 nscannedObjects:105876 keyUpdates:0 numYields:0 locks(micros) r:184116 nreturned:1 reslen:84 184ms

Collection:

orgaddress

Actual query performed:

{
  $and: [
    { tenantid: "5ed5eede6911f87ea49a221f" },
    { status: "Active" }
  ],
  ownerid: {
    $in: [ "5ed60ea16911f87ea49a741f" ]
  }
}

Index to add:

db.orgaddress.createIndex(
{ tenantid: 1, status: 1, ownerid: 1 })
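
To confirm that the new index is actually picked up, you can re-run the query with explain() and check the winning plan (shown here via --eval; running the same call inside the mongo shell works just as well):

mongo exampledb --quiet --eval '
  printjson(db.orgaddress.find({
    $and: [
      { tenantid: "5ed5eede6911f87ea49a221f" },
      { status: "Active" }
    ],
    ownerid: { $in: [ "5ed60ea16911f87ea49a741f" ] }
  }).explain("executionStats").queryPlanner.winningPlan)'

After the index above is created, the winning plan should contain an IXSCAN stage instead of a COLLSCAN.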

That’s all. We have just shown how you can quickly analyze a MongoDB instance for CPU consumption issues and how to apply the required remedies.

P.S. Need more information or consultancy? You can ping khomenkoigor@gmail.com or me at sergeyku9nov@gmail.com.
