Benchmarking FastAPI and MongoDB Options
When it comes to using MongoDB with FastAPI in Python, you have two main driver options: PyMongo, the official Python driver for MongoDB, or Motor, the asynchronous Python driver for MongoDB.
But what’s the most performant way to use these libraries, and does Motor actually perform better than PyMongo? This blog post attempts to answer these questions with benchmarking.
The Options
First up, let’s consider three options for connecting to MongoDB:
- Option 1: Use PyMongo and create a new MongoClient for each new request.
- Option 2: Use PyMongo, but this time create a single MongoClient and re-use it for all requests.
- Option 3: Use Motor, making sure to leverage its async capabilities.
Right up front, I should say that in my initial experiments with MongoDB, I went with Option 1. However, if you dig into the PyMongo FAQ, you’ll find that MongoClient actually provides a built-in connection pool. Not only that, but you’ll also find this nugget:
Create [the MongoClient] once for each process, and reuse it for all operations. It is a common mistake to create a new client for each request, which is very inefficient.
So, Option 1 follows this “common mistake”, and Option 2 follows the recommended approach from the PyMongo FAQ. Option 3 uses Motor. Since Motor is async, my hunch was that it would deliver better overall performance than PyMongo. But then I found this thread on Stack Overflow, which observed that Motor was actually slower than PyMongo, along with this explanation:
Perhaps the problem is due to the fact that the motor is not an asynchronous driver in fact. It just starts the synchronous pymongo in ThreadPoolExecutor, hence the performance drop.
Is this true? Let’s see with an experiment!
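Before running the experiment, it helps to see what the pattern described in that quote looks like in plain asyncio. The sketch below is an illustration of "run a synchronous call in a ThreadPoolExecutor", not Motor's source code; blocking_query is a hypothetical stand-in for a synchronous driver call, and no MongoDB is involved.

```python
# Illustrates the run-in-a-thread-pool pattern; this is NOT Motor's source code.
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(delay: float) -> str:
    # Hypothetical stand-in for a synchronous driver call: it blocks its thread.
    time.sleep(delay)
    return "done"


async def awaitable_query(executor: ThreadPoolExecutor, delay: float) -> str:
    # Hand the blocking call to a worker thread and await its result.
    # The event loop stays free, but the call itself still blocks a thread.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_query, delay)


async def run(n_calls: int, delay: float, workers: int):
    # Fire n_calls concurrently; at most `workers` can actually run at once.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        start = time.perf_counter()
        results = await asyncio.gather(
            *(awaitable_query(executor, delay) for _ in range(n_calls))
        )
        return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(run(n_calls=8, delay=0.1, workers=4))
    # Roughly 0.2 s expected: 8 calls / 4 threads = 2 rounds of 0.1 s each.
    print(len(results), f"{elapsed:.2f}s")
```

The ceiling here is set by the thread pool size, exactly as it would be for a synchronous driver called from threads, so wrapping a blocking call in an awaitable adds coordination overhead without removing the blocking. Whether that overhead matters in practice is what the experiment measures.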
The Experimental Setup
To recap, our goal is to try out three MongoDB options and determine which provides the best overall performance.
Here are the components of my experiment.
First, I chose to use the free tier of MongoDB Cloud Atlas. MongoDB Atlas enables you to load sample databases, and I chose to build the simplest possible API around the sample movie database. Specifically, my endpoint takes a single movie genre and returns the titles of the first 100 matching movies.
Second, I wrote three versions of the code, one for each option.
Third, I set up two Linode instances. On the first instance, I ran the FastAPI code. On the second, I ran a benchmark tool. For benchmarking, I chose autocannon, an easy-to-use HTTP benchmarking tool written in Node.js. For each option, I ran autocannon three times. In my initial round, I configured autocannon to make 1000 requests with 25 concurrent connections.
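If you want a quick sanity check without Node.js, the same "N requests, C at a time" idea can be approximated in Python. This is a swapped-in stand-in, not autocannon: it uses a thread pool with C workers, and urllib opens a fresh connection for every request, so absolute numbers won't match autocannon's. Use it to compare options against each other, not against the figures below.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url: str) -> int:
    # One GET; read the full body so the request is really complete.
    with urllib.request.urlopen(url, timeout=30) as resp:
        resp.read()
        return resp.status


def run_load(url: str, total_requests: int, concurrency: int):
    """Send total_requests GETs with at most `concurrency` in flight.

    Returns (list of status codes, wall-clock seconds).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(fetch, [url] * total_requests))
    return statuses, time.perf_counter() - start
```

For example, run_load("http://localhost:8000/movies/Comedy", 1000, 25) mirrors the initial round's settings; that URL is hypothetical and should point at whichever endpoint you are testing.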
Here are my results, averaged over the three runs for each option:
It doesn’t take a rocket scientist to see that Option 1 is way slower. For the 1000 requests I sent, Option 1 took an average of 93.77 seconds. By contrast, Option 2 took an average of 10.38 seconds. That’s over 9 times faster!
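The "over 9 times faster" figure checks out, and it is also worth expressing as throughput, which is what autocannon-style tools usually report. A short calculation using the averages from the 1000-request runs:

```python
# Averages from the 1000-request runs reported above.
REQUESTS = 1000
avg_seconds = {"Option 1": 93.77, "Option 2": 10.38}

speedup = avg_seconds["Option 1"] / avg_seconds["Option 2"]
throughput = {name: REQUESTS / secs for name, secs in avg_seconds.items()}

print(f"Option 2 speedup: {speedup:.2f}x")  # 9.03x
for name, rps in throughput.items():
    print(f"{name}: {rps:.1f} requests/second")  # 10.7, then 96.3
```

In other words, reusing one MongoClient took the API from roughly 11 to roughly 96 requests per second on this setup.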
Lesson #1: Follow the advice of the PyMongo FAQ: create one MongoClient for each process, and reuse it for all operations!
What about Motor? Option 3 took an average of 10.71 seconds, a tiny bit slower than Option 2 and in line with the Stack Overflow post I referenced above, which also found Motor slower than PyMongo. Just to give Motor another shot, I ran autocannon again, this time with 10K requests and 250 concurrent connections. The PyMongo option came in at 100.18 seconds, and Motor came in at 100.52 seconds.
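To put the PyMongo vs. Motor gap in perspective, here is the relative difference for both rounds, computed from the times given above:

```python
def pct_slower(baseline_s: float, candidate_s: float) -> float:
    """How much longer the candidate took than the baseline, in percent."""
    return (candidate_s - baseline_s) / baseline_s * 100


# (PyMongo seconds, Motor seconds) for the two rounds described above.
rounds = {
    "1000 requests, 25 connections": (10.38, 10.71),
    "10K requests, 250 connections": (100.18, 100.52),
}
for label, (pymongo_s, motor_s) in rounds.items():
    print(f"{label}: Motor {pct_slower(pymongo_s, motor_s):.1f}% slower")
# Prints 3.2% for the first round and 0.3% for the second.
```

Both gaps are small, and with only a handful of runs per option the averages alone can't show whether they exceed run-to-run noise; reporting the spread across runs would settle that.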
Lesson #2: Don’t assume that Motor will deliver better performance just because it’s an async library. Motor may well perform better in some situations, but do your own benchmarking to verify.
Hopefully, this blog post gave you some insight into your options for using MongoDB from Python. Better yet, I hope it gives you a framework for doing your own benchmarking in the future.
Happy coding.