Refactoring Mongoose with Q
Better API Code. I Promise.
Yikes! Why is it that an engineer who generally writes beautiful, meticulous front end code can write the most busted, nasty data access code on the back end? I have seen it a handful of times, as if getting the data from the server to the client is a total afterthought, and I think it happens because as the needs of the front end expand, the API is left running to catch up.
I took some time to refactor some Mongoose chaos from imperative to promise-oriented, and I was really happy with the end result, both in terms of performance and elegance. Since Node promises can be a bit confusing at first (and particularly in the context of Mongoose), I thought I would jot down some of our discoveries in case they prove useful to you.
Before: Spaghetti Code, Unacceptable Performance
Most of our API endpoints were performing fantastically, but 7–8 fell way outside any acceptable range. In several cases, we had the obvious need to push long-running processes to background jobs, but in cases such as the consultant search we had pure data access, and an average of 3.75 seconds is absurd.
The offending code was visually problematic in a way that made it difficult to determine all the things going on.
It turns out:
- Two database queries and a third-party API call were being run in series
- In the case of both database queries, full Mongoose objects were being instantiated
- The results were paginated, but the database query was retrieving all values and then subsetting them afterwards in the code
- Insult to injury, the subquery was populating companies for all values, not just the page being returned
Before getting to a refactor, we made sure we had a clean set of known results from the API that we could test against as we made changes.
Step One: Callback Hell…but Clean
For the first step in the refactor, I extracted the query building process to a separate method, and I tried to make the other query inputs more explicit. These included sort and limit. I also extracted the third-party call, which, it turned out, was performing fine but was written with a lot of code duplication.
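As a rough illustration of that extraction, here is a minimal sketch of a query-building function. The function name, the parameter names (skill, available, sortBy, descending) and the defaults are all hypothetical, since the real consultant schema is not shown in this post.

```javascript
// Hypothetical sketch: turn loose request parameters into explicit query inputs.
// None of these field names come from the real schema.
function buildSearchQuery(params) {
  const conditions = {};
  if (params.skill) {
    conditions.skills = params.skill; // assumed field name
  }
  if (params.available !== undefined) {
    conditions.available = Boolean(params.available); // assumed field name
  }

  // Sort and limit are returned explicitly instead of being buried in the query code.
  const sort = params.sortBy
    ? { [params.sortBy]: params.descending ? -1 : 1 }
    : { name: 1 }; // assumed default sort
  const limit =
    Number.isInteger(params.limit) && params.limit > 0 ? params.limit : 20; // assumed default

  return { conditions, sort, limit };
}
```

The point is only that the three inputs come back as plain data, so they can be inspected and tested before anything touches the database.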
The limit restriction was promoted to the database query itself, which had an immediate and obvious performance impact.
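To make the difference concrete, here is a small sketch of the paging arithmetic. The helper names are hypothetical. The first function computes the window that can be handed to the database, and the second mimics the old approach of fetching everything and slicing it in code.

```javascript
// Hypothetical helper: convert a 1-based page number into a skip/limit window.
function pageWindow(page, pageSize) {
  const size = Math.max(1, Math.floor(pageSize));
  const current = Math.max(1, Math.floor(page));
  return { skip: (current - 1) * size, limit: size };
}

// The old approach, roughly: load every row, then subset in application code.
function paginateInMemory(allRows, page, pageSize) {
  const { skip, limit } = pageWindow(page, pageSize);
  return allRows.slice(skip, skip + limit);
}

// With the window pushed down, the database only returns one page of rows.
// In Mongoose this would look something like:
//   Model.find(conditions).sort(sort).skip(skip).limit(limit).exec()
```

Both routes produce the same page, but the second one pays to load and instantiate every matching document first.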
We also cleaned up the Mongoose syntax from a relatively scattered set of code to simple chainable commands: e.g. Model.find(…).where(…).sort(…).limit(…).exec().
The code was still callback hell, but at least it was clear what was going on.
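For readers who have not seen it, the shape at this stage looks roughly like the sketch below. This is not our actual code: the three stub functions stand in for the two database queries and the third-party call, and their names and return values are invented for illustration.

```javascript
// Hypothetical stand-ins for the two database queries and the third-party call.
function findConsultantsCb(query, cb) {
  setImmediate(cb, null, [{ name: "Ada", companyId: 1 }]);
}
function findCompaniesCb(ids, cb) {
  setImmediate(cb, null, [{ id: 1, name: "Acme" }]);
}
function fetchRatingsCb(names, cb) {
  setImmediate(cb, null, { Ada: 5 });
}

// Each callback handles its parent's result AND kicks off the next step.
function searchWithCallbacks(query, done) {
  findConsultantsCb(query, function (err, consultants) {
    if (err) return done(err);
    findCompaniesCb(consultants.map((c) => c.companyId), function (err, companies) {
      if (err) return done(err);
      fetchRatingsCb(consultants.map((c) => c.name), function (err, ratings) {
        if (err) return done(err);
        done(null, { consultants, companies, ratings });
      });
    });
  });
}
```

It is readable, but the nesting grows with every step and the error check is repeated at each level.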
Step Two: Promises
Not only is code a lot flatter after moving to node promises, but it is a lot easier to be semantic about the steps going on.
In the above case, we have separated the database queries from the processing logic and the final execution of the queries. Comparing this with our original code, it is much more immediately obvious what each line of code is achieving.
In addition, the “smell” within the callbacks was that each callback function achieved two things. First it processed the output of the parent function, and then it called the subsequent function. After the refactor, each function is only responsible for itself.
On the other hand, separating the retrieval from the processing seems unnecessary. Since a Mongoose query's exec() returns a promise with .then(), you can happily chain each step directly off the query.
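The two styles discussed above can be sketched as follows. To keep the example runnable without a database, findStub is a hypothetical stand-in for Model.find(…): the only thing it shares with Mongoose is that exec() returns a promise. The data and function names are invented.

```javascript
// Hypothetical stand-in for a Mongoose query; exec() resolves with the given rows.
function findStub(rows) {
  return { exec: () => Promise.resolve(rows) };
}

// Style 1: retrieval and processing are separate, named steps.
const getConsultants = () => findStub([{ name: "Ada", companyId: 1 }]).exec();
const getCompaniesFor = (consultants) =>
  findStub([{ id: 1, name: "Acme" }])
    .exec()
    .then((companies) => ({ consultants, companies }));
const attachCompanyNames = ({ consultants, companies }) =>
  consultants.map((c) => ({
    name: c.name,
    company: companies.find((co) => co.id === c.companyId).name,
  }));

function searchSeparated() {
  // Each step only transforms its input; the chain decides what runs next.
  return getConsultants().then(getCompaniesFor).then(attachCompanyNames);
}

// Style 2: chain straight off exec() when the processing is small.
function searchInline() {
  return findStub([{ name: "Ada", companyId: 1 }])
    .exec()
    .then((consultants) => consultants.map((c) => c.name));
}
```

Which style reads better probably depends on how much processing each step does; the named steps earn their keep once a step is reused or tested on its own.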
Step Three: Parallel Queries with Q.all
The only issue now is that we have three queries running in series that do not rely on each other, so it would be even better if they ran in parallel.
We use Q.all for this elsewhere, and while I initially thought I would need some version of mongoose-q, Q.all worked fine, consuming an array of Mongoose promises.
Even better, Q.all bubbles errors up naturally, allowing us to catch errors in one place.
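Here is a minimal sketch of the parallel pattern. To keep it runnable without dependencies it uses the native Promise.all rather than Q; Q.all likewise takes an array of promises and resolves with an array of results, so the shape is the same. The delay helper and the task contents are hypothetical.

```javascript
// Hypothetical helper that resolves with a value after a short delay.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

// Each task is a function returning a promise. Calling them all up front starts
// the work immediately, so the total wait is roughly the slowest task, not the sum.
function searchInParallel(tasks) {
  return Promise.all(tasks.map((task) => task())).then(
    ([consultants, companies, ratings]) => ({ consultants, companies, ratings })
  );
}

// Usage: a rejection from any task skips the .then and lands in the single .catch.
function runExampleSearch() {
  return searchInParallel([
    () => delay(20, ["Ada"]),
    () => delay(20, ["Acme"]),
    () => delay(20, { Ada: 5 }),
  ]).catch((err) => {
    console.error("search failed:", err.message);
    throw err;
  });
}
```

One thing to keep in mind is that the combined promise rejects as soon as any one task fails, so there are no partial results to handle in the success path.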
Summary: The New Pattern
First, the performance implication: 513 ms, down from 3,750 ms. That’s 3,237 ms saved per request, an improvement of roughly 86%. Not bad!
None of this process is earth-shattering, but it yields a simple pattern for cleaning up and refactoring the rest of our API code.
- Isolate query construction logic until you get to the point where you can elegantly use the chainable syntax Mongoose provides: e.g. Model.find(conditions).sort(sort).limit(limit).exec(…)
- Where multiple dependent asynchronous functions are required, use Mongoose promises to chain functions with .then()
- Where multiple independent asynchronous functions are required, use Q.all to run them in parallel
Boom. Hope this helps!