Dynamic aggregations with SparrowQL
If you’ve ever worked with MongoDB aggregation pipeline before, you may have noticed that it’s a very powerful tool when it comes to querying the DB. The version 3.2 introduced $lookup
operator, which allows performing an equivalent of SQL left outer joins. Why is it important? Because with it we may write a single query to get all related data!
I’ve been working with MongoDB aggregations for over 3 years now. During that time, I’ve written a lot of pipelines, many of which include a $lookup
stage to reduce the overhead of joins — it performs better than a series of .find
s and also stays consistent as the DB may change between .find
s.
That’s the origin of SparrowQL. It’s just a little pain in the neck to do such common jobs and thus it’d be nice to get rid of it. The idea occurred to me at the beginning of 2018. The initial version was developed soon afterwards and it’s already being used in production for quite some time now. Let’s have a quick look at what we got here.
First, meet our DB model. Here’s a graph visualization:
And here’s a textual representation:
Blogs contain Posts (
blogId
)
Users own Blogs (ownerId
)
Users write Posts (authorId
)
Now, let’s write it as SparrowQL relations:
const relations = [
{to: 'Blogs', from: 'Posts', foreign: '_id', local: 'blogId'},
{to: 'Users', from: 'Blogs', foreign: '_id', local: 'ownerId'},
{to: 'Users', from: 'Posts', foreign: '_id', local: 'authorId'}
];
Easy, right? Next, fill our DB with some fixtures:
const blogs = [
{_id: 0, ownerId: 3, topic: 'Best blog!'}
];const posts = [
{_id: 0, authorId: 0, blogId: 0, title: 'Being fast for dummies'},
{_id: 1, authorId: 0, blogId: 0, title: 'Being declarative'},
{_id: 2, authorId: 1, blogId: 0, title: 'Amazing!'},
{_id: 3, authorId: 2, blogId: 0, title: 'A sparrow, really?'}
];const users = [
{_id: 0, name: 'SparrowQL', about: 'Declarative aggregations'},
{_id: 1, name: 'Random #1', about: 'Node.js developer'},
{_id: 2, name: 'Random #2', about: 'Bird lover'},
{_id: 3, name: 'BlogOwner', about: 'Owner of the blog'}
];await Blogs.insertMany(blogs);
await Posts.insertMany(posts);
await Users.insertMany(users);
Query time! Run a simple query:
const {build} = require('sparrowql');const projection = {
blogTopic: 'Blogs.topic',
postTitle: 'Posts.title'
};const start = 'Posts';
const pipeline = build({projection, relations, start});
const results = Posts.aggregate(pipeline).toArray();// There is also the _id field, but I'll skip it everywhere.
[ { blogTopic: 'Best blog!', postTitle: 'Being fast for dummies' },
{ blogTopic: 'Best blog!', postTitle: 'Being declarative' },
{ blogTopic: 'Best blog!', postTitle: 'Amazing!' },
{ blogTopic: 'Best blog!', postTitle: 'A sparrow, really?' } ]
Whoa! What happened? Here’s the pipeline
:
[ { $lookup:
{ as: 'SPARROW_JOINED_Blogs',
foreignField: '_id',
from: 'Posts',
localField: 'blogId' } },
{ $unwind:
{ path: '$SPARROW_JOINED_Blogs',
preserveNullAndEmptyArrays: true } },
{ $project:
{ blogTopic: '$SPARROW_JOINED_Blogs.topic',
postTitle: '$title' } } ]
Okay, that’s cool! But what if we want to get the post blog owner name? Using Users.name
won’t work, as it’ll resolve with post author name. In this case, the relations graph forms a DAG (directed acyclic graph), yet SparrowQL will work correctly only on directed trees. In case you are not familiar with these terms, or simply want to know what to do now, here it is: aliases
. Let’s define them, along with new relations
:
const aliases = {
Authors: 'Users',
Owners: 'Users'
};const relations = [
{to: 'Blogs', from: 'Posts', foreign: '_id', local: 'blogId'},
{to: 'Owners', from: 'Blogs', foreign: '_id', local: 'ownerId'},
{to: 'Authors', from: 'Posts', foreign: '_id', local: 'authorId'}
];
It’s a directed tree! Then, we can finally implement our query:
const projection = {
ownerName: 'Owners.name',
postTitle: 'Posts.title'
};const start = 'Posts';
const pipeline = build({aliases, projection, relations, start});
const results = Posts.aggregate(pipeline).toArray();[ { ownerName: 'BlogOwner', postTitle: 'Being fast for dummies' },
{ ownerName: 'BlogOwner', postTitle: 'Being declarative' },
{ ownerName: 'BlogOwner', postTitle: 'Amazing!' },
{ ownerName: 'BlogOwner', postTitle: 'A sparrow, really?' } ]
And here’s a pipeline
:
[ { $lookup:
{ as: 'SPARROW_JOINED_Blogs',
foreignField: '_id',
from: 'Blogs',
localField: 'blogId' } },
{ $unwind:
{ path: '$SPARROW_JOINED_Blogs',
preserveNullAndEmptyArrays: true } },
{ $lookup:
{ as: 'SPARROW_JOINED_Owners',
foreignField: '_id',
from: 'Users',
localField: 'SPARROW_JOINED_Blogs.ownerId' } },
{ $unwind:
{ path: '$SPARROW_JOINED_Owners',
preserveNullAndEmptyArrays: true } },
{ $project:
{ ownerName: '$SPARROW_JOINED_Owners.name',
postTitle: '$title' } } ]
As you can see, SparrowQL has done a good job, and prepared a nice pipeline to fulfill our needs. But what if we don’t need any special logic? Let’s check it with a simple query:
const projection = {
postTitle: 'Posts.title'
};const start = 'Posts';
const pipeline = build({aliases, projection, relations, start});
const results = Posts.aggregate(pipeline).toArray();[ { postTitle: 'Being fast for dummies' },
{ postTitle: 'Being declarative' },
{ postTitle: 'Amazing!' },
{ postTitle: 'A sparrow, really?' } ]
And a simple pipeline
was generated:
[ { $project: { postTitle: '$title' } } ]
That’s not everything! SparrowQL can do a few more things:
- handle
limit
andskip
, which maps to$limit
and$skip
directly, - handle
query
, which maps to$match
directly (*), - handle
sort
, which maps to$sort
directly (*).
(*) They may result in more than one phase.
All of them are easy enough if you understood at least a part of the above. There’s also `computed`, which is an advanced feature, designed to handle a $group
phase.
Let’s add something to our DB and run a complex query, with limit
, skip
, query
and sort
:
await Posts.insertMany([
{_id: 4, authorId: 1, blogId: 0, title: 'Best!'},
{_id: 5, authorId: 1, blogId: 0, title: 'Superb!'}
]);const limit = 1;
const projection = {
blogOwnerName: 'Owners.name',
postAuthorName: 'Authors.name',
postTitle: 'Posts.title'
};const query = {'Authors.name': 'Random #1'};
const skip = 1;
const sort = {'Posts.title': -1};
const start = 'Posts';
const pipeline = build({
aliases,
limit,
projection,
query,
relations,
skip,
sort,
start
});
const results = Posts.aggregate(pipeline).toArray();[ { blogOwnerName: 'BlogOwner',
postAuthorName: 'Random #1',
postTitle: 'Best!' } ]
Which generates a complex pipeline
:
[ { $sort: { title: -1 } },
{ $lookup:
{ as: 'SPARROW_JOINED_Authors',
foreignField: '_id',
from: 'Users',
localField: 'authorId' } },
{ $unwind:
{ path: '$SPARROW_JOINED_Authors',
preserveNullAndEmptyArrays: true } },
{ $match: { 'SPARROW_JOINED_Authors.name': 'Random #1' } },
{ $sort: { title: -1 } },
{ $skip: 1 },
{ $limit: 1 },
{ $lookup:
{ as: 'SPARROW_JOINED_Blogs',
foreignField: '_id',
from: 'Blogs',
localField: 'blogId' } },
{ $unwind:
{ path: '$SPARROW_JOINED_Blogs',
preserveNullAndEmptyArrays: true } },
{ $lookup:
{ as: 'SPARROW_JOINED_Owners',
foreignField: '_id',
from: 'Users',
localField: 'SPARROW_JOINED_Blogs.ownerId' } },
{ $unwind:
{ path: '$SPARROW_JOINED_Owners',
preserveNullAndEmptyArrays: true } },
{ $project:
{ blogOwnerName: '$SPARROW_JOINED_Owners.name',
postAuthorName: '$SPARROW_JOINED_Authors.name',
postTitle: '$title' } } ]
That’d be all for starters. Got hooked with the idea? If so, why not try out SparrowQL in your own project? It only takes minutes to get it up and running while potential profits are significant — less time spent on pipelining queries and boost on performance at once. SparrowQL — no reason to miss out on that little bird!