Dynamic aggregations with SparrowQL

Radosław Miernik
Vazco
Published in
5 min readOct 3, 2018

--

If you’ve ever worked with MongoDB aggregation pipeline before, you may have noticed that it’s a very powerful tool when it comes to querying the DB. The version 3.2 introduced $lookup operator, which allows performing an equivalent of SQL left outer joins. Why is it important? Because with it we may write a single query to get all related data!

I’ve been working with MongoDB aggregations for over 3 years now. During that time, I’ve written a lot of pipelines, many of which include a $lookup stage to reduce the overhead of joins — it performs better than a series of .finds and also stays consistent as the DB may change between .finds.

SparrowQL Logo

That’s the origin of SparrowQL. It’s just a little pain in the neck to do such common jobs and thus it’d be nice to get rid of it. The idea occurred to me at the beginning of 2018. The initial version was developed soon afterwards and it’s already being used in production for quite some time now. Let’s have a quick look at what we got here.

First, meet our DB model. Here’s a graph visualization:

Database model graph

And here’s a textual representation:

Blogs contain Posts (blogId)
Users own Blogs (ownerId)
Users write Posts (authorId)

Now, let’s write it as SparrowQL relations:

const relations = [
{to: 'Blogs', from: 'Posts', foreign: '_id', local: 'blogId'},
{to: 'Users', from: 'Blogs', foreign: '_id', local: 'ownerId'},
{to: 'Users', from: 'Posts', foreign: '_id', local: 'authorId'}
];

Easy, right? Next, fill our DB with some fixtures:

const blogs = [
{_id: 0, ownerId: 3, topic: 'Best blog!'}
];
const posts = [
{_id: 0, authorId: 0, blogId: 0, title: 'Being fast for dummies'},
{_id: 1, authorId: 0, blogId: 0, title: 'Being declarative'},
{_id: 2, authorId: 1, blogId: 0, title: 'Amazing!'},
{_id: 3, authorId: 2, blogId: 0, title: 'A sparrow, really?'}
];
const users = [
{_id: 0, name: 'SparrowQL', about: 'Declarative aggregations'},
{_id: 1, name: 'Random #1', about: 'Node.js developer'},
{_id: 2, name: 'Random #2', about: 'Bird lover'},
{_id: 3, name: 'BlogOwner', about: 'Owner of the blog'}
];
await Blogs.insertMany(blogs);
await Posts.insertMany(posts);
await Users.insertMany(users);

Query time! Run a simple query:

const {build} = require('sparrowql');const projection = {
blogTopic: 'Blogs.topic',
postTitle: 'Posts.title'
};
const start = 'Posts';
const pipeline = build({projection, relations, start});
const results = Posts.aggregate(pipeline).toArray();
// There is also the _id field, but I'll skip it everywhere.
[ { blogTopic: 'Best blog!', postTitle: 'Being fast for dummies' },
{ blogTopic: 'Best blog!', postTitle: 'Being declarative' },
{ blogTopic: 'Best blog!', postTitle: 'Amazing!' },
{ blogTopic: 'Best blog!', postTitle: 'A sparrow, really?' } ]

Whoa! What happened? Here’s the pipeline:

[ { $lookup:
{ as: 'SPARROW_JOINED_Blogs',
foreignField: '_id',
from: 'Posts',
localField: 'blogId' } },
{ $unwind:
{ path: '$SPARROW_JOINED_Blogs',
preserveNullAndEmptyArrays: true } },
{ $project:
{ blogTopic: '$SPARROW_JOINED_Blogs.topic',
postTitle: '$title' } } ]

Okay, that’s cool! But what if we want to get the post blog owner name? Using Users.name won’t work, as it’ll resolve with post author name. In this case, the relations graph forms a DAG (directed acyclic graph), yet SparrowQL will work correctly only on directed trees. In case you are not familiar with these terms, or simply want to know what to do now, here it is: aliases. Let’s define them, along with new relations:

const aliases = {
Authors: 'Users',
Owners: 'Users'
};
const relations = [
{to: 'Blogs', from: 'Posts', foreign: '_id', local: 'blogId'},
{to: 'Owners', from: 'Blogs', foreign: '_id', local: 'ownerId'},
{to: 'Authors', from: 'Posts', foreign: '_id', local: 'authorId'}
];
Remodeled database graph

It’s a directed tree! Then, we can finally implement our query:

const projection = {
ownerName: 'Owners.name',
postTitle: 'Posts.title'
};
const start = 'Posts';
const pipeline = build({aliases, projection, relations, start});
const results = Posts.aggregate(pipeline).toArray();
[ { ownerName: 'BlogOwner', postTitle: 'Being fast for dummies' },
{ ownerName: 'BlogOwner', postTitle: 'Being declarative' },
{ ownerName: 'BlogOwner', postTitle: 'Amazing!' },
{ ownerName: 'BlogOwner', postTitle: 'A sparrow, really?' } ]

And here’s a pipeline:

[ { $lookup:
{ as: 'SPARROW_JOINED_Blogs',
foreignField: '_id',
from: 'Blogs',
localField: 'blogId' } },
{ $unwind:
{ path: '$SPARROW_JOINED_Blogs',
preserveNullAndEmptyArrays: true } },
{ $lookup:
{ as: 'SPARROW_JOINED_Owners',
foreignField: '_id',
from: 'Users',
localField: 'SPARROW_JOINED_Blogs.ownerId' } },
{ $unwind:
{ path: '$SPARROW_JOINED_Owners',
preserveNullAndEmptyArrays: true } },
{ $project:
{ ownerName: '$SPARROW_JOINED_Owners.name',
postTitle: '$title' } } ]

As you can see, SparrowQL has done a good job, and prepared a nice pipeline to fulfill our needs. But what if we don’t need any special logic? Let’s check it with a simple query:

const projection = {
postTitle: 'Posts.title'
};
const start = 'Posts';
const pipeline = build({aliases, projection, relations, start});
const results = Posts.aggregate(pipeline).toArray();
[ { postTitle: 'Being fast for dummies' },
{ postTitle: 'Being declarative' },
{ postTitle: 'Amazing!' },
{ postTitle: 'A sparrow, really?' } ]

And a simple pipeline was generated:

[ { $project: { postTitle: '$title' } } ]

That’s not everything! SparrowQL can do a few more things:

  • handle limit and skip, which maps to $limit and $skip directly,
  • handle query, which maps to $match directly (*),
  • handle sort, which maps to $sort directly (*).

(*) They may result in more than one phase.

All of them are easy enough if you understood at least a part of the above. There’s also `computed`, which is an advanced feature, designed to handle a $group phase.

Let’s add something to our DB and run a complex query, with limit, skip, query and sort:

await Posts.insertMany([
{_id: 4, authorId: 1, blogId: 0, title: 'Best!'},
{_id: 5, authorId: 1, blogId: 0, title: 'Superb!'}
]);
const limit = 1;
const projection = {
blogOwnerName: 'Owners.name',
postAuthorName: 'Authors.name',
postTitle: 'Posts.title'
};
const query = {'Authors.name': 'Random #1'};
const skip = 1;
const sort = {'Posts.title': -1};
const start = 'Posts';
const pipeline = build({
aliases,
limit,
projection,
query,
relations,
skip,
sort,
start
});
const results = Posts.aggregate(pipeline).toArray();
[ { blogOwnerName: 'BlogOwner',
postAuthorName: 'Random #1',
postTitle: 'Best!' } ]

Which generates a complex pipeline:

[ { $sort: { title: -1 } },
{ $lookup:
{ as: 'SPARROW_JOINED_Authors',
foreignField: '_id',
from: 'Users',
localField: 'authorId' } },
{ $unwind:
{ path: '$SPARROW_JOINED_Authors',
preserveNullAndEmptyArrays: true } },
{ $match: { 'SPARROW_JOINED_Authors.name': 'Random #1' } },
{ $sort: { title: -1 } },
{ $skip: 1 },
{ $limit: 1 },
{ $lookup:
{ as: 'SPARROW_JOINED_Blogs',
foreignField: '_id',
from: 'Blogs',
localField: 'blogId' } },
{ $unwind:
{ path: '$SPARROW_JOINED_Blogs',
preserveNullAndEmptyArrays: true } },
{ $lookup:
{ as: 'SPARROW_JOINED_Owners',
foreignField: '_id',
from: 'Users',
localField: 'SPARROW_JOINED_Blogs.ownerId' } },
{ $unwind:
{ path: '$SPARROW_JOINED_Owners',
preserveNullAndEmptyArrays: true } },
{ $project:
{ blogOwnerName: '$SPARROW_JOINED_Owners.name',
postAuthorName: '$SPARROW_JOINED_Authors.name',
postTitle: '$title' } } ]

That’d be all for starters. Got hooked with the idea? If so, why not try out SparrowQL in your own project? It only takes minutes to get it up and running while potential profits are significant — less time spent on pipelining queries and boost on performance at once. SparrowQL — no reason to miss out on that little bird!

--

--