MongoDB Aggregation: Can We Join a Collection on A No-SQL Database?

For those who ever knew about the database, it is not strange to hear about no-SQL. MongoDB, as the famous no-SQL document-oriented database, has been massively used by the developer.

source: fool.com

MongoDB preserves a user-friendly database through Ad-hoc queries, aggregation, capped collection, file storage, indexing, load balancing, replication, and server-side JavaScript execution.

However, only a few of the users whose a clear understanding of MongoDB’s features. On several occasions, when I was reviewing a backend code that used MongoDB as the database, I found several dirty codes. The common mistake of the MongoDB developer is manually joining the collection in the server program.

Aggregation

Aggregation is a hidden gem feature that is preserved by MongoDB. Aggregation serves as a manipulation query like in MongoDB collection as in SQL database.

Aggregation process a large number of data in a collection by passing them through different stages(pipeline). Here, I will explain to you several aggregation operations that you can use on your program.

The Pipeline

MongoDB aggregation works in the form of a data operation pipeline. A pipeline consists of one or more stages of operation that process the document. Each phase could be a filter, join, or unwind operation.

Thus, the pipeline is an array of operations that allows developers to combine any other operations in MongoDB so the developers can get the desired result. You will learn the kind of the operation later.

[
// Stage 1: Filter pizza order documents by pizza size
{
$match: { size: "medium" }
},
// Stage 2: Group remaining documents by pizza name and calculate total quantity
{
$group: { _id: "$name", totalQuantity: { $sum: "$quantity" } }
}
]

Look Up

Joining the table plays a significant role in database optimization. Lookup operation preserves the left outer join to a collection in the same database.

{
$lookup:
{
from: <collection to join>,
localField: <field from the input documents>,
foreignField: <field from the documents of the "from" collection>,
as: <output array field>
}
}

The operation is equal to the query below.

SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (
SELECT *
FROM <collection to join>
WHERE <foreignField> = <collection.localField>
);

If you are willing to do a nested query or subqueries on your operation you can use a pipeline such as below. You can fill the pipeline with another query (operation).

{
$lookup:
{
from: <joined collection>,
let: { <var_1>: <expression>, …, <var_n>: <expression> },
pipeline: [ <pipeline to run on joined collection> ],
as: <output array field>
}
}
SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (
SELECT <documents as determined from the pipeline>
FROM <collection to join>
WHERE <pipeline>
);

Match

One of the optimization keys in database queries is filtering the data. You could only choose the specified document that matches the condition through $match operation.

{ $match: { <query> } }

You can specify the filter using query expressions such as $gt (greater than) and any other expression.

{ $match: { $expr: { <aggregation expression> } } }

Deconstructing The Object Document

The problem with a document-oriented database is the form of return value that is strict on the document form itself. Using $unwind you can deconstructs an array field from a document object, and output it as an array of each element.

{ $unwind: <field path> }

Suppose you have an object below. An object that has an array as one of the object child values.

{ 
"_id" : 1,
"item" : "Shirt",
"sizes": [ "S", "M", "L"]
}

If you want to deconstruct the child into an array of objects like below. You can use $unwind operation.

[
{ _id: 1, item: 'Shirt', sizes: 'S' },
{ _id: 1, item: 'Shirt', sizes: 'M' },
{ _id: 1, item: 'Shirt', sizes: 'L' }
]

using operation below.

 { $unwind: { path: "$sizes" } } 

Replacing The Root

As we join the collection, the document goes deep, as the tree of the data structure of an object. It makes us harder to get the value of the result if what we desire is only the leaf of the object. In order to handle this problem, MongoDB preserves the $replaceRoot operation.

$replaceRoot operation allows us to replace the root of the existing document with the desired output. You can promote the leaf or the object child of the document as the root value of the object through this operation. $replaceRoot has the following form.

{ $replaceRoot: { newRoot: <replacementDocument> } }

Suppose we have data below.

[
{ "_id" : 1, "name" : "Arlene", "age" : 34, "pets" : { "dogs" : 2, "cats" : 1 } }
{ "_id" : 2, "name" : "Sam", "age" : 41, "pets" : { "cats" : 1, "fish" : 3 } }
{ "_id" : 3, "name" : "Maria", "age" : 25 }
]

By implementing the following operation, we can get the following result below.

{ $replaceRoot: { newRoot: { $mergeObjects:  [ { dogs: 0, cats: 0, birds: 0, fish: 0 }, "$pets" ] }} }  
[
{ "dogs" : 2, "cats" : 1, "birds" : 0, "fish" : 0 }
{ "dogs" : 0, "cats" : 1, "birds" : 0, "fish" : 3 }
{ "dogs" : 0, "cats" : 0, "birds" : 0, "fish" : 0 }
]

We can also use it for nested document array, by combinint $replaceRoot with $unwind operation.

Grouping The Data

Somehow, we need to populate the document according to a specific key. If you want to do that operation, you can use $group operation. $group operation preserves the output of the document that has been populated by the specific key.

Furthermore, you can also use statistical operations on the document that has been populated before. As in SQL, may you have familiar with GROUP BY query. Here is the syntax of the operation.

{
$group:
{
_id: <expression>, // Group key
<field1>: { <accumulator1> : <expression1> },
...
}
}

In the triple dots, you can input statistical operation as mentioned before. The form of the operation could be count, sum, and any other operation that you can see in the documentation page.

Suppose you have data below.

[
{ "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
{ "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
{ "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
{ "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
{ "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
{ "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
{ "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
{ "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
]

If you apply following operation, you can get the result as mentioned below.

{
$group: {
_id: null,
count: { $count: { } }
}
}
{ "_id" : null, "count" : 8 }

Sort The Return Value

It is more optimum if you can sort the array before sending it to the client. But, will be difficult, if you sort it manually on your server. You can optimize the sort operation, by asking the MongoDB to sort on the structure that they have through $sort operation. This operation is equal to order by query in SQL.

Here is the syntax of the $sort operation. You can fill the value of sort order object by -1 for descending order and 1 for ascending order. The limit of the operation is at maximum of 32 order keys.

{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }

At The End

It is important to have a clear understanding of the stack that is used. Here is only a short introduction to aggregation operation in MongoDB. For more specific information, you can read the documentation here. You can use aggregation section in MongoDB DBMS to visualize the aggregation pipeline. The feature is located beside the collection of the data.

If you curious about one of pipeline that i made, here is the example.

[
{
$match: {
_id: ObjectId("6401aec4c409219093cb18b7"),
},
},
{
$lookup: {
from: "group",
localField: "group_list",
foreignField: "_id",
pipeline: [
{
$lookup: {
from: "class",
localField: "classes",
foreignField: "_id",
pipeline: [
{
$lookup: {
from: "task",
localField: "active_task_list",
foreignField: "_id",
pipeline: [
{
$lookup: {
from: "tags",
localField: "tags",
foreignField: "_id",
as: "tags",
},
},
{
$match:{
date_due: {
$gt: Date()
}
}
}
],
as: "task",
},
},
],
as: "classes",
},
},
],
as: "group_list",
},
},
{
$unwind: {
path: "$group_list",
includeArrayIndex: "string",
preserveNullAndEmptyArrays: false,
},
},
{
$unwind: {
path: "$group_list.classes",
includeArrayIndex: "string",
preserveNullAndEmptyArrays: false,
},
},
{
$unwind: {
path: "$group_list.classes.task",
includeArrayIndex: "string",
preserveNullAndEmptyArrays: false,
},
},
{
$replaceRoot:
{
newRoot: {
$mergeObjects:[
{name: "$group_list.classes.name"},
"$group_list.classes.task"
],
},
},
},
]

This query is made for getting active task of a user that joined into a group. Each group have a class, and each class is storing the task. I also tried to get the tag object for each task.

--

--