MongoDB window functions and time-series performance

Guy Harrison
MongoDB Performance Tuning
5 min read · Feb 14, 2022


In a previous blog post, I worked out a simple example of using the new MongoDB windowing aggregation capabilities to produce moving averages. In this post, I’m going to look at the performance of these new stages.

The examples use our explain helper functions that we introduced in the book MongoDB Performance Tuning. You can get the functions here.

For these simple tests, we created some IoT-style data. Each document contains a timestamp (timeStamp), a device ID (deviceData) and a measurement (measureMent):

{
    "timeStamp": {
        "$date": "2016-09-28T07:08:04.396Z"
    },
    "deviceData": 7,
    "_id": {
        "$oid": "61516dd8ddca03c1c48f8b8c"
    },
    "measureMent": 0.06542439334828831
}
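The data can be loaded with something along these lines (a sketch; the device count, reading interval and data volumes here are arbitrary assumptions, not the exact volumes benchmarked below):

```javascript
// Generate illustrative IoT-style readings: one document per device per minute.
// The collection name matches the later examples; counts are arbitrary.
const start = new Date('2016-09-15T00:00:00Z');
const docs = [];
for (let device = 0; device < 10; device++) {
  for (let i = 0; i < 1000; i++) {
    docs.push({
      timeStamp: new Date(start.getTime() + i * 60000), // one reading per minute
      deviceData: device,
      measureMent: Math.random() * 20
    });
  }
}
// db.iotNoTimeSeries.insertMany(docs);
```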

We are generating moving averages for each device ID using a $setWindowFields stage.
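A representative pipeline (field names are taken from the sample document above; the window bounds are illustrative assumptions, not necessarily the ones benchmarked):

```javascript
// Moving average of measureMent per device, ordered by time.
// This window averages the current document and the previous 10.
const movingAvgPipeline = [
  {
    $setWindowFields: {
      partitionBy: '$deviceData',
      sortBy: { timeStamp: 1 },
      output: {
        movingAvg: {
          $avg: '$measureMent',
          window: { documents: [-10, 0] }
        }
      }
    }
  }
];
// db.iotNoTimeSeries.aggregate(movingAvgPipeline);
```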

Simple collection performance

Our execution plan in 5.0 looks like this:

1  $CURSOR ( ms 14709 returned:2628000)
2 COLLSCAN ( ms:53 docsExamined:2628000 nReturned:2628000)
3 SORT ( ms:5375 nReturned:2628000)
4 $_INTERNALSETWINDOWFIELDS ( ms 21059 returned:2628000)
Totals: ms: 40478 keys: 0 Docs: 2628000

Pretty straightforward: we scan the collection, sort the data and pass the sorted data through to the window function — represented in the execution plan by $_INTERNALSETWINDOWFIELDS.

We can reduce the sort time with a covering index. An index on { deviceData: 1, timeStamp: 1, measureMent: 1 } retrieves the data required in order without a blocking sort. The execution plan now looks like this:
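A sketch of the index creation (the index name here matches the IXSCAN line in the plan below; the collection name is the one used in these examples):

```javascript
// Covering index: deviceData leads (the partition key), then timeStamp
// (the sort key), then measureMent, so the index alone covers the pipeline.
const coveringIndex = { deviceData: 1, timeStamp: 1, measureMent: 1 };
// db.iotNoTimeSeries.createIndex(coveringIndex, { name: 'iotts_ix' });
```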

dbKoda Mongo Shell>mongoTuning.aggregationExecutionStats(exIdxNoTs);
1  $CURSOR ( ms 8166 returned:2628000)
2 IXSCAN ( iotts_ix ms:84 keys:2628000 nReturned:2628000)
3 FETCH ( ms:210 docsExamined:2628000 nReturned:2628000)
4 $_INTERNALSETWINDOWFIELDS ( ms 13413 returned:2628000)
Totals: ms: 23569 keys: 2628000 Docs: 2628000

Window functions with time-series collections

MongoDB introduced windowing functions together with time-series collections, so it’s natural to expect the two to work together very effectively. Unfortunately, it’s not that simple.

Let’s say we have the same data in a time-series collection:

db.createCollection('iotTimeSeries', {
  timeseries: {
    timeField: 'timeStamp',
    metaField: 'deviceData',
    granularity: 'seconds'
  }
});

If we perform the same moving average calculation on this data, our execution plan looks like this:

dbKoda Mongo Shell>mongoTuning.aggregationExecutionStats(exIdxTs);
1  $CURSOR ( ms 1645 returned:409219)
2 COLLSCAN ( ms:14 docsExamined:409219 nReturned:409219)
3 $_INTERNALUNPACKBUCKET ( ms 1441 returned:2628000)
4 $SORT ( ms 13086 returned:2628000)
5 $_INTERNALSETWINDOWFIELDS ( ms 23753 returned:2628000)
Totals: ms: 43419 keys: 0 Docs: 409219

Wow. It’s quite a bit slower. We now have an extra phase, $_INTERNALUNPACKBUCKET, and the sort moves from within the $CURSOR stage to after the $_INTERNALUNPACKBUCKET stage.

Unpacking time-series buckets

So what is $_INTERNALUNPACKBUCKET? Well, logically our data looks like this:

dbKoda Mongo Shell>db.getCollection("iotNoTimeSeries").findOne();
{
    "_id" : ObjectId("61406123a0b7d2d02d607573"),
    "timeStamp" : ISODate("2016-09-15T08:42:58.155Z"),
    "deviceData" : 2,
    "measureMent" : 0.5148227331516743
}

But under the hood, it’s “bucketed” like this:

dbKoda Mongo Shell>db.getCollection("system.buckets.iotTimeSeries").findOne();
{
    "_id" : ObjectId("57da5ed80fc4e549a491d13a"),
    "control" : {
        "version" : 1,
        "min" : {
            "_id" : ObjectId("61406099a0b7d2d02d104233"),
            "timeStamp" : ISODate("2016-09-15T08:42:00Z"),
            "measureMent" : 0.5148227331516743
        },
        "max" : {
            "_id" : ObjectId("61406099a0b7d2d02d10426e"),
            "timeStamp" : ISODate("2016-09-15T09:41:58.155Z"),
            "measureMent" : 18.045503966452422
        }
    },
    "meta" : 2,
    "data" : {
        "_id" : {
            "0" : ObjectId("61406099a0b7d2d02d104233"),
            "1" : ObjectId("61406099a0b7d2d02d104240"),
            "2" : ObjectId("61406099a0b7d2d02d104244"),
            "3" : ObjectId("61406099a0b7d2d02d104252"),
            "4" : ObjectId("61406099a0b7d2d02d104256"),
            "5" : ObjectId("61406099a0b7d2d02d104268"),
            "6" : ObjectId("61406099a0b7d2d02d10426e")
        },
        "measureMent" : {
            "0" : 0.5148227331516743,
            "1" : 13.814766715198125,
            "2" : 14.629821858124208,
            "3" : 18.045503966452422,
            "4" : 6.198123592373916,
            "5" : 6.411943633059627,
            "6" : 17.90065907622597
        },
        "timeStamp" : {
            "0" : ISODate("2016-09-15T08:42:58.155Z"),
            "1" : ISODate("2016-09-15T08:55:58.155Z"),
            "2" : ISODate("2016-09-15T08:59:58.155Z"),
            "3" : ISODate("2016-09-15T09:13:58.155Z"),
            "4" : ISODate("2016-09-15T09:17:58.155Z"),
            "5" : ISODate("2016-09-15T09:35:58.155Z"),
            "6" : ISODate("2016-09-15T09:41:58.155Z")
        }
    }
}

So the $_INTERNALUNPACKBUCKET stage takes the measurements back out of the buckets, and then we must sort them to feed them into the $_INTERNALSETWINDOWFIELDS stage. We can’t create a covering index because of the internal structure of the buckets, and even if we could, there’d still be a sort needed to get things in order for the windowing function.

We can at least reduce the amount of time it takes to pull the data by increasing the bucket size. Here we reduce the total time from 43,419ms to 40,833ms by changing granularity to 'minutes' instead of 'seconds':
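The coarser bucket size comes from creating the collection with minute-level granularity (the collection name here is illustrative, but the options otherwise mirror the earlier createCollection call):

```javascript
// Same time-series options as before, but with minute-level buckets:
// fewer, larger buckets means fewer documents for the initial COLLSCAN.
const tsOptions = {
  timeseries: {
    timeField: 'timeStamp',
    metaField: 'deviceData',
    granularity: 'minutes' // was 'seconds'
  }
};
// db.createCollection('iotTimeSeriesBG', tsOptions);
```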

dbKoda Mongo Shell>mongoTuning.aggregationExecutionStats(exIdxTsBG);
1  $CURSOR ( ms 507 returned:20068)
2 COLLSCAN ( ms:0 docsExamined:20068 nReturned:20068)
3 $_INTERNALUNPACKBUCKET ( ms 1472 returned:2628000)
4 $SORT ( ms 11539 returned:2628000)
5 $_INTERNALSETWINDOWFIELDS ( ms 23974 returned:2628000)
Totals: ms: 40833 keys: 0 Docs: 20068

The sort spills over to disk, so if we allow tons of memory for the sort, we can bring the sort time down a bit as well:

db.getSiblingDB("admin").runCommand({ setParameter: 1, internalQueryMaxBlockingSortMemoryUsageBytes: 1048576*2000 });
{ "was" : 33554432, "ok" : 1 }

That's 2GB of sort memory! But it reduces the sort time from 11.5 seconds to 5.5 seconds:

1  $CURSOR ( ms 648 returned:20068)
2 COLLSCAN ( ms:4 docsExamined:20068 nReturned:20068)
3 $_INTERNALUNPACKBUCKET ( ms 2022 returned:2628000)
4 $SORT ( ms 5479 returned:2628000)
5 $_INTERNALSETWINDOWFIELDS ( ms 20957 returned:2628000)
Totals: ms: 32093 keys: 0 Docs: 20068

Unfortunately, you can’t adjust this sort memory parameter on Atlas, so this only works for an on-premises database.

Here’s a summary of the above results:

Scenario                                          Total time (ms)
Regular collection, collection scan + sort        40,478
Regular collection, covering index                23,569
Time-series collection, 'seconds' granularity     43,419
Time-series collection, 'minutes' granularity     40,833
Time-series, 'minutes' + 2GB sort memory          32,093

Conclusion

At the moment, window functions run a bit slower on time-series collections than on regular collections. That doesn’t mean that you shouldn’t use the two together, but if you simply want to generate moving averages and other complex analytics using the $setWindowFields aggregation stage, there’s no need to use time-series collections.

If you are doing a $setWindowFields operation on a regular collection, consider a covering index to avoid the sort. A covering index won’t help on a time-series collection (indeed, you won’t be able to create one) but you could think about bigger bucket sizes and allocating more sort memory to improve performance.

Time-series collections are in their first release, and the MongoDB team is looking at ways to improve performance, including removing the need for a blocking sort when you need to get data out in time-series order. And don’t forget: while time-series collections might not speed up analytic functions in all cases, they are very good at speeding up inserts, and will only get better.

Guy Harrison is CTO at ProvenDB.com and author of MongoDB Performance Tuning as well as other books. He writes a monthly column for Database Trends and Applications. Find him on the internet at guyharrison.net or @guyharrison on Twitter.

