Getting started with MongoDB 5.0 window functions

Published in

MongoDB Performance Tuning

3 min readAug 17, 2021

Window or “analytic” functions are one of the most powerful, yet hard to learn, parts of the SQL language.

Window functions partition the rows in a result set and create a sort of “virtual table” that the function works with. The function operates on a “window” of rows around the current row, allowing you to access trends or group information.

For instance, this query lists the top ten rides in terms of revenue, with the percentage of the total revenue and city revenue displayed:

Without windowing functions, these sorts of queries require expensive self-joins at best or need to be implemented outside of SQL at worst.

MongoDB has now introduced the logical equivalent of SQL Windowing functions in 5.0. Like SQL Windowing functions there is a bit of a learning clause, but they are VERY powerful. Let’s take them for a drive!

Sample data

As I write this, Australia is in the midst of a new wave of Covid infections and most of us here are locked down. So of course, I want to know whether the infections are increasing and where. So I’m going to create moving averages for each State of new cases detected.

My raw data looks like this:

For each State in Australia, we have the total number of cases ever recorded.

The data is sourced from https://raw.githubusercontent.com/covid-19-au/covid-19-au.github.io/dev/src/data/state.json.

Calculating daily cases

How can we do a moving average over the last 7 days? The first thing we need to do is to generate the difference between today's total and yesterday's total. We couldn't do this in the aggregation framework without windowing functions, but now we can. Here’s the Window function:

The partitionBy clause of $setWindowFields indicates that we should perform these window functions independently for each State. We are generating a delta for each state, not for Australia as a whole.

$sortBy tells us how to sort the data internally within each partition. In this case, we are sorting the data by observation date.

output includes the data to be returned. In this case, we create a yesterday attribute. Yesterday is defined as the value totalCases for the window with the range [-1,-1] — eg, yesterday. output emits an array of values (because we might have generated more than one value). Therefore, we $unwind the data in the next step.

So now we have a result set that contains today's total and yesterday’s total. The next obvious step is to calculate the difference. The final $addFields calculates the difference between today and yesterday and puts it in the newCases attribute.

At this point our pipeline contains the newCases for each day for each state:

Creating the moving average

A moving average represents the average of the last N records. So a 7 day moving average represents the average number of new cases for each day in the last week. To create the moving average we use another window function step:

This is similar to the last window function — we again partitionBy state and sortBy the observation date. But this time, we output the $avg value of newCases over a 7 day window- range[-6,0].

Adding a few extra steps to clean up the output we see the moving average cases for NSW is still on the increase:

Conclusion

The increase in COVID cases is bad news for Australia — forcing us to stay locked down and writing blogs when we’d rather be outside socializing. But Window functions are good news for MongoDB users! We can now do things in Aggregation functions that previously would have required moving the data out of MongoDB and manipulating it in program code.

In the next installment, I’ll look at how Window functions perform at scale, and how they tie into time-series collections — the other “big” feature of MongoDB 5.0.

Guy Harrison is CTO at ProvenDB.com and author of MongoDB Performance Tuning as well as other books. He writes a monthly column for Database Trends and Applications. Find him on the internet at guyharrison.net or @guyharrison on twitter.

Getting started with MongoDB 5.0 window functions

Sample data

Calculating daily cases

Creating the moving average

Conclusion

Written by Guy Harrison