Snowflake Window Functions — A guide — part 3

Photo by R Mo on Unsplash

Welcome back! 👋

After looking at rank-related window functions in part 1 and part 2 of this series, we will look into the other type of window functions in this chapter i.e. window-frame functions. If you haven’t checked those articles yet and are new to window functions, it might be a good idea to go through them. They will be a good entry point for understanding window functions and where we might need them.

Window Frame Functions

So far, we have looked at rank-related functions primarily used to assign ranks to the data or fetch a column value at a particular occurrence. What if we want to take them one step further and we were able to do a calculation while processing these windows? That’d open up many new opportunities to see interesting trends in data.

Let’s say we want to calculate a moving average without window functions, how would you go about it?
One way to tackle that would be to do a self-join of all previous records with the current record and compute the average. What if you want to do a rolling average for the last 7 days? You will have to introduce some more logic and the processing suboptimal because of duplication of the data. It is certainly doable but not as clean of a solution as we would like and as they say on Shark Tank, “THERE HAS TO BE A BETTER WAY”. Well, window functions are the better way and it turns out that most of our beloved aggregation functions can be extended into window functions. Let’s look at some examples now.

Average / Rolling Average (AVG)

To answer questions about moving averages, the average aggregation function (AVG) can be extended into a window function simply by specifying a window after the function. The syntax would look as follows,

AVG( <expr1> ) OVER (
[ PARTITION BY <expr2> ]
[ ORDER BY <expr3> [ ASC | DESC ] ]
)

Look at the query below, we simply add a partition clause with our AVG()aggregation function,

The results look like this,

Notice the highlighted area and the averages. Look at how Snowflake only takes the average up until the row and not for every row in the defined window. This is because we have explicitly defined order within the window. What do you think might happen if we remove the ORDER BYclause?

Answer at the end of the article

Now that we have seen how to calculate a moving average. Let’s look at how to calculate the rolling sum for use cases like how much has a particular store generated in sales over time.

SUM

Just like AVG() this function behaves exactly the same and follows the same syntax,

SUM( <expr1> ) OVER (
[ PARTITION BY <expr2> ]
[ ORDER BY <expr3> [ ASC | DESC ] ]
)

The same for SUM() as it was for AVG() the rows are aggregated up until the current row if the order is specified in the window. What do you think will happen with SUM() if the ORDER BY clause is removed?

In answer to the questions above,

If you remove the ORDER BY clause from window frame functions they will iterate over the whole window to compute that aggregation. In our first example with AVG() I asked what will happen if we remove the ORDER BY clause, well it will result in all rows having the same value for the average.

Look at the last column, that is Snowflake’s default behavior and one must be mindful of when to put an ORDER by clause and when one shouldn’t.

What other aggregation functions can we use as window frame functions?

We can also use MIN(), MAX(), COUNT(), COUNT_IF(), and a few more statistical functions similar to how we used SUM() and AVG(). You can find the complete list here. I won’t go into details about all of them but will only show an example of where all these functions are being used.

One very interesting observation about these results is to see the MIN() and the MAX() functions in action both with and without the ORDER BY clause. What do you think is the most logical usage of these two particular functions here?

And more generally, what do these numbers tell us about our customers and data? Can we understand our customers better using these functions?

Can we unlock more value with Window Functions?

Just by looking at the simplicity of the code and the scalability of these functions I’d always prefer them over maintaining complex code with self-joins. They might be a little tricky to grasp at the beginning, especially when working with irregular-sized windows and complex datasets but once you understand the basics, they are pretty useful. Think of them as different SQL functions stitched together in a nice way to make life easier for the developers. Ask yourself this, if you needed to perform a similar analysis as we did in the example above but without any window functions, how long would that take you? Also, how much work it would be to maintain such complex code? Maybe you prefer to write queries with a lot of subqueries and self-joins but give these a try and leave a comment about your findings.

Now that we have covered most of the commonly used window functions, there’s only one question left to answer, what if we want to calculate 7-day moving average and 30-day moving average and also be able to see it in one query, can we do it? The answer is, “YES! WE CAN!” but keeping in view the length of this article already, I will write another short article about how to do that and conclude this series with that.

If this helped you improve your knowledge of window functions, hit the like button, share it among your peers if they might benefit from it, and if you can think of some creative ways of applying the functions discussed in this post, leave a comment!

--

--