Snowflake Window Functions — A guide — part 2

Published in

Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

5 min readSep 19, 2022

Welcome back! 👋

It’s time for us to dive into part 2 of the guide. If you haven’t read the first part yet, you can find it here. In the first part, we talked about what window functions are, how they are defined in Snowflake and a walkthrough of ROW_NUMBER() window function.

In this part, we will dive into more window functions and look into examples of how and where to use them. We will be using the same dataset as before.

Rank & Dense Rank

RANK() and DENSE_RANK() functions are very similar to the ROW_NUMBER()function which we looked at in the first part. As the name suggests, they are used to rank the rows in your data. Here is how they are defined

RANK() OVER ( 
    [ PARTITION BY <expr1> ]
    ORDER BY <expr2> [ { ASC | DESC } ] 
)DENSE_RANK() OVER ( 
    [ PARTITION BY <expr1> ] 
    ORDER BY <expr2> [ ASC | DESC ] 
)

The main difference between both is how the data is ranked. Both functions rank your data in accordance with the partition and order by clause and assign the same rank to the rows where values are the same. The difference comes after the rows with the same rank, RANK() the function would skip the subsequent ranks and DENSE_RANK() would not skip and it will continue from the next rank.

Example: To make it easier to understand, we will use not use our example data set but create a very simple example. Imagine, if person A has $1, person B & person C has $2, and person D has $3 and you want to rank them according to how much money they have, you will get the following ranks using both functions,

— and this is how the query looks like

Look closely at how the last two columns for person D are different where RANK() skipped a rank while DENSE_RANK() didn’t skip any rank after ranking rows 2 & 3 the same.

Lag & Lead

LAG() and LEAD() functions are also rank-related window functions and are used to get the value of a column in the preceding or following rows. They are particularly useful when you want to do an analysis of how events change over time. Here is how they’re defined,

LAG( <expr> [ , <offset> ] ) OVER ( 
    [ PARTITION BY <expr1> ] 
    ORDER BY <expr2> [ { ASC | DESC } ] 
)LEAD( <expr> [ , <offset> ] ) OVER ( 
    [ PARTITION BY <expr1> ] 
    ORDER BY <expr2> [ { ASC | DESC } ] 
)

As an example, thinking about our original dataset, what if we want to compare the customer’s current order value with the last order value and also the next order value (if any)? This is how we would write the query:

— and the results would be:

Notice that the PREVIOUS_ORDER_VALUE column for the first row is NULL and similarly the NEXT_ORDER_VALUE column for the last row would be NULL also.

Challenge: Could you try and replicate the query results abovebut using only LAG() function?
Leave a comment below to let me know how it goes

Additional Notes

LAG() and LEAD() functions have additional offset parameters which we can use to specify how far back or further in the future we should look while looking for values. You can specify to look for 2 preceding or leading events like this,

We just specified the offset with the function name: LEAD(O_TOTALPRICE, 2) in this case to get the value of O_TOTALPRICE in a row 2 intervals away.

First Value & Last Value

FIRST_VALUE() and LAST_VALUE() similar to LAG() and LEAD() functions give us the value of a particular column but as the name suggests they either give you the values at the beginning or the end of that data subset. Here is how they are defined:

FIRST_VALUE( <expr> ) OVER ( 
    [ PARTITION BY <expr1> ] 
    ORDER BY <expr2>  [ { ASC | DESC } ]
)LAST_VALUE( <expr> ) OVER ( 
    [ PARTITION BY <expr1> ] 
    ORDER BY <expr2>  [ { ASC | DESC } ]
)

Continuing our example from above, what if with the previous and the following order value we also want to know the value of the customer’s first and the last order, here is how we would go about it:

This time we chose a customer with 10 orders to see all these functions in the action together.

Challenge: Could you try and replicate the query results above but using only LAG() and LAST_VALUE() function?
Leave a comment below to let me know how it goes

These were all the rank-related window functions on Snowflake that we would cover in this guide. In the following parts, we would cover window-frame functions on Snowflake.

Could you think of some use cases where these functions would come in handy? Also, think about how would you go about these problems if you did not know about these functions.

If this helped you improve your knowledge of window functions, hit the like button, share it among your peers if they might benefit from it, and if you can think of some creative ways of applying the functions discussed in this post, leave a comment!

Snowflake Window Functions — A guide — part 2

Rank & Dense Rank

Lag & Lead

First Value & Last Value

Written by Muhammad Fasih Ullah