Snowflake Window Functions — A guide — part 1

Photo by R Mo on Unsplash

What are window functions?

Window functions are a group of analytical functions which work on a subset of row groups within your dataset or as the name suggests, they work on a window of your data. Snowflake’s window functions can be divided into two categories,

  1. Rank Related Window Functions: These functions allow you to assign a rank/index to a row within your window
  2. Window Frame Functions: These category of window functions allow you to perform rolling calculations

Where might we need them?

The data landscape is continuously evolving; the more data we produce, the more there is a need to consolidate the data smartly. These functions are important tools to have in your toolkit to extract insights from your data.

Think of your dataset having data around orders from customers, and for each customer you want to have a rolling sum of order value, order number, and rolling average of order value. How would you go about it? Pretty challenging, no? That is exactly where these functions come in handy. One can perform such calculations without having to write too much of an extra complex code. In the following sections, let’s look into how we can apply these functions.

For the tutorial, we are using Snowflake’s TPC-H sample data (read more). You can create a free Snowflake trial account here: https://signup.snowflake.com/ and this dataset will be available to you by default.

We will be using ORDERS table and as we can see in the screenshot above, we have an order-id, customer-id, order date, and total price which are our columns of interest for now.

What are some of the most common window functions?

While there is a long list of functions that Snowflake supports, we will cover a handful of the most commonly used window functions and their uses.

Row number

Row number is one of the most commonly used window functions, defined as

ROW_NUMBER() OVER (
[ PARTITION BY <expr1> [, <expr2> ... ] ]
ORDER BY <expr3> [ , <expr4> ... ] [ { ASC | DESC } ]
)

An important thing to highlight here is the different parameters of these functions,

1. Function name: ROW_NUMBER() — some functions also accept optional parameters

2. Partition clause: PARTITION BY <expr> — here you specify the dimension(s) you want to use to partition your data

3. Ordering clause: ORDER BY <expr> — here you specify the column names which you want to use to order the partition by

Notes:

- There are more optional parameters which we will cover in the following posts.

- Each partition is ordered independently

We can use ROW_NUMBER() function whenever we have to define an order among a subset of rows i.e. a window. It is classified as “Rank-related” window functions. The counter starts with 1 and increases sequentially.

Lets say we have to see customer’s order number using the data above, we would define the function this way,

ROW_NUMBER() OVER (PARTITION BY O_CUSTKEY ORDER BY O_ORDERDATE ASC)

Partition by clause would help us create a new window per customer-id and order by clause would help us define the order of rows within those partitions, which is important for rank-related window functions. Here row_number() = 1 would mean the very first order of the customer, just reversing the order by clause we can also get the opposite results, the last order ranked as 1.

This is how our code looks now (note that we filtered for one customer-id to make it easier to see the action)

The last column RN is starting from 1 and incrementing by one till all orders of this customer are ranked.

We will look at more examples in part 2, stay tuned!

If this helped you improve your knowledge of window functions, hit the like button, share it among your peers if they might benefit from it, and if you can think of some creative ways of applying the functions discussed in this post, leave a comment!

--

--