Mysql window functions

How to use window functions in Mysql

Denys Golotiuk
DataDenys
Published in
6 min readNov 11, 2022

--

Using windows in Mysql is a way to query data based on a set of rows instead of a single row. This helps in comparative analysis, when we do not only calculate values based on groups but also based on preceding or following rows in a group. Popular examples here are moving average and cumulative sum. Let’s take a look on practical cases…

Test table and data

Assume we have the following table:

CREATE TABLE `test` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`amount` float DEFAULT NULL,
`user_id` int unsigned DEFAULT NULL,
`product_id` int unsigned DEFAULT NULL,
`at` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
)

…with some test data:

Here we store product transactions, generated by users with some amount (of money) happened at a specific time.

How windows work in Mysql

The window is simply some set of rows (defined by specific rules), for example:

Here we have 2 randomly selected set of 3 rows. Since this selection doesn’t make any sense let’s select one window for each product:

Now we have 3 windows, each includes data for only one specific product. Basically, we have just grouped our data based on product_id:

SELECT product_id, sum(amount) FROM test GROUP by product_id LIMIT 5

That is an aggregation as we know it. In contrast, window function will calculate [something we’re asking for] for each row in a group (window) instead of a group:

SELECT product_id,
sum(amount) over (partition by product_id)
FROM test LIMIT 5;

This will give us the following:

So, we do not actually group our data on product_id column. Instead we ask Mysql to add sum(amount) value into each table row, calculated for every product_id group. WTF is this used for and why we need it? Let’s discover based on several cases. But before that, let’s understand how windowing works from the perspective of coding.

Window syntax

From the example above, we’ve asked Mysql to calculate sum for each product_id:

Or, in other words, we’re adding sum for defined windows to each row in our original table:

Now let’s find out practical cases to use that for.

Value fraction

Popular scenario is to calculate current row value fraction in window total. In our case, we can understand each row amount fraction per each product total (amount sum for each product):

SELECT product_id,
amount, 100*amount/sum(amount) over (partition by product_id) as frac
FROM test LIMIT 10

And we get the following:

So for each row we can see its amount significance in relation to total amount for corresponding product. Following this example, we can further improve our query and order it by fraction value:

SELECT product_id,
amount, 100*amount/sum(amount) over (partition by product_id) as frac
FROM test ORDER BY frac DESC LIMIT 10

And we’re then able to get a list of rows where amount has the biggest fraction from the product total:

Value fraction from table total

We can skip OVER() expression (just leave it empty) and find each row amount fraction from table total:

SELECT product_id,
amount, 100*amount/sum(amount) over () as frac
FROM test LIMIT 10

In this case, we basically define a single window which is our entire table. So sum(amount) will return total for all rows in a table:

More aggregate functions

We can use any aggregate function with windows:

SELECT id, product_id, amount,
avg(amount) over (partition by product_id),
min(amount) over (partition by product_id)
FROM test LIMIT 10;

Which will give us averages and minimums for each window:

Running total

Another popular example is calculating cumulative totals as we go row by row. When we want to apply running aggregations, we need to define ORDER expression for OVER() clause:

Let’s take a look at this example:

SELECT at, product_id, amount,
sum(amount) over (partition by product_id order by id) as run_sum
FROM test LIMIT 10

This will calculate amount column cumulative sum for each product:

In other words, we got data for cummulative chart in run_sum column:

Moving average

When our data is time-based, we might want to calculate aggregations for a set of rows that includes current and multiple previous (or following) rows. This is what called moving aggregations and can be done by specifying ROWS expression for OVER() clause:

Let’s calculate moving average for amount column based on a window of 3 preceding rows and zero following:

SELECT at, id, amount,
avg(amount) over (
order by id
rows between 3 preceding and 0 following
) as run_avg
FROM test LIMIT 10;

Which gives the following:

Top row from group

Let’s say we would like to understand which user (user_id) has bought our product (product_id) at the biggest price (amount). This can be done using rank() window function. First, let’s see what rank function gives us:

SELECT product_id, user_id, amount,
RANK() OVER(partition by product_id order by amount desc)
FROM test LIMIT 15

This will sort our window and assign rank (sorted position number) to each row based on a given order by expression in OVER() clause:

The last query will give us the following:

Now we can filter only those rows with rank = 1 because we want to see top row for each product:

WITH ranked AS(
SELECCT product_id, user_id, amount,
RANK() OVER(partition by product_id order by amount desc) as rnk
FROM test
)
SELECT product_id, user_id, amount
FROM ranked
WHERE rnk = 1;

Which gives us user_id with biggest amount value for each product — exactly what we need:

Summary

Window functions allow analyzing datasets based on groups of rows instead of single rows. In cotrast to grouping aggregations, windows are used in the context of each row (instead of context of grouped values). Popular examples of window functions are moving averages and running totals, but window-based analysis goes far beyond those.

--

--