In day-to-day work, it is common to encounter the need to rank within each group, this is when we need to use window function
Window functions, also called OLAP functions (Online Analytical Processing), can perform real-time analysis and processing on database data.
over (partition by <column name used for grouping>
order by <column name for sorting>)
So it was just like first group by a column, then order by another column.
Below case it’s group by column two班级
(Orange line), and in each of group order by score成绩
.
select *,
rank() over (partition by 班级
order by 成绩 desc) as ranking
from 班级表
It can be used for:
1) Dedicated window functions: rank, dense_rank, row_number and other
2) Aggregation functions: sum, avg, count, max, min, etc.
Can also be used as a reference of overall average, if we put nothing in over() function.
Can partition data by 1 or more columns
Why we use window function if there’s already existed GROUP BY function?
This is because group by changes the number of rows in the table after grouping, and a row has only one category. The partition by and rank functions do not reduce the number of rows in the original table. For example, the following counts the number of people in each class.
select *,
rank() over (order by 成绩 desc) as ranking,
dense_rank() over (order by 成绩 desc) as dese_rank,
row_number() over (order by 成绩 desc) as row_num
from 班级表
Sliding window keywords
- UNBOUNDED: start or the end of partition
- PRECEDING: data before
- FOLLOWING: data after
- UNBOUNDED PRECEDING means that the window starts from the first column of the partition; offset PRECEDING means that the window starts from the number of columns equal to the offset value before the current column. UNBOUNDED PRECEDING is the default value.
- CURRENT ROW indicates that the window starts or ends in the current row.
- UNBOUNDED FOLLOWING means that the window ends at the last column of the partition; offset FOLLOWING means that the window ends at the number of columns equal to the offset value after the current column.
Sales and sales_1 result are different, one is the current row and another is from start to current row.
Case study 1:
You will be creating a data set of games played by Legia Warszawa (Warsaw League), the top ranked team in Poland, and comparing their individual game performance to the overall average for that season.
In this case, use CASE WHEN to categorize they are in home or away, and use window function to group by season.
If you like my content please clapped for me and follow me, thank you :)
There’ll be more article and more content related to Data Science. Hope you enjoy it!