Window functions for Linux users

Hey ! I really like plain SQL instead of Abstraction given by ORM technologies because of SQL power.Today I want to describe the concept of Window functions in Postgres (9.5). Let’s start.

We have two tables Department and Employee

CREATE TABLE dep
(
id integer NOT NULL DEFAULT nextval('dep_id_seq'::regclass),
name text,
CONSTRAINT dep_pkey PRIMARY KEY (id)
);
CREATE TABLE public.emp
(
id integer NOT NULL DEFAULT nextval('emp_id_seq'::regclass),
name text,
salary integer,
dep_id integer,
CONSTRAINT emp_pkey PRIMARY KEY (id),
CONSTRAINT emp_dep_id_fkey FOREIGN KEY (dep_id)
REFERENCES public.dep (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
);

Let’s insert some data

insert into dep (name) values ('IT'),('HR');

What do we need to do ? Select employees with highest salary in each department

In our table we have two departments IT and HR , Third employee has the highest salary in his own department whereas Six employee has the highest salary in HR department . Let’s write SQL statement

select e.name,d.name,e.salary from emp e
inner join dep d
on e.dep_id = d.id
where (e.dep_id,e.salary) in
(
select dep_id,max(salary) from emp
group by dep_id
)
Result 1

We join two tables and then use subquery to fetch max salaries for each department. But what if we want to select employee name and difference between max salary in department and employee salary.

select e.name ,e.salary ,top_salaries.max from emp e
inner join
(
select dep_id,max(salary) as max from emp
group by dep_id
)top_salaries
on top_salaries.dep_id = e.dep_id
Result 2

If requirements such as max ,comparing with min , top n-th will grow then our query would become more complex to maintain. For such situations Postgres gives us strong feature called Window functions

Let’s see the definition from official documentation

Window functions provide the ability to perform calculations across sets of rows that are related to the current query row. See Section 3.5 for an introduction to this feature.

Quite hard to understand so let’s see examples

Make note: window functions always use the OVER() clause so if you see OVER() you're looking at a window function.

Select name , salary and sum of all salaries from employee table

select name,salary,sum(salary) over() as total from emp ;
Result 3

Please notice that without window functions we can’t use aggregate functions with additional columns without group by clause. And now let’s see an SQL equivalent without window function

select e.name,e.salary,top.sum from emp e,
(
select sum(salary) as sum from emp
) top;

Obviously , first one is more readable.

Condition clauses

In the above example, we looked at a simple window function without any additional conditions, but in many cases, you’ll want to apply some conditions in the form of additional clauses to your OVER() clause. One is PARTITION BY which acts as the grouping mechanism for aggregations. The other one is ORDER BY which orders the results in the window frame.

  1. Partition by — allows us to group aggregations according to the values of the specified fields.

Do you remember the second query where we have selected employee with highest salary in him department. Now let’s rewrite it using Partition by

select salary,name,sum(salary) over(partition by dep_id) from emp;

We include column names(dep_id) to group result. I want you to mention that we used only aggregate functions , but Postgres provides a number of window specific functions which you can find here. One of the most popular use case of window functions is to rank each row using RANK() function .Let’s say we want to rank each employee’s salary according to department. We can easily do it using RANK function.

select e.salary,e.name,d.name ,rank() over() from emp e
inner join dep d
on e.dep_id = d.id;
Result 4

Now all employees have the same rank because ranking query requires an order by to be established,let’s fix it using ORDER BY.

select e.salary,e.name,d.name ,rank() over(order by e.salary) from emp e
inner join dep d
on e.dep_id = d.id;
Result 5

Now we have ranks for each employees but we need to divide them by department , it’s condition so we will use PARTITION BY

select e.salary,e.name,d.name ,rank() over(PARTITION BY e.dep_id order by e.salary) from emp e
inner join dep d
on e.dep_id = d.id;
Result 5

Now if we want to see top 2 salaries for IT department we can easily do it using limit and where clauses with query above

select e.salary,e.name,d.name ,rank() over(PARTITION BY e.dep_id order by e.salary) from emp e
inner join dep d
on e.dep_id = d.id
where e.dep_id = 1
limit 2;

Important thing about ORDER BY it’s how it works with aggregate functions . As you remember adding PARTITION BY to aggregate function gives us the same result for all rows , but ORDER BY recalculate result depending on the number of row. To see it we will create a table of payments history

create table payment_history(
id serial primary key,
payment double precision,
user_id int references emp(id)
);
insert into payment_history (payment,user_id)
values
(10,(select id from emp where name like 'First')),
(20,(select id from emp where name like 'First')),
(-5,(select id from emp where name like 'First')),
(-10,(select id from emp where name like 'First')),
(2,(select id from emp where name like 'First'));

We have 5 payments for First user , and now we want to see changes in balance after each payment. Without ORDER BY result will be

select e.name,p.payment, sum(p.payment) over() from emp e 
inner join payment_history p
on e.id = p.user_id;
Result 6

Now let’s add ORDER BY to query above

select e.name,p.payment, sum(p.payment) over(partition by p.user_id order by p.id) from emp e 
inner join payment_history p
on e.id = p.user_id
order by p.id;
Result 7

As you can see we have changes in user balance after each iteration. For example for first row sum is 10 because user has only 1 payment , for second row sum is 30 because it’s sum of first payment(10) and second payment(20)

Conclusion

In conclusion I want to emphasize that we didn’t consider all window functions like lead,first_value_last_value,but I hope you have understood which problems can be solved with this strong feature.In the second part of this article I want to describe window function it term of performance .

I moved to github pages strogiyotec.github.io/

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store