How to select top n rows from each category in PostgreSQL?

Amulya Kumar Sahoo
3 min readDec 20, 2018

--

While working on a project, I got a requirement to query a specific type of data from a PostgreSQL Table. It was to fetch the top 3 rows based on a specific value in a row in each category which in our case was to find the names of candidates with top 3 scores for each job.

It was a special type of query because we can’t use limit in this case. The first thing that comes to mind with this type of query is to use UNION operator which is a naive solution and the least efficient one because we have to write the query specifying each job_id. The query will look like this:

select * from score_data where job_id=1 order by score desc limit 3
UNION
select
* from score_data where job_id=2 order by score desc limit 3
UNION
select
* from score_data where job_id=3 order by score desc limit 3
.
.
.
select * from score_data where job_id=n order by score desc limit 3

Upon some research, I arrived at Window Functions which, as per the definition:

Window functions provide the ability to perform calculations across sets of rows that are related to the current query row.

One of the function is rank() using which, we can add a new column which contains the rank of the current row in each group.

We generate a rank using a query as follows:

SELECT score_data.*, 
rank() OVER (PARTITION BY job_id ORDER BY score DESC)
FROM score_data

The table, score_data has columns- id, job_id, candidate_id, score.
Let’s see what happens in the above query. The OVER specifies how to partition or window the rows inside which, we have specified the partition rule. PARTITION BY partitions the table based on job_id and ORDER BY orders the rows based on the score inside the partition. This adds a new column with called rank to the result set. rank keeps the rank of the row in it’s group.

The output of the above query looks like:

Using RANK() window function, we rank the candidates on scores

Now, we have to fetch the top three candidates for each job_id for which, we will run an extra select query over this selection where rank will be ≤3.

select ranked_scores.* from 
(SELECT score_data.*,
rank() OVER (PARTITION BY job_id ORDER BY score DESC)
FROM score_data) ranked_scores
where rank <=3

Which yields our required result.

Our required result set i.e. Top 3 scores for each job

If you have observed anything strange in any of the above results, stay tuned, I will describe in my next post.

Thank you !

--

--