Leaderboards and Rankings with SQL

Using Window Functions

Isuru Weerarathna
Analytics Vidhya
10 min readDec 12, 2019

--

Photo by Daniel von Appen on Unsplash

Believe me, SQL’s window functions are one of a powerful feature I have seen in the database world. it excels especially when you want to query analytic information. The provided syntax in this article is on MySQL, and MySQL began supporting window functions since version 8. Before MySQL 8, I had to write a lot of business logic in order to get some analytic data out of the database.

Let’s dive into the window functions by using an example, so you will able to grasp the power of this feature easily. Let’s create leaderboards with rankings for a simple game.

Scenario

Assume that you are developing an online game like QuizUp, and it allows users to register and play the game. A registered user can participate in quizzes under topics. A user will score every time plays a quiz, but only the highest score will be taken into account when ranking. The users will be ranked within each topic, and, under a particular topic, users will be further ranked under country/region-wise too. For example, if you are playing the game under topic ‘Databases’ then you will have a rank for Database topic (a global rank among all the users who have played this topic), and a rank within the country you play (assuming you are in the US, and you will have a country rank or continent rank).

In the dashboard (or app), all users who at least once participated in a quiz under a particular topic will be shown in the topic leaderboard. Something like this.

Image Courtesy: https://iphone.qualityindex.com/games/7225323/quizup/

This is a good use case of using window functions to achieve what we want to display on the dashboard.

Our Schema

Let’s keep the schema simple for this example. We have only one table called, score. And it has the recorded scores for each user every time he/she played a quiz, and for the convenience, we assume that against each user we have his/her country as well, based on the logged-in country at the time of play or account information given in advance. The below table shows a sample set of data we will be going to rank under the topic id, 101.

For simplicity's sake, we will consider only two countries (US, CAN) and eight users (user ids, 1..8).

Getting Global Ranks within Topic

Step 1: Aggregate

First, let's aggregate all the scores per user and team. So, that we will have the highest score for each user against his/her country.

We have used country_id for grouping because when we’re going to do the ranking, we need to have the country_id field for the parent query. Also, if the game rules have changed to take total score instead of the highest score, simply, only we have to change the aggregate function from MAX to SUM.

Step 2: Rankings

Now we want to get the rank of each user relative to each country the user belongs to. The expected result should look like below.

Now let’s see the query to get the above result.

Here, what we do is, we take the query in Step-1 as a derived table for the new query, by enclosing it inside the FROM clause. Check the new projection clause.

Here we “partition” records by country_id column. When the database iterate through each record, it keeps the corresponding partition the current record belongs to. In this case, its partitioning by country. At the same time, we instruct the engine to keep track of the ordering of records within the partition too. So we say, we want to order by the calculated highestScore in the descending order. Therefore the highest scorer will come on top.

Further, we need to provide an aggregate function for the partition. Since we need tanking, there are quite a few to choose from here in MySQL, as we can rank in several different ways, but two is worth mentioning.

  • DENSE_RANK: rows within equal rank will have the same rank and will have no gaps. i.e. Consider three users U1, U2 and U3 having high score 80, 80, 78 respectively. Using DENSE_RANK will have rankings 1, 1, 2 for three users.
  • RANK: rows within equal ranks will have the same rank, but there can be gaps in rank values. i.e. if two users get the same ranking, say 4th, then the next user will be ranked 6th because two users are in 4th, hence rank 5th is not available.

We’ve chosen RANK since it is (I believe) the general way of ranking.

Since we have two users having the same highest score for country US, they will be ranked 1, and the next highest scored user (i.e #8), will be ranked 3rd, due to the RANK function. If we had chosen the DENSE_RANK function, user #8 will be ranked 2.

See the difference of a result in a partitioned query. Unlike a group by clause in SQL, partitioning still returns all records.

Now we are half done.

Getting Global Ranks

As said earlier, assume we need to show user’s global rank in the dashboard along with an individual’s country rank as well. First, let’s see the query for it.

This is very similar to the above query in Step-2, except we removed the partition clause. What happens here is that, as soon as we removed the partition clause, the query will not have any partition, but only one. The only and default partition that exists for the query, is the whole table. So, when the RANK is applied, it calculates the rank relative to the whole table. Hence we get the global rank. That means the query will rank users regardless of the team. Here is the result of the above query.

Two Birds with One Stone

In the dashboard, or from your business layer, do you really think you want to call twice to get the two rankings, and then combine for each user

The answer is NO!

You don’t really want to write two queries, one to get team ranks, and one to get global ranks. In fact, you can get both ranks in a single query.

That is the beauty of window functions!

You can have as many partitions as you want on each projection clause. One partition clause does not affect to other projection clauses. They are totally independent.

So, the combined query should look like this.

Now we have the desired result with output like below having calculated both ranks in two columns.

You might be expecting to show the above result on the leaderboard page of your app or dashboard. But what if you have to show both ranks of a particular user in his/her profile page.

Get Both Rankings of a Single User

You might be tempted to think that, it is trivial to get the ranks of a single user by filtering only the record for that user id. In simpler terms, just add a where clause within the condition, user_id = ?before order by clause in the outer query, right?

Wrong!

If you do so, you will not be able to calculate rankings. For all users, it will always show team rank as 1st and global rank as 1st, which is obviously wrong.

The reason for this behavior is very simple.

In SQL world, WHERE clause gets executed before the SELECT clause.

Therefore, when the records are going to be partitioning, which is inside the projection, there is only one record, and it is the record for the filtered users. But to calculate rankings, we need highScores for every user. You can neither add the user filtering inside the derived table nor the outer query.

So, where do we put the filtering?

Easy! Just enclose this query inside another outer query, and filter out the user from enclosing query. See below.

This query will return correctly ranked single user record. For example, making rt.user_id = 8 will return the below result.

Three depths of the above query show the three stages of result transformation to get rankings. i.e.

  1. Aggregating: aggregate results first,
  2. Ranking: then apply rankings, and
  3. Filtering: filter out the necessary output.

That’s it.

Rankings within Time Periods

If you want to calculate rankings for a particular time period, for e.g: this week or last week, then the only thing we had to do is, append a where clause to the inner-most query like ts BETWEEN <startWeek> AND <endWeek> (Assuming you have a timestamp column in the schema of score table).

Unlike user filtering, in this case, we should calculate high scores only for a given time period before calculating the rankings. That’s why we need to put time range filtering inside the innermost query.

Final Notes:

  • This query looks very inefficient due to the depth of inner queries, but in reality, it is not, when you are using the correct indices. Since the score table can have records for multiple users, topics, and countries, we should have three separate indexes for user_id, topic_id and country_id column.
  • You may replace the conditiontopic_id = 101 with a parameterized one, such as, topic_id = ?, so the query becomes reusable.
  • Most of the above queries will work on any RDBMS without any change, but you need to care about DB support of window queries. Such as window functions introduced in MySQL after version 8.

--

--