LeetCode SQL: 1454. Active Users

Shivananda D
5 min readMar 4, 2023

--

Difficulty: Medium, Company Tags: Amazon, TikTok, ByteDance

Photo by Brooke Cagle on Unsplash

Problem Statement

This problem is a variation of the famous Gaps and Islands problem and the idea behind solving this problem is going to be really helpful in your SQL interviews.

Table: Accounts

+---------------+---------+
| Column Name | Type |
+---------------+---------+
| id | int |
| name | varchar |
+---------------+---------+
id is the primary key for this table.
This table contains the account id and the user name of each account.

Table: Logins

+---------------+---------+
| Column Name | Type |
+---------------+---------+
| id | int |
| login_date | date |
+---------------+---------+
There is no primary key for this table, it may contain duplicates.
This table contains the account id of the user who logged in and the login date. A user may log in multiple times in the day.

Active users are those who logged in to their accounts for five or more consecutive days.

Write an SQL query to find the id and the name of active users.

Return the result table ordered by id.

The query result format is in the following example.

Example 1:

Input: 
Accounts table:
+----+----------+
| id | name |
+----+----------+
| 1 | Winston |
| 7 | Jonathan |
+----+----------+
Logins table:
+----+------------+
| id | login_date |
+----+------------+
| 7 | 2020-05-30 |
| 1 | 2020-05-30 |
| 7 | 2020-05-31 |
| 7 | 2020-06-01 |
| 7 | 2020-06-02 |
| 7 | 2020-06-02 |
| 7 | 2020-06-03 |
| 1 | 2020-06-07 |
| 7 | 2020-06-10 |
+----+------------+
Output:
+----+----------+
| id | name |
+----+----------+
| 7 | Jonathan |
+----+----------+
Explanation:
User Winston with id = 1 logged in 2 times only in 2 different days, so, Winston is not an active user.
User Jonathan with id = 7 logged in 7 times in 6 different days, five of them were consecutive days, so, Jonathan is an active user.

Solution

The crux of the problem lies in identifying groups of consecutive days. This kind of problem is critical to understand for your advanced SQL interviews. Let’s say you are given a dates table with the following data:

Here, we have two groups of consecutive days: ‘2020–06–01’ to ‘2020–06–04’ and ‘2020–06–16’ to ‘2020–06–18’. To identify them as two separate groups, first, let’s rank the dates in ascending order using the RANK() function.

select
login_date,
rank() over (order by login_date) as rk
from dates;

Now, let’s see what happens if we subtract rk number of days from the corresponding login_date.

with cte as (
select
login_date,
rank() over (order by login_date) as rk
from dates
)
select
login_date,
rk,
date_add(login_date, INTERVAL -rk DAY) as grp_date
from cte;

Notice that two groups of consecutive days have the same grp_date value. If we want groups with more than five consecutive days, we can group by grp_date and return all those days having COUNT(*)≥5. This is the idea that we will be using to solve the main problem at hand.

The input Logins table looks like this:

Let’s create a RankCte table: order the dates by id, and assign a rank to each date using the RANK() function. Also, we can have a user logging in multiple times on the same day, for example, id=7 has logged in twice on 2020–06–02. So, we need to consider only DISTINCT records from the given Logins table.

with RankCte as (
select
id, login_date,
rank() over (partition by id order by login_date) as rk
from (select distinct * from Logins) l
)
select * from RankCte;

In the next step, let’s create GroupingCte table where we subtract rk days from login_date.

with RankCte as (
select
id, login_date,
rank() over (partition by id order by login_date) as rk
from (select distinct * from Logins) l
),
GroupingCte as (
select *,
date_add(login_date, INTERVAL -rk DAY) as grp_date
from RankCte
)
select * from GroupingCte;

Now, we can have one more cte to return all ids from this table where COUNT(*) ≥ 5 when grouped by id and grp_date. This will give us the ids of users who logged in to their accounts for five or more consecutive days. In this case, the query should return id=7.

with RankCte as (
select
id, login_date,
rank() over (partition by id order by login_date) as rk
from (select distinct * from Logins) l
),
GroupingCte as (
select *,
date_add(login_date, INTERVAL -rk DAY) as grp_date
from RankCte
),
FinalIds as (
select id
from GroupingCte
group by id, grp_date
having count(*)>=5
)
select * from FinalIds;

Then, in the last step, join with Accounts table to get names for ids, and order by id to get the final result.

with RankCte as (
select
id, login_date,
rank() over (partition by id order by login_date) as rk
from (select distinct * from Logins) l
),
GroupingCte as (
select *,
date_add(login_date, INTERVAL -rk DAY) as grp_date
from RankCte
),
FinalIds as (
select id
from GroupingCte
group by id, grp_date
having count(*)>=5
)
select f.id, a.name
from FinalIds f
join Accounts a
on f.id = a.id
order by f.id;

And that’s it! I hope you found the post helpful!

I will be writing more articles on interesting SQL problems that are part of LeetCode SQL Study Plan, and this problem is one among them. You can find solutions to all LeetCode SQL Study Plan problems in my GitHub repository.

--

--