SQL Interview Question for Data Analysts: Monthly Percent Change

Working through SQL interview questions step by step

Josh Berry
Learning SQL

--

Photo by Tim Mossholder on Unsplash

Introduction

As a part of my role at a software company, I design SQL generator functions to automate the writing of SQL, so that users don’t actually have to know it. The code I write is all open source and publicly available on Github, and is the driving force behind the SQL Generator 5000 — a free website for generating SQL. Edit: Rasgo has transitioned to AI for SQL generation, so the above link is broken and the resource no longer exists.

As a part of my testing, I like to take real-world examples from the internet, and solve them using Rasgo. This helps me identify areas for improvement as well as validate that everything is working as expected.

I believe that sharing these examples will help folks that are currently learning SQL. I hope these individuals will benefit from seeing the way I approach the problem. I will always provide the raw SQL so that even readers who don’t care about the approach can skip to the end to get the solution.

Monthly Percentage Difference

Today’s interview question comes from stratascratch.com, questionID 10319.

Given a table of purchases by date, calculate the month-over-month percentage change in revenue. The output should include the year-month date (YYYY-MM) and percentage change, rounded to the 2nd decimal point, and sorted from the beginning of the year to the end of the year. The percentage change column will be populated from the 2nd month forward and can be calculated as ((this month’s revenue — last month’s revenue) / last month’s revenue)*100.Table: sf_transactions

The data

The revenue column that the question refers to, appears to be called value, and the date column which we need to make monthly is called created_at.

Step-by-Step Solution

The trick I’ve learned to tackling these problems, is to simply work on them in smaller pieces. I like to use Common Table Expressions (CTE’s) to allow me to chain simple operations together.

My plan is to string together a series of steps:

  1. Aggregate to monthly data
  2. Lag the month to get previous month
  3. Apply the given formula to calculate percent_change
  4. Clean up the result (column selection, renames, ordering, etc)

Aggregate monthly

To calculate the month beginning date, I used DATE_TRUNC in the SQL Generator 5000.

WITH MY_STARTING_POINT AS (
SELECT *,
DATE_TRUNC(MONTH, created_at) as created_at_MONTH
FROM
sf_transactions
)

From there, I used aggregate to create monthly numbers.

, CTE_MONTHLY AS (
SELECT
created_at_MONTH,
SUM(value) as value_SUM
FROM
MY_STARTING_POINT
GROUP BY
created_at_MONTH
)

Lag for previous month

Next, I applied the lag function to reach one row prior and retrieve the previous month’s value.

In this case, there is no need for a partition by column, so the SQL Generator added the clause PARTITION BY NULL. You can also simply omit that clause, but I like to keep it so that my code is more readable and easier to modify or understand later.

, CTE_LAGGED AS (
SELECT *,
lag(value_sum, 1) over (
partition by NULL
order by
created_at_MONTH
) as Lag_VALUE_SUM_1
from
CTE_MONTHLY
)

Calculate Percentage Change

This next part was tricky; it required me to run a few iterations of SQL to get it right.

First, I applied the formula exactly as explained in the question: ((VALUE_SUM — LAG_VALUE_SUM_1) / LAG_VALUE_SUM_1) * 100

I quickly noticed, however, that the question wants us to round the answer to 2 decimal places. So, I wrapped the whole thing with a ROUND function:

ROUND(((VALUE_SUM — LAG_VALUE_SUM_1) / LAG_VALUE_SUM_1) * 100, 2)

Therefore, my final SQL was:

, CTE_WITH_PCT AS (
SELECT *,
ROUND(
(
(VALUE_SUM - LAG_VALUE_SUM_1) / LAG_VALUE_SUM_1
) * 100,
2
) AS PERCENT_CHANGE
FROM
CTE_LAGGED
)

Cleaning it all up

My final step is to clean up the result. One way to do this would be to go back and edit my SQL, paying careful attention to the aliases along the way.

I find it easier, however, to simply use a final CTE to clean things up. In this case, they want the date presented and sorted using YYYY-MM format.

I started with SUBSTRING, but PostgresSQL didn’t like doing a substring on a date column, so I used TO_CHAR().

SELECT 
to_char(CREATED_AT_MONTH, 'YYYY-MM') AS MYMONTH,
PERCENT_CHANGE
FROM
CTE_WITH_PCT
ORDER BY
1

Final Result

We passed on the first try! Here is the full code along with the output:

Conclusion

I often get challenged for writing SQL this way (stringing together lots of CTEs) — and yes, as a SQL veteran myself I usually consolidate my code into fewer steps. However, I have also taught SQL and have found this “piecewise” style is much more conducive to learning.

If you are learning, I suggest breaking the problem down into smaller steps. Here is the visual representation of the steps we just built together.

Visual representation of the steps in Rasgo

I hope you enjoyed this series. I plan on doing a few more real-world interview questions to see if folks find it useful. My goal was to show you that these SQL problems are solvable by breaking the problem down into smaller bite-sized chunks. If you have suggestions, questions, or comments, you can find me hanging out in Locally Optimistic and DataTalks.Club.

Go from SELECT * to interview-worthy project. Get our free 5-page guide.

--

--

Josh Berry
Learning SQL

Data scientist @ Rasgo, DataRobot, Comcast. Passionate about teaching and helping others.