What I Learned After Working on 1000+ SQL Coding Problems
Entering Bukalapak, one of the biggest Tech companies in Indonesia was not an easy feat. Being the conqueror of the MSME Market in Indonesia with about 40% Market Share, Bukalapak used a lot of Data to create decisions.
And SQL was like the bread-and-butter for Decision Making in Bukalapak. As the bare minimum to enter this company, I spend my time learning and sharpening my skills to pass their standard.
During those times, I started to learn about 1000s SQL Problems… And here’s what I learned:
The Basics
We all know the basics of an SQL Query. It consists of something like:
SELECT column_name
FROM table_name
WHERE # enter your condition here
GROUP BY column_name
ORDER BY column_name
This is a simple statement. You SELECT
something FROM
something WHERE
it met certain conditions. You can ask them to GROUP
them by something or ORDER
it by something and make it ASCENDING
or DESCENDING
. This is Boring. Let’s spice up some things, shall we?
Functions
A. Aggregation Function
What if I’m going to implement an aggregate function? Enter these mainstream functions:
AVG | COUNT | MAX | MIN | SUM
Based on the name, it was very obvious that these functions are, well, can be seen on the names. However, to conquer SQL Coding Tests, we have to remember these caveats:
- Don’t forget to use
GROUP BY
Before we start implementing these aggregate functions, we need to GROUP
the data first before implementing the functions.
For example: If you want to find an average of each subject per student, we need to GROUP
them by Students before getting the desired results.
- The difference between
HAVING
andWHERE
in aggregation function
I think a lot of times, these problems appear in SQL Interviews. What is the difference between HAVING
and WHERE
? The answer is HAVING
used after an aggregation function and WHERE
used before an aggregation function. For example: If you want to find Students that have an average score of more than 5, we can use HAVING AVG(SCORE) > 5
.
An example of Aggregation Function Problems is something like this:
Find all the Student names from table Students that have the Minimum Score of less than 40 and more than 60!
The solution if this issue is pretty obvious. You can SELECT
the names
and the MIN(Score) AS Min_Score
taken FROM
table Students
that HAVING Min_Score < 40 and Min_Score > 60
. Don’t forget to use a GROUP BY
clause on names
, and we’re done. They might put one or two ORDER BY
to spice things up, just get ready for them.
Make it a full query, and you get something like this:
SELECT names, MIN(Score) AS min_score
FROM Students
HAVING min_score < 40 AND min_score > 60
GROUP BY names
Before we get into more complicated problems, do you see what I did there? The first thing I do is to map the problems to an SQL Statement. This makes things easier and converts them to a working SQL Statement faster.
In the next sections, I will only explain the problems and the possible ways to solve them. Not because I’m lazy (a little bit, though, there’s Google and Stackoverflow out there), but I want to make sure that you get the logic instead of the answer itself. This is one of the things I learned at Bukalapak (a little promotion wouldn’t hurt anybody).
B. String Function
No, I’m not going to talk about REGEXP
, because obviously, your interviewer wouldn’t have the capacity to remember all the syntax out there, but I will show you some mainstream functions worth remembering:
LENGTH | LOWER | UPPER | INITCAP | SUBSTRING | REPLACE
Yeah, the names are clear, isn’t it? For the word length, make it all lower case, make it all upper case, starts with a capital, find a substring, and change it. This will be very useful, particularly if you find a problem that asks you to, say, input all your answers with an UPPER
?
Another thing that will make our life easier if we got a problem with a String Function is a little (just a little, I promise) function of percentage, with the Syntax shown below:
%foo | foo% | %foo% | foo
A “percentage” means that any string can follow a certain string, where:
- If you put it in the start, you can have anything that ends with that string (
%foo
) - If you put it in the end you can have anything that starts with that string (
foo%
) - If you put it in the start and the end, you can have anything that contains that string (but not in the start or in the end) (
%foo%
) - If you did not put any, you can only find the contents that have the exact same carbon copy of the string (
foo
)
And don’t forget, to use this function, we need to use LIKE
instead of a =
function.
Say you get some problems like this:
Find all the Cities that end with "na", and starts with "na" but did not have a "na" in the middle.
The first thing to do if you get this kind of problem is say WTF, because why should you get this kind of problem? The next thing to do is to find the solution using the percentage function.
We can see that there are three conditions (which can be put on a WHERE
clause) that can be converted:
- It starts with “na” ➝
WHERE cities LIKE 'na%'
- It ends with “na” ➝
WHERE cities LIKE '%na'
- It did not have a “na” in the middle ➝
WHERE cities NOT LIKE '%na%'
Put these WHERE
clause to your query and don’t forget to add all the WHERE
clause using AND
. You know the drill, guys.
C. Numeric Function
This function is used to do the basic arithmetic operations in SQL. Just like other Data Processing Applications (like Microsoft Excel), the list of the functions can be seen below:
+ | — | * | / | CAST() | MOD | ROUND| CEIL | FLOOR | TRUNC | ABS | SQRT | COALESECE
The symbols do as expected, Modulo, Round, Ceil, Floor, Absolute value, Square Root, etc. The interesting ones are these functions:
CAST()
function changes one type into another. For example,CAST
STRING
toINT64
changes the type from String to an IntegerROUND()
function that rounds a number into some digits behind the comma. For example,ROUND(318.28,0)
returns 318 with 0 digits behind the comma.TRUNC()
is a function that is more powerfulROUND()
function. The functionTRUNC()
can accept a negative value on the second parameter, and this negative value rounds up the digits after the comma. For example,TRUNC(1238, -2)
results of 1200.COALESECE
is a function to find the first nonNULL
value. For example,COALESCE(column_A, 123)
means that if thecolumn_A
isNULL
, then the value will be 123
This function will be helpful in case you find a problem where the interviewer is so complicated that they ask you to round a table up, down, absolute, or even truncate them. These functions can be your saviour.
Oh, just one more thing to spice things up. The function CAST()
can be used for date functions. You can change a STRING
to a DATE
using this function. And by the way, you can also SELECT
some parts of the date, (using MONTH()
or DAY()
etc.). We can also add them up using functions like TIMESTAMP_ADD()
or TIMESTAMP_DIFF()
.
D. CASE WHEN Function
You started to get bored reading this content and go to your Instagram only to find your feed filled with SQL Tutorials. And you’re going to find some problems like:
Select the name of the student, and their grade. If their score is more than 70, they passed, or else, they fail.
This problem can easily be solved using a CASE WHEN
statement. The grade column can be solved below:
CASE WHEN score > 70 THEN “Pass” ELSE “Fail” END AS Grade
The common pitfall of these kinds of CASE WHEN
statements are the conditions, where:
- Sometimes we write this
>=
instead of>
, or maybe=
instead of!=
. These kinds of issues might reduce your score (or how your interviewer sees you), or - You forget to put a quotation mark after
THEN
orELSE
if your statement is a String. This creates an issue where the String is being cast into an SQL statement and later throws an error.
This problem should be easy if you consider all the corner cases. Next stop before this part ends… Joins.
JOINS
Table one, and table two, imagine them as a circle overlapping. If all table one is selected, it is LEFT JOIN
, if all table two is selected, it is RIGHT JOIN
, if all table one and table two is selected, it is FULL OUTER JOIN
, and if only the same contents of both table one and table two are selected, it is INNER JOIN
. Don’t tell me you have learned SQL if you have never seen this situation.
I sent you the picture and it is done:
Or is it?
The types of JOIN
are
INNER | LEFT | RIGHT | OUTER | CROSS | NATURAL
Joins are the most important aspect of Basic SQL. Because in the real world, it is very rare that we can get all the information only from one table.
Besides the mainstream picture above, the other types of joining as shown in the snippet above are:
CROSS JOIN
: A join that shows the full Cartesian Product from two tables. For example, if table A has 3 rows and table B has 3 rows, aCROSS JOIN
will results in 9 rows.NATURAL JOIN
: A join if two tables have the same column names
The interview problems of JOIN
remains the same. You got two or more tables, and the expected output is some columns where the information is scattered on those tables. Don’t forget to use the ON
after the JOIN
if you didn’t want to see any errors (and want to join the contents properly).
But still, there is a type of join that is often overlooked, and I called it Self-Join. How does it work? It happened to these kinds of problems:
For each employee, find the name of the manager, and find the manager of their manager. If there is no one above him, put a NULL
These kinds of problems usually only have one table, which consists of the Employee Information (which consists of id and name), and their Manager (usually only id).
To solve this issue, all we need to do is to join the table to itself multiple times based on the number of the hierarchy (from the problem above, 3: employee, employee’s manager, and the manager of the manager’s employee).
That’s it. Conquer the Basics, the Functions, and the Joins, and you should be able to answer approximately 50–60% of the problems given to you. Where does the rest go?
That’s all for now, folks. See you in the next section where the problems will be getting a little bit more tangled.