What I Learned After Working on 1000+ SQL Coding Problems

Tefo Hulu
Bukalapak Data
Published in
8 min readJan 24, 2023

Entering Bukalapak, one of the biggest Tech companies in Indonesia was not an easy feat. Being the conqueror of the MSME Market in Indonesia with about 40% Market Share, Bukalapak used a lot of Data to create decisions.

And SQL was like the bread-and-butter for Decision Making in Bukalapak. As the bare minimum to enter this company, I spend my time learning and sharpening my skills to pass their standard.

During those times, I started to learn about 1000s SQL Problems… And here’s what I learned:

The Basics

We all know the basics of an SQL Query. It consists of something like:

SELECT column_name
FROM table_name
WHERE # enter your condition here
GROUP BY column_name
ORDER BY column_name

This is a simple statement. You SELECT something FROM something WHERE it met certain conditions. You can ask them to GROUP them by something or ORDER it by something and make it ASCENDING or DESCENDING. This is Boring. Let’s spice up some things, shall we?

Functions

A. Aggregation Function

What if I’m going to implement an aggregate function? Enter these mainstream functions:

AVG | COUNT | MAX | MIN | SUM

Based on the name, it was very obvious that these functions are, well, can be seen on the names. However, to conquer SQL Coding Tests, we have to remember these caveats:

  • Don’t forget to use GROUP BY

Before we start implementing these aggregate functions, we need to GROUP the data first before implementing the functions.

For example: If you want to find an average of each subject per student, we need to GROUP them by Students before getting the desired results.

  • The difference between HAVING and WHERE in aggregation function

I think a lot of times, these problems appear in SQL Interviews. What is the difference between HAVING and WHERE ? The answer is HAVING used after an aggregation function and WHERE used before an aggregation function. For example: If you want to find Students that have an average score of more than 5, we can use HAVING AVG(SCORE) > 5 .

An example of Aggregation Function Problems is something like this:

Find all the Student names from table Students that have the Minimum Score of less than 40 and more than 60!

The solution if this issue is pretty obvious. You can SELECT the names and the MIN(Score) AS Min_Scoretaken FROM table Students that HAVING Min_Score < 40 and Min_Score > 60 . Don’t forget to use a GROUP BY clause on names , and we’re done. They might put one or two ORDER BY to spice things up, just get ready for them.

Make it a full query, and you get something like this:

SELECT names, MIN(Score) AS min_score 
FROM Students
HAVING min_score < 40 AND min_score > 60
GROUP BY names

Before we get into more complicated problems, do you see what I did there? The first thing I do is to map the problems to an SQL Statement. This makes things easier and converts them to a working SQL Statement faster.

In the next sections, I will only explain the problems and the possible ways to solve them. Not because I’m lazy (a little bit, though, there’s Google and Stackoverflow out there), but I want to make sure that you get the logic instead of the answer itself. This is one of the things I learned at Bukalapak (a little promotion wouldn’t hurt anybody).

B. String Function

No, I’m not going to talk about REGEXP , because obviously, your interviewer wouldn’t have the capacity to remember all the syntax out there, but I will show you some mainstream functions worth remembering:

LENGTH | LOWER | UPPER | INITCAP | SUBSTRING | REPLACE

Yeah, the names are clear, isn’t it? For the word length, make it all lower case, make it all upper case, starts with a capital, find a substring, and change it. This will be very useful, particularly if you find a problem that asks you to, say, input all your answers with an UPPER ?

Another thing that will make our life easier if we got a problem with a String Function is a little (just a little, I promise) function of percentage, with the Syntax shown below:

%foo | foo% | %foo% | foo

A “percentage” means that any string can follow a certain string, where:

  • If you put it in the start, you can have anything that ends with that string (%foo)
  • If you put it in the end you can have anything that starts with that string (foo%)
  • If you put it in the start and the end, you can have anything that contains that string (but not in the start or in the end) (%foo%)
  • If you did not put any, you can only find the contents that have the exact same carbon copy of the string (foo)

And don’t forget, to use this function, we need to use LIKE instead of a = function.

Say you get some problems like this:

Find all the Cities that end with "na", and starts with "na" but did not have a "na" in the middle.

The first thing to do if you get this kind of problem is say WTF, because why should you get this kind of problem? The next thing to do is to find the solution using the percentage function.

We can see that there are three conditions (which can be put on a WHERE clause) that can be converted:

  1. It starts with “na” ➝ WHERE cities LIKE 'na%'
  2. It ends with “na” ➝ WHERE cities LIKE '%na'
  3. It did not have a “na” in the middle ➝ WHERE cities NOT LIKE '%na%'

Put these WHERE clause to your query and don’t forget to add all the WHERE clause using AND . You know the drill, guys.

C. Numeric Function

This function is used to do the basic arithmetic operations in SQL. Just like other Data Processing Applications (like Microsoft Excel), the list of the functions can be seen below:

+ | — | * | / | CAST() | MOD | ROUND| CEIL | FLOOR | TRUNC | ABS | SQRT | COALESECE

The symbols do as expected, Modulo, Round, Ceil, Floor, Absolute value, Square Root, etc. The interesting ones are these functions:

  • CAST() function changes one type into another. For example, CAST STRING to INT64 changes the type from String to an Integer
  • ROUND() function that rounds a number into some digits behind the comma. For example, ROUND(318.28,0) returns 318 with 0 digits behind the comma.
  • TRUNC() is a function that is more powerfulROUND() function. The function TRUNC() can accept a negative value on the second parameter, and this negative value rounds up the digits after the comma. For example, TRUNC(1238, -2) results of 1200.
  • COALESECE is a function to find the first non NULL value. For example, COALESCE(column_A, 123) means that if the column_A is NULL, then the value will be 123

This function will be helpful in case you find a problem where the interviewer is so complicated that they ask you to round a table up, down, absolute, or even truncate them. These functions can be your saviour.

Oh, just one more thing to spice things up. The function CAST() can be used for date functions. You can change a STRING to a DATE using this function. And by the way, you can also SELECT some parts of the date, (using MONTH() or DAY()etc.). We can also add them up using functions like TIMESTAMP_ADD() or TIMESTAMP_DIFF() .

D. CASE WHEN Function

You started to get bored reading this content and go to your Instagram only to find your feed filled with SQL Tutorials. And you’re going to find some problems like:

Select the name of the student, and their grade. If their score is more than 70, they passed, or else, they fail.

This problem can easily be solved using a CASE WHEN statement. The grade column can be solved below:

CASE WHEN score > 70 THEN “Pass” ELSE “Fail” END AS Grade

The common pitfall of these kinds of CASE WHEN statements are the conditions, where:

  • Sometimes we write this >= instead of >, or maybe = instead of != . These kinds of issues might reduce your score (or how your interviewer sees you), or
  • You forget to put a quotation mark after THEN or ELSE if your statement is a String. This creates an issue where the String is being cast into an SQL statement and later throws an error.

This problem should be easy if you consider all the corner cases. Next stop before this part ends… Joins.

JOINS

Table one, and table two, imagine them as a circle overlapping. If all table one is selected, it is LEFT JOIN, if all table two is selected, it is RIGHT JOIN, if all table one and table two is selected, it is FULL OUTER JOIN, and if only the same contents of both table one and table two are selected, it is INNER JOIN. Don’t tell me you have learned SQL if you have never seen this situation.

I sent you the picture and it is done:

JOINS

Or is it?

The types of JOIN are

INNER | LEFT | RIGHT | OUTER | CROSS | NATURAL

Joins are the most important aspect of Basic SQL. Because in the real world, it is very rare that we can get all the information only from one table.

Besides the mainstream picture above, the other types of joining as shown in the snippet above are:

  • CROSS JOIN : A join that shows the full Cartesian Product from two tables. For example, if table A has 3 rows and table B has 3 rows, a CROSS JOIN will results in 9 rows.
  • NATURAL JOIN : A join if two tables have the same column names

The interview problems of JOIN remains the same. You got two or more tables, and the expected output is some columns where the information is scattered on those tables. Don’t forget to use the ON after the JOIN if you didn’t want to see any errors (and want to join the contents properly).

But still, there is a type of join that is often overlooked, and I called it Self-Join. How does it work? It happened to these kinds of problems:

For each employee, find the name of the manager, and find the manager of their manager. If there is no one above him, put a NULL

These kinds of problems usually only have one table, which consists of the Employee Information (which consists of id and name), and their Manager (usually only id).

To solve this issue, all we need to do is to join the table to itself multiple times based on the number of the hierarchy (from the problem above, 3: employee, employee’s manager, and the manager of the manager’s employee).

That’s it. Conquer the Basics, the Functions, and the Joins, and you should be able to answer approximately 50–60% of the problems given to you. Where does the rest go?

That’s all for now, folks. See you in the next section where the problems will be getting a little bit more tangled.

--

--