SQL: Learning SQL Implementation for Data Analysts With Common Daily Cases

A Basic Use Case in E-commerce

Muhamad Nur Chozin
8 min readAug 1, 2022
Source: DQLab

I am trying to challenge myself to learn a new topic by joining a Data Analyst Bootcamp lately. I thought that I would get fewer difficulties since I had a basic SQL understanding by taking a web developer class during my vocational high school. But, joining a Bootcamp appears to be a weary yet challenging journey to refresh my knowledge that somehow had been evaporated for some reason. As someone who’s currently dedicating to gaining a deeper understanding of SQL, I’d like to share my story during my journey. In this post, I don’t want to dive into a basic introduction to SQL. You might need to check this post instead. He has a great explanation about SQL.

Filtering Data Before Make Them to Tell a Story

We would agree that the main data analyst task is to analyze the data by giving a story through data visualization from the given data. The data itself is a sort of a bunch of information yet has meaning until we can successfully tell the unspoken story inside it. But please bear with me. Before we can visualize the data, we need to select the proper data we will try to visualize. SQL plays a significant role in it.

Selecting the Data

Selecting the data in SQL is quite tricky. It seems to be hard to perform on the first try but will be easier once we have understood the logic inside. There is a donkey bridge that can help you easily memorize the SQL code structure for data selection. Here is our main weapon during data selection. The basic structure is mandatory to be followed. The miss placing structure won’t make our code work.

SQL Structure Donkey Bridge

From the above image, we can understand that there is some basic structure in SQL with its own role. Basically, we are trying to give some filters by setting up a particular condition and then grouping the data. The result of SQL is in a form of a table structure that fulfilled the given condition. In short, SQL tries to make meaningful data from databases that can tell a story.

Use Case in SQL

In this section, I will make a scene when we are a Data Analysts in an e-commerce company. We are working with some database tables, namely: e-commerce events and user profile. On daily bases, we are asked to give some information related to sales data. Here is the table structure.

Ecommerce Event Data Structure
User Profile Data Structure

Case #1: Selecting Total User and Price From a Given Criteria

From the ecommerce_event table, we are asked to write a SQL query to make an output about: a) Total Unique user_session; and b) Min, Max, and Average Price. We’re also given criteria that the event_type is only for “view” and and all brand except for “apple” and “Samsung”. 

To perform this, we need to build the logic first. For me, it will be easier if we make this kind of helper table. You can make a helper table by referencing the donkey bridge above. Remember that in an actual case, we will possibly face a complex condition, and this kind of table will help us to understand the logic.

Helper Table for Case 1

After we’re done with the helper table, we can start making a code based on the helper table. Here is the code that we can run.

Source Code for Case 1

Code Explanation:

You might see that there is some extra code that is not mentioned previously. There are Count(Distinct), min(), max(), avg(), As, and Not in the statement. Here is the meaning.

The first four above mentioned code is an agregate function. The function is used to sumarize the data. Count basically is a function to count the number of records being shown based on the given criteria(s). You can see in our helper table that there is a “unique” criteria (highlighted code). In this case, we only want to count the unique record which is mean that we will drop the duplicate records and count it as a single records. Therefore, we need a distinct function. By making a distinction, we will get a unique record at the end. Meanwhile, the rest agregate function is quite familiar for us especially when we are being used to use excel.

Then, the " As " code gave a name on the header output table. We will change the header table from the database structure by adding this code. And for the “Not in” code, basically, it is a code for the “having” structure. Since the condition is set to exclude the particular data, we used “not in” instead of “having”.

Here is the output table.

Table Output for Case 1

We can see that the filtered data shows the summarize of the data with the given criteria. We can also see that the header table is not as same as the coloumn header in database structure. This is because we rename the header with “as” function. It is quite easy, isn’t it?

Case #2: Filtering and Grouping the Data

In this case, from the ecommerce_event table, we are asked to write a SQL query to output the total unique product and total unique user for every order date. Only show the date above 04 august 2019, and sort the result by the latest date.

As I mentioned, the first step we need to perform is making a helper table to ease our logic understanding. Here is the helper table

Helper Table for Case 2

In this case, the coding structure is mostly the same as the previous one. We just added two more structures: “Group by” and “Order by”. The group by code was used to group the output data into several groups according to the given request. In this case, since we need to see the data for each order date, so we grouped the data based on the order date. This results in calculated data for each order date. Meanwhile, the “Order by” code, as it clearly states that it is used to order the data by the given condition. In this case, we want to order the data based on the latest date, it means that we need to order the data in Descending format. Here is the final code.

Source Code for Case 2

From the code, you might be questioning why we use a number in “group by” code. Basically, group by is trying to group the data by the particular column in the databases. We can use the mentioned column order in SQL to replace the column number. You can see that the column “event_date” or the one that we renamed to be “Order Date” is the first column mentioned in the code, so we use 1 as its column reference. It will be a different case when we want to group the data by the “Total Product Id” column. Once we run the code, we will get this output.

Table Output for Case 2

Case #3: Joining Two or More Tables in Databases

Disclaimer we are beginning to enter a more complex SQL Structure. Kidding. As aforementioned, the degree of SQL complexity in daily use cases might vary. In this case, we are trying a simple SQL structure to combine two tables of data (ecommerce_event and user_profile). Here is the context.

From the event and user table, which gender has more sessions in october 2019?

First thing first, we need to make a helper table as a logic. Since we are going to combine two tables, there is an extra SQL structure. Here is the helper table.

Helper Table for Case 3

You can see an extra SQL structure, especially in “from” structure. Since we want to combine two tables, we use the “ Inner Join” code, and then we need to find the related column between the two tables. This is a linking column with the same record in both tables. In this case, we use the “user_id” column since both tables have identical user_id records.

SQL Join

There are four types of Join in SQL that you can adjust with the given case. You might need to read here for additional detail. And here is the final code.

From the code above, you might question again about the “e” and “u” letters inside the Join code. Here is the thing, since the user_id column header name is identical in both tables, so we need to add a reference name to define which column we mentioned. in this case, we use the letter “e” as the ecommerce_event table and the letter “u” as the “user_profile” table. You can use any letter as you wish. By running this code, you will get this output.

Table Output for Case 3

Case #4: Showing Daily Revenue with a Given Criteria

Combining the two tables is not as meaningful as we can get the information on it. For this case, we’re trying to connect the two tables with a slightly complex condition. Here is the case

Calculate the daily revenue and unique users for each date for male users

Again, we need a helper table! Here is the table.

Helper Table for Case 4

In the given case, we’re asked to show the revenue for each date. We know that revenue means the amount of money e-commerce gets from the purchased transaction. It means that we need to add criteria for event_type, which is only for the “purchase” record. Here is the final code.

And here is the final output.

Output Table for Case 4

Conclusion

Finally, we are reaching the end of this presumably long exhausting post. SQL is a basic knowledge we need to understand as Data Analysts. The complexity of SQL code might vary for daily cases. We need to train our ability to build logic to solve a problem. You can use the helper table to handle complex structures. This post might be too simple for the actual case in e-commerce. I will try to make another post to dive deeper into another case example. I am pretty open to any discussion. Give me claps or comments for further discussion. I hope that you can find the excitement in building a code.

*Writer is not mastering English as his first language therefore some unexpected mistakes may exist. Feel free to give some comments and advice for better writing.

--

--

Muhamad Nur Chozin

Education Enthusiast, Currently studying Public Policy Management UGM