Data Engineer Challenge with SQL

provide by DQLab

Nilda Nurmala
The Startup
8 min readDec 16, 2020

--

Finally, I have completed a course on DQLab about fundamental SQL in data analyst career track. I have completed the modules of fundamental SQL use SELECT statement, FUNCTION, GROUP BY, INNER JOIN, UNION and Having. Then, I worked on the first project provided by DQLab to apply the knowledge that I had learned. Here, the results of my first project.

Test SQL — MySQL Dialect

  • Left Join

Consider the following two tables:

If the LEFT JOIN operation is used on the kurs table and the nilai_kurs table with the condition kurs.id = nilai_kurs.kurs_id.
How many rows will be show?
Answer: 5 row

  • Right Join

Consider the following two tables:

If the RIGHT JOIN operation is used on the kurs table and the nilai_kurs table with the condition kurs.id = nilai_kurs.kurs_id.
How many rows will be show?
Answer: 5 row

  • Similarity of 2 SQL Statements

Referring to the two tables, which SQL command will have the same result as the following SQL command:

a. SELECT * FROM table1 WHERE nama = ‘Cyntia’ AND nama = ‘Rheny’
b. SELECT * FROM table1 WHERE nama IN (‘Cyntia’, ‘Rheny’)
c. SELECT * FROM table1 WHERE nama LIKE (‘Cyntia’, ‘Rheny’)
d. SELECT * FROM table1 WHERE nama LIKE ‘Cyntia’ OR nama LIKE ‘Rheny’
Answer: b and d

  • Union

From the following SQL command, how many rows will be show?

Answer: 6 row

  • Inner Join

From the following SQL command, how many rows will be show?

Answer: 4 row

DQLab Mart

The tables name used in this project are ms_konsumen, ms_produk, tr_penjualan and tr_penjualan_detail.
The dataset that will be used in this project contains the following data:

Detail of ms_pelanggan table:

Detail of ms_produk table:

Detail of tr_penjualan table:

Detail of tr_penjualan_detail table

DQLab Mart products
Referring to the ms_produk table, show a list of products that have prices between 50,000 and 150,000.
Required column name: no_urut, kode_produk, nama_produk, and harga.

The query will be performed using SQL:

Output result
  • BETWEEN… AND… is used to filter products that have a price between 50,000 and 150,000

Based on the output, there are 6 products that have prices between 50,000 and 150,000.

Thumb drive at DQLab Mart
Show all products that contain the word Flashdisk.
Required column name: no_urut, kode_produk, nama_produk, and harga.

The query will be performed using SQL:

Output result
  • LIKE is used to find products that contain the word flashdisk

Based on the output, there are 2 products whose names contain the word flashdisk.

Title customers
Show customer names who only have the titles: S.H, Ir. and Drs.
Required column name: no_urut, kode_pelanggan, nama_pelanggan, and alamat.

The query will be performed using SQL:

Output result
  • LIKE and OR are used to find nama_pelanggan that contain the words S.H., or Ir., or Drs.

Based on the output, there are 3 names of customers who have the title of S.H., or Ir., or Drs.

Sort Customer Names
Show customer names and sort the results by nama_pelanggan from smallest to largest (A to Z).
Required column name: nama_pelanggan.

The query will be performed using SQL:

Output result
  • ORDER BY … ASC is used to sort nama_pelanggan from smallest to largest

Based on the output, customer names have been sorted from letter A to letter U.

Sort Customer Names Without Titles
Show customer names and sort the results according to the nama_pelanggan from smallest to largest (A to Z), however titles should not be part of the order. Example: Ir. Agus Nugraha should be above Heidi Goh.
Required column name: nama_pelanggan.

The query will be performed using SQL:

Output result
  • ORDER BY is used to sort data
  • SUBSTRING_INDEX is used to return a substring of a string separated by a dot “.” or ignore the word before the dot “.”

Based on the output, customer names have been sorted from letter A to letter U regardless of the title.

The Longest Customer Name
Show customer name who has the longest name. If there is more than one person with the same name length, show all of them.
Required column name: nama_pelanggan.

The query will be performed using SQL:

Output result
  • LENGTH is used to find the number of characters
  • MAX is used to find the highest value
  • WHERE… IN () is used as a subquery to filter LENGTH column

Based on the output, there is one name which is the longest customer name that has 19 characters.

Longest and Shortest Customer’s Name with Title
Show customer names who has the longest name (on the top row) and the shortest name (on the next row). Title becomes part of the name. If there is more than one shortest or longest name, all of them must be showed.
Required column name: nama_pelanggan.

The query will be performed using SQL:

Output result
  • LENGTH is used to find the number of characters
  • MAX is used to find the highest value
  • MIN is used to find the lowest value
  • WHERE… IN () is used as a subquery to filter LENGTH column
  • OR is used to filter data that has the highest value or lowest value
  • ORDER BY … DESC is used to sort nama_pelanggan from largest to smallest

Based on the output, there is one name which is the longest customer name that has 19 characters and one name which is the shortest customer name that has 9 characters.

The quantity of products sold
Show the most sold products in terms of quantity. If there is more than one product with the same value, show all the products.
Required column name: kode_produk, nama_produk,total_qty.

The query will be performed using SQL:

Output result
  • SUM is used to get the total product sold
  • INNER JOIN is used to combine 2 tables between ms_produk table and tr_penjualan_detail table with kode_produk as a matching value in both tables
  • GROUP BY is used to group data based on kode_produk and nama_produk
  • HAVING is used to filter the aggregation function where the total number of products sold is 7

Based on the output, there are 2 products that are sold the most.

Customers with the Highest Purchase Value
Who are the customers who spend the most money on shopping? If there are more than one customer with the same value, show all the customers.
Required column name: kode_pelanggan, nama_pelanggan, total_harga

The query will be performed using SQL:

Output result
  • SUM is used to get the total_harga
  • INNER JOIN is used to combine 2 tables between ms_pelanggan table with tr_penjualan table and tr_penjualan table with tr_penjualan_detail table
  • GROUP BY is used to group data based on kode_pelanggan and nama_pelanggan
  • ORDER BY … DESC is used to sort total_harga from largest to smallest
  • LIMIT is used to display a single row of data

Based on the output results, there is one customer who makes the most transactions.

Customers Who Have Never Shopped
Show a list of customers who have never made a transaction.
Required column name: kode_pelanggan, nama_pelanggan, alamat.

The query will be performed using SQL:

Output result
  • WHERE … NOT IN ( ) is used as a subquery to filter kode_pelanggan columns that are not included in the list of customers who made transactions.

Based on the output results, there are 5 customers who have never made a transaction.

Shopping Transactions with a Shopping List more than 1
Show transactions that have more than one type of product. In other words, show transactions that have more than one row of data in tr_penjualan_detail table.
Required column name: kode_transaksi, kode_pelanggan, nama_pelanggan, tanggal_transaksi, jumlah_detail

The query will be performed using SQL:

Output result
  • COUNT is used to count the number of products sold
  • INNER JOIN is used to combine 2 tables between tr_penjualan table with ms_pelanggan table and tr_penjualan table with tr_penjualan_detail table
  • GROUP BY is used to group data based on kode_transaksi, kode_pelanggan, nama_konsumen and tanggal_transaksi
  • HAVING is used to filter the aggregation function where transactions that have the number of product sold for more than one type of product

Based on the output, there are 5 transactions that have the number of product sold for more than one type of product.

--

--