Exercise 1st Session Data Analisis: SQL

Ahmad Mustopa
9 min readMay 22, 2023

--

What Is SQL?

Structured Query Language (SQL) refers to a standard programming language utilized to extract, organize, manage, and manipulate data stored in relational databases. SQL is thereby referred to as a database language that can execute activities on databases that consist of tables made up of rows and columns.

SQL plays a crucial role in retrieving relevant data from databases, which can later be used by various platforms such as Python or R for analysis purposes. SQL can manage several data transactions simultaneously where large volumes of data are written concurrently.

How does SQL work?

As an SQL query is written and run, it is processed by the ‘query language processor’ having a parser and query optimizer. The SQL server then compiles the processed query in three stages:

1. Parsing: This refers to a process that cross-checks the syntax of the query.

2. Binding: This step involves verifying query semantics before executing it.

3. Optimization: The final step generates the query execution plan. The objective here is to identify an efficient query execution plan that runs in minimal time. This implies that the shorter the response time for the SQL query, the better the results. Several combinations of plans are generated to have a practical end execution plan.

Elements of SQL

SQL is the go-to-choice of most database users due to its easy usability and how the queries can carry out varied functions on vast amounts of structured data.

SQL programming language has the following vital elements:

1. Keywords

Keywords: Keywords refer to a set of words that allow you to perform operations on your database. Consider the example of the keyword ‘LIKE‘; it searches for a specific data pattern in the database.

For example, let’s say we need to identify the names of families living in the Boston area who have ‘Luis’ as their last name. The following SQL query will fetch the relevant results for this problem statement:

SELECT * FROM [BOSTON] WHERE NAME LIKE ‘%LUIS’

Similarly, different keywords perform various operations on the databases. The following are some examples of such keywords with their functional roles:

  • CREATE: This keyword helps in creating a database structure or simply tables, views, and an index
  • INSERT: It adds data to the rows of a table
  • SELECT: Selects data from database or table
  • FROM: Indicates the table from which data needs to be fetched
  • WHERE: It filters the data so that only relevant data matching certain conditions is fetched
  • UPDATE: Updates existing rows in a table
  • DELETE: It deletes the existing rows in a table

2. Clauses

Clauses: Clauses refer to the in-built functions that filter out data and retrieve the required data from the database or table. It is suitable when handling large databases. Clauses are a part of the SQL statement.

Let’s consider a use case where you need to select age, email, and address from the database. One would then represent the clause as ‘SELECT Age, Email, and Address,’ where SELECT is a keyword and age, email, and address reveal certain information to run the SQL query and retrieve the required data.

3. Expressions

Expressions: SQL expressions represent a formula typically written in a query format. It combines one or more values, operators, and SQL functions that evaluate a specific value. Moreover, SQL expressions are broadly divided into three types, namely, Boolean, numeric, and date.

Let’s consider an example of the Boolean expression that fetches data by matching single values. If you want to identify employees whose salaries are equal to 5,000, one can use the following SQL query:

SELECT * FROM EMPLOYEES WHERE SALARY = 5000;

4. Predicates

Predicates: Predicates refer to keywords that reveal a relationship between two expressions and result in a true or false value. It is just another term for an expression that is used to determine an unknown or TRUE/FALSE condition.

For example, consider the following SQL statement:

SELECT * FROM CUSTOMERS WHERE Product = ‘Television’;

Here, ‘Product = Television’ is the predicate of the SQL statement.

5. Queries

Queries: SQL queries refer to statements used to request or retrieve data from a database. For example, let’s say you want to retrieve the first name and customer number of all customers whose last name is ‘Lobo’. The following query will fetch the relevant data from the database:

SELECT First_Name, Customer_No FROM Customers WHERE Last_Name=’Lobo’;

In this portfolio, I will use PostgreSQL to collect and analyze data from a variety of sources to gain insights into these companies and understand what sets them apart from the rest.

Lets Practice . . .

Before we begin, let’s create the tables first. Here, we will apply several SQL statements to create the tables such as

THE DATA

Berikut dataset yang akan digunakan:

  1. order_detail
  2. sku_detail
  3. customer_detail
  4. payment_detail

Data yang digunakan adalah data yang berasal dari Kaggle: Pakistan’s Largest E-Commerce Dataset dengan beberapa perubahan untuk dapat memudahkan dalam praktik dengan menggunakan sql. Harga yang tertera sudah dikonversi 1 Rupee sama dengan Rp 58. Mengenai penjelasan dataset adalah sebagai berikut:

order_detail:

  1. id → angka unik dari order / id_order
  2. customer_id → angka unik dari pelanggan
  3. order_date → tanggal saat dilakukan transaksi
  4. sku_id → angka unik dari produk (sku adalah stock keeping unit)
  5. price → harga yang tertera pada tagging harga
  6. qty_ordered → jumlah barang yang dibeli oleh pelanggan
  7. before_discount → nilai harga total dari produk (price * qty_ordered)
  8. discount_amount → nilai diskon product total
  9. after_discount → nilai harga total produk ketika sudah dikurangi dengan diskon
  10. is_gross → menunjukkan pelanggan belum membayar pesanan
  11. is_valid → menunjukkan pelanggan sudah melakukan pembayaran
  12. is_net → menunjukkan transaksi sudah selesai
  13. payment_id → angka unik dari metode pembayaran

sku_detail:

  1. id → angka unik dari produk (dapat digunakan untuk key saat join)
  2. sku_name → nama dari produk
  3. base_price → harga barang yang tertera pada tagging harga / price
  4. cogs → cost of goods sold / total biaya untuk menjual 1 produk
  5. category → kategori produk

customer_detail:

  1. id → angka unik dari pelanggan
  2. registered_date → tanggal pelanggan mulai mendaftarkan diri sebagai anggota

Payment_detail:

  1. id → angka unik dari metode pembayaran
  2. payment_method → metode pembayaran yang digunakan

LETS ANSWER THE QUESTION

Nomor 1

Q: Selama transaksi yang terjadi selama 2021, pada bulan apa total nilai transaksi (after_discount) paling besar? Gunakan is_valid = 1 untuk memfilter data transaksi.

Source table: order_detail

Query

From the tabel above show that the month with the largest total transaction value in 2021 is November with the largest transaction value of 14.350.139.287 transactions and the lowest transaction value in 2021 is April with the lowest transaction value of 746.916.955 transactions.

Nomor 2

Q: Selama transaksi yang terjadi selama 2021, pada bulan apa total jumlah pelanggan (unique), total order (unique) dan total jumlah kuantitas produk paling banyak? Gunakan is_valid = 1 untuk memfilter data transaksi.

Source table: order_detail

Query

During transactions that occurred during 2021, the table above shows that November was the month with the most product transactions with a total number of customers of 13.885, total orders of 22.414 and a total quantity of 142.155.

Nomor 3

Q: Selama transaksi yang terjadi selama 2022, kategori apa yang menghasilkan nilai transaksi paling besar? Gunakan is_valid = 1 untuk memfilter data transaksi.

Source table: order_detail, sku_detail

Query

During transactions that occurred during 2022, the category that generated the most transaction value was in the Mobiles & Tablets category with a transaction value of 77.006.913.809

Nomor 4

Q: Bandingkan nilai transaksi dari masing-masing kategori pada tahun 2021 dengan 2022. Sebutkan kategori apa saja yang mengalami peningkatan dan kategori apa yang mengalami penurunan nilai transaksi dari tahun 2021 ke 2022. Gunakan is_valid = 1 untuk memfilter data transaksi.

Source table: order_detail, sku_detail

Query

In the table above we can see a comparison of transaction values ​​through the growth column. Categories that have experienced an increase in transaction value from 2021 to 2022 are Appliances, Beauty & Grooming, Computing, Entertainment, Health & Sports, Home & Living, Kids & Baby, Mobiles & Tablets, School & Education, Superstore and Women Fashion. Categories that experience a decrease in transaction value from 2021 to 2022 are Books, Men Fashion, Soghaat and Others.

Nomor 5

Q: Tampilkan Top 10 sku_name (beserta kategorinya) berdasarkan nilai transaksi yang terjadi selama tahun 2022. Tampilkan juga total jumlah pelanggan (unique), total order (unique) dan total jumlah kuantitas. Gunakan is_valid = 1 untuk memfilter data transaksi.

Source table: order_detail, sku_detail

Query

The table above shows the Top 10 sku_name (along with their categories) based on the value of transactions that occurred during 2022.

Nomor 6

Q: Tampilkan top 5 metode pembayaran yang paling populer digunakan selama 2022 (berdasarkan total unique order). Gunakan is_valid = 1 untuk memfilter data transaksi.

Source table: order_detail, payment_method

Query

The table above shows the top 5 payment methods, where COD is the most popular payment method used in 2022.

Nomor 7

Q: Urutkan dari ke-5 produk ini berdasarkan nilai transaksinya.

  1. Samsung
  2. Apple
  3. Sony
  4. Huawei
  5. Lenovo

Gunakan is_valid = 1 untuk memfilter data transaksi.

Source table: order_detail, sku_detail

Query

The table above shows that Samsung is the product with the highest transaction value, while Sony is the product brand with the lowest transaction value.

Nomor 8

Q: Seperti pertanyaan no. 3, buatlah perbandingan dari nilai profit tahun 2021 dan 2022 pada tiap kategori. Kemudian buatlah selisih % perbedaan profit antara 2021 dengan 2022 (profit = after_discount — (cogs*qty_ordered))

Gunakan is_valid = 1 untuk memfilter data transaksi.

Source table: order_detail, sku_detail

Query

The table above shows a comparison of the profit values ​​for 2021 and 2022 for each category with a comparison value of 0.244%.

Nomor 9

Q: Tampilkan top 5 SKU dengan kontribusi profit paling tinggi di tahun 2022 berdasarkan kategori paling besar pertumbuhan profit dari 2021 ke 2022 (berdasarkan hasil no 8).

Gunakan is_valid = 1 untuk memfilter data transaksi.

Source table: order_detail, sku_detail

Query

The table above shows the top 5 SKUs with the highest profit contribution in 2022 with profit growth from 2021 to 2022.

Nomor 10

Q: Tampilkan jumlah unique order yang menggunakan top 5 metode pembayaran (soal no 6) berdasarkan kategori produk selama tahun 2022.

Gunakan is_valid = 1 untuk memfilter data transaksi.

Source table: order_detail, sku_detail

Query

The table above shows the number of unique orders using the top 5 payment methods based on product category during 2022 where Men Fashion is the highest product category on the COD payment method, Mobiles & Tablets is the superior product category on the Easypay, Payaxis and Customercredit and Superstore being the highest product category on the jazzwallet payment method.

--

--