SQL Series: Part 2 — SQL Joins with Examples (INNER / LEFT / RIGHT / FULL OUTER / SELF / CROSS)
Check Out Part 1 of the SQL Series:
What is SQL JOIN?
Definition: A clause used to combine records of two or more tables based on related columns
Basic JOIN types: INNER, LEFT, RIGHT, FULL OUTER
To better understand and visualize how each JOIN type works in the real world, the following 2 tables are created for illustration:
Customer Table
+---------+---------------+
| cust_id | register_date |
+---------+---------------+
| 1001 | 2024-01-01 |
| 1002 | 2024-03-15 |
| 1003 | 2024-03-16 |
| 1004 | 2024-05-04 |
| 1005 | 2024-07-31 |
+---------+---------------+
Orders Table
+----------+------------------+------------+---------+
| order_id | transaction_date | net_amount | cust_id |
+----------+------------------+------------+---------+
| 151 | 2024-02-24 | 67.5 | 1001 |
| 199 | 2024-02-27 | 48 | 1001 |
| 213 | 2024-04-13 | 13 | 1002 |
| 311 | 2024-05-04 | 90.3 | 1003 |
| 425 | 2024-05-10 | 17.5 | 1003 |
+----------+------------------+------------+---------+
Type 1: Inner Join
Concept: Only return rows with MATCHING records in both tables
SELECT
c.cust_id,
SUM(net_amount) AS total_spending
FROM Customer c
INNER JOIN Orders o
ON c.cust_id = o.cust_id
GROUP BY c.cust_id
Result Table:
+---------+----------------+
| cust_id | total_spending |
+---------+----------------+
| 1001 | 115.5 |
| 1002 | 13.0 |
| 1003 | 107.8 |
+---------+----------------+
As you can see, cust_id "1004" and "1005" are not included since these 2
customers only exist in the customers table but not the orders table.
Type 1: Left Join
Concept: Return ALL records from the left table and only MATCHING records from the right table
SELECT
c.cust_id,
SUM(net_amount) AS total_spending
FROM Customer c
LEFT JOIN Orders o
ON c.cust_id = o.cust_id
GROUP BY c.cust_id
Result Table:
+---------+----------------+
| cust_id | total_spending |
+---------+----------------+
| 1001 | 115.5 |
| 1002 | 13.0 |
| 1003 | 107.8 |
| 1004 | NULL |
| 1005 | NULL |
+---------+----------------+
As you can see, all cust_id from the Customer table are returned but the
total spending for cust_id "1004" and "1005" are NULL since they did not make
any orders.
Type 3: RIGHT Join
Concept: Return ALL records from the right table and only MATCHING records from the left table
SELECT
c.cust_id,
SUM(net_amount) AS total_spending
FROM Customer c
RIGHT JOIN Orders o
ON c.cust_id = o.cust_id
GROUP BY c.cust_id
Result Table:
+---------+----------------+
| cust_id | total_spending |
+---------+----------------+
| 1001 | 115.5 |
| 1002 | 13.0 |
| 1003 | 107.8 |
+---------+----------------+
As you can see, cust_id "1004" and "1005" are not included since these 2
customers did not appear in the orders table and only matching records from
the Customer table are returned in RIGHT join.
Fun Fact: Never used RIGHT join in my work 🤣
Type 4: FULL OUTER Join
Concept: Returns ALL rows from both tables, regardless of whether there is a match or not
SELECT
*
FROM Customer c
FULL OUTER JOIN Orders o
ON c.cust_id = o.cust_id
GROUP BY c.cust_id
+---------+---------------+----------+------------------+------------+
| cust_id | register_date | order_id | transaction_date | net_amount |
+---------+---------------+----------+------------------+------------+
| 1001 | 2024-01-01 | 151 | 2024-02-24 | 67.5 |
| 1001 | 2024-01-01 | 199 | 2024-02-27 | 48.0 |
| 1002 | 2024-03-15 | 213 | 2024-04-13 | 13.0 |
| 1003 | 2024-03-16 | 311 | 2024-05-04 | 90.3 |
| 1003 | 2024-03-16 | 425 | 2024-05-10 | 17.5 |
| 1004 | 2024-05-04 | NULL | NULL | NULL |
| 1005 | 2024-07-31 | NULL | NULL | NULL |
+---------+---------------+----------+------------------+------------+
As you can see, all records from both tables are returned. NULL values
indicate that there is no matching data for those specific fields.
Note: Run time will be long especially for large tables
Other Common Types of Joins:
Type 5: SELF Join
Concept: A join type where the table joins with itself
SELECT
o1.cust_id,
o1.net_amount AS cust_first_order,
o2.net_amount AS cust_sec_order
FROM cust_order.orders o1
INNER JOIN cust_order.orders o2
ON o1.cust_id = o2.cust_id
AND o1.net_amount < o2.net_amount
Result Table:
+---------+------------------+----------------+
| cust_id | cust_first_order | cust_sec_order |
+---------+------------------+----------------+
| 1001 | 48.0 | 67.5 |
| 1003 | 17.5 | 90.3 |
+---------+------------------+----------------+
Here, we are looking for customers whose net amount of their first order is
less than the net amount of their second order.
Type 6: CROSS Join (Cartesian Product)
Concept: Returns ALL possible combination among the tables
SELECT
c.cust_id,
o.order_id
FROM cust_order.customers c
CROSS JOIN cust_order.orders o
ORDER BY c.cust_id
Result Table:
+---------+----------+
| cust_id | order_id |
+---------+----------+
| 1001 | 151 |
| 1001 | 199 |
| 1001 | 213 |
| 1001 | 311 |
| 1001 | 425 |
| 1002 | 151 |
| 1002 | 199 |
| 1002 | 213 |
| 1002 | 311 |
| 1002 | 425 |
| 1003 | 151 |
| 1003 | 199 |
| 1003 | 213 |
|.... and more ......|
+---------+----------+
Here, all cust_id and order_id combinations are returned.