Intermediate SQL (Part 1)

Faisal Afif
Data Science Indonesia
9 min readNov 21, 2023

A beginner guide for learning SQL from scratch

Photo by Tony Reid on Unsplash

So far we have already learned some basic SQL queries, such as SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, and LIMIT. If you would like to review them, please visit this Introduction to SQL series.

The intermediate SQL series will cover the following topics:

  • Combining multiple tables using JOINs and set operations
  • Performing window functions in SQL
  • Differentiating between subqueries, common table expressions (CTEs), views and temporary tables
  • Modifying rows and tables

In this section, we will explore JOINs and set operations. JOINs are essential for combining data from multiple tables to answer complex questions. We will start with the basic JOINs and then progress to advanced JOINs techniques, and last but not least we will explore set operations in SQL.

Before we dive into the different types of JOINs and how to use them, let’s review the principles. In SQL, a JOIN is an operation that combines rows from two or more tables based on a related column between them. This relationship is often established through primary keys and foreign keys.

A primary key is a field in a table that uniquely identifies each record in that table. On the other hand, a foreign key is a field in a table that references the primary key of another table. This relationship is a fundamental aspect of a relational database.

To illustrate, consider these two tables: Movies and BoxOffice. In the Movies table, the Id field serves as the primary key, uniquely identifying each movie record. In the BoxOffice table, the Movie_id field acts as a foreign key, referencing the Id field in the Movies table. This relationship enables the JOIN operation to connect relevant information from both tables.

SQL offers several types of JOINs, each with a unique purpose. These are several types of JOINs.

1. INNER JOIN

An INNER JOIN between two tables returns only records where the JOIN condition matches in both tables. To illustrate, consider the Movies and BoxOffice tables from the SQL Bolt. When performing an INNER JOIN, the resulting data will include only those records where the id in the Movies table matches the movie_id in the BoxOffice table.

-- Inner Join between Movies table and Box Office table
-- Primary key in Movies table = id
-- Foreign key in BoxOffice table = movie_id
SELECT
*
FROM
movies AS m
INNER JOIN
boxoffice AS bo
ON m.id=bo.movie_id;

2. LEFT JOIN and RIGHT JOIN

A LEFT JOIN keeps all the original records from the left table and includes missing values (NULL) for any columns from the right table where the joining field did not find a match. On the other hand, a RIGHT JOIN keeps all the original records from the right table and includes missing values (NULL) for any columns from the left table where the joining field did not find a match. RIGHT JOIN is far less common than LEFT JOIN, because RIGHT JOIN can always be rewritten as LEFT JOIN.

For example, Buildings and Employess tables below from the SQL Bolt. When performing a LEFT JOIN, the resulting data will include all the records from Buildings table and includes missing values (NULL) when Building_name column did not have any match with Building column from Employees table.

-- Left Join between Building table and Employees table
-- Primary key in Building table = building_name
-- Foreign key in Employees table = building
SELECT
*
FROM
buildings AS b
LEFT JOIN
employees AS e
ON b.building_name=e.building;

-- Right Join between Building table and Employees table (returns all records from employees table)
-- Primary key in Building table = building_name
-- Foreign key in Employees table = building
SELECT
*
FROM
buildings AS b
RIGHT JOIN
employees AS e
ON b.building_name=e.building;

3. FULL JOIN

A FULL JOIN is a combination of LEFT JOIN and RIGHT JOIN. It returns all records from both tables involved, regardless of whether there is a match on the joining field in the other table. In cases where a match is not found, the result will include NULL values for the columns from the table without a matching record.

-- Full Join between Building table and Employees table
-- Primary key in Building table = building_name
-- Foreign key in Employees table = building
SELECT
*
FROM
buildings AS b
FULL OUTER JOIN
employees As e
ON b.building_name=e.building;

4. SELF JOIN

The SELF JOIN, as its name implies, is used to join a table to itself. This means that each row in a table is joined to itself and every other row in that table. However, referencing the same table more than once within a single query will result in an error. To avoid this, in SELF JOIN aliases are used. Either LEFT JOIN or INNER JOIN is commonly used to perform a SELF JOIN

-- Syntax for Self Join
SELECT
column_names
FROM
table_name AS t1
[INNER | LEFT] JOIN table_name AS t2
ON join_predicate;


-- Self Join of Movies table where the same Director has created two movies and Year of Movie 1 < Year of Movie 2

SELECT
m1.title AS title_1,
m2.title AS title_2,
m1.year AS year_1,
m2.year AS year_2,
m1.director
FROM
movies AS m1
INNER JOIN
movies AS m2
ON m1.director=m2.director
AND m1.year<m2.year;
Figure 3. Movies Table (from SQL Bolt)
Figure 4. Result of Self Join (Example by using Movies table)

5. CROSS JOIN

The CROSS JOIN in SQL is commonly used to generate all combinations of records in two tables. For example, you have two columns: car_model and color_name, and you need a result set to display all the possible paired combinations of those (cartesian product).

Figure 5. Combining two tables with Cross Join
-- Syntax for Cross Join
SELECT
column_1, column_2
FROM
table_1
CROSS JOIN
table_2;

-- Cross Join of car_model from Cars table and color_name from Colors table
SELECT
c.Car_model,
c1.Color_name
FROM
Cars AS c
CROSS JOIN
Colors AS c1;

6. SET OPERATIONS

Set operations in SQL is a type of operations which allow the results of multiple queries to be combined into a single result set. Set operators in SQL include UNION, UNION ALL, INTERSECT, and EXCEPT, which correspond to the concepts of union, intersection and set difference in mathematics.

To combine two or more queries into a single result set, they must follow these requirements:

  • The result sets of both queries must have the same number of columns.
  • The corresponding columns in the two queries must have the same data type or must be implicitly convertible to the same data type.
Figure 7. Illustration of set operations (Image by author)

6.1. UNION and UNION ALL

When working with multiple tables, UNION and UNION ALL operators allow you to combine the results of two or more SELECT statements into a single result set. The UNION operator removes duplicate rows between the tables by default, while the UNION ALL operator does not.

-- Syntax for UNION
SELECT
column_1, column_2
FROM
table_1
UNION
SELECT
column_1, column_2
FROM
table2;

-- Syntax for UNION ALL
SELECT
column_1, column_2
FROM
table_1
UNION ALL
SELECT
column_1, column_2
FROM
table2;

Here is a comparison of JOIN, UNION and UNION ALL to get clear understanding of their differences.

Figure 8. Difference between JOIN, UNION and UNION ALL (Image by author)

6.2. INTERSECT

The INTERSECT operator takes the results of two queries and returns only rows that are identical in both result sets. The INTERSECT operator removes duplicate rows from the final result set. Whereas a UNION operator is a logical OR, INTERSECT is a logical AND.

-- Syntax for INTERSECT
SELECT
column_1, column_2
FROM
table_1
INTERSECT
SELECT
column_1, column_2
FROM
table2;

6.3. EXCEPT

The EXCEPT operator returns only those rows from the first result set that are not in the second result set. This means that the EXCEPT operator is query order-sensitive, like the LEFT JOIN and RIGHT JOIN. In some SQL flavors, they provide MINUS operator which is functionally equivalent to the EXCEPT.

-- Syntax for EXCEPT
SELECT
column_1, column_2
FROM
table_1
EXCEPT
SELECT
column_1, column_2
FROM
table2;

When combining multiple queries by using UNION, UNION ALL, INTERSECT and EXCEPT, you can add another clause like ORDER BY and LIMIT.

Figure 9. Summary of SET OPERATIONS from DataCamp SQL JOINS Cheat Sheet

7. SEMI JOIN and ANTI JOIN

A SEMI JOIN returns rows from the first table where a condition is met in the second table, but it does not include the actual columns from the second table in the result set. It uses a WHERE clause and a simple subquery to filter the first table based on a specific condition. Whereas an ANTI JOIN returns rows from the first table where a condition is NOT met in the second table. It also uses a WHERE clause and simple subquery.

For example, we want to get Building_name and Capacity from Buildings table where

  1. Building is found in the Employees table
  2. Building is not found in the Employees table
-- To get data in the Buildings table where Building is found in the Employee table
SELECT
*
FROM
Buildings
WHERE
Building_name IN
(SELECT
Building
FROM
Employees)
;

-- Data in the Buildings table where Building is not found in the Employee table
SELECT
*
FROM
Buildings
WHERE
Building_name NOT IN
(
SELECT
Building
FROM
Employees);

So far, we’ve already learned how to use INNER JOIN, LEFT JOIN, RIGHT JOIN, SELF JOIN, and CROSS JOIN to combine multiple tables based on a related column between them. We’ve learned set operations such as UNION, UNION ALL, INTERSECT, EXECPT and how to use them. And last but not least we’ve learned simple subquery in SEMI JOIN and ANTI JOIN. See you in the next section of a beginner guide for learning SQL from scratch.

This is part of the Intermediate SQL series, here is the list of Intermediate SQL series.

2. Inner Join in SQL Bolt

3. Outer Joins in SQL Bolt

4. Understanding SQL Server SELF JOIN By Practical Examples

5. Understanding SQL CROSS JOIN

6. Set Operations in SQL

7. Set Operations in MySQL

8. Combine tables with UNION, INTERSECT, and EXCEPT

9. MySQL UNION Explained

--

--