Learning Journey in SQL

Joining A to SQL: Using Joins and Normalization To Advance Your Queries and Data Analysis

Martyna Adam
Learning SQL
Published in
7 min readOct 21, 2022

--

Welcome to the next installment of the “From A to SQL” series. In previous chapters of the learning journey, we covered databases, setting primary and foreign keys, and using operators and set functions for data analysis. If you need a refresher or would like to cover them in more detail, you can access Part 4 here. In this chapter, we will be covering how to write one query which can return information from multiple tables, using JOINS.

Photo by Kaitlyn Baker on Unsplash

In the most basic form, a JOIN combines two or more tables into a single result which are related by a column (think foreign keys). However, you may be wondering why you can’t just put everything in one super table and save yourself the trouble of writing queries to join different tables together. There are two main reasons:

1. Stops duplication of information. If you had one table with multiple orders from the same individual, for every order you would have to type out the same customer information. This may be a lengthy process, especially if you want to know a lot of customer details like their address, phone number, email address etc.

2. Minimised issues with Data Integrity. Following on from our example, if you had one table with multiple orders from the same individual, and made a mistake in the phone number for one of the rows, how would you know which phone number was the correct one?

By having this information in one table, and then making relationships to other tables, you only have to update the information one time, which saves both time and effort.

What is Normalization?

Designing a database in a way that avoids redundancy of information and minimises issues with data integrity is called “normalization”. Edgar Codd came up with normalization rules to follow, and the first three rules are the ones you should focus on.

First Normal Form (1NF)

This means a database must not have any repeating groups and a cell should have a single value.

Second Normal Form (2NF)

This means a database must meet all the rules of 1NF and must not have any information which is not dependent on the primary key. Foreign keys should be set to define relationships between tables.

Third Normal Form (3NF)

This means a database must meet all the rules of the 2NF and have no transitive functional dependency. Transitive functional dependency means having no indirect relationship between values in the same table.

For a more comprehensive overview of Normalization, here is a great resource.

Now that we understand why we must have multiple tables in a database, let’s learn the different types of Joins which you can use to combine tables together and gather the information you require.

SQL JOINS

There are many types of JOINs in SQL. In this chapter, we will be focusing on 3 types: inner join, left/right join and self-join.

Inner Join

An inner join selects all the records that have matching values in both tables. The join is based on a common column between the tables.

Inner Join Visual

Here is the syntax for an inner join:

SELECT
<alias1>.<column_name>,
<alias2>.<column_name>
FROM <table1_name> <alias1>
INNER JOIN <table2_name> <alias2>
ON <alias1>.<column_name> = >, <alias2>.<column_name>;

Left/Right Join

This type of join will return all records from the specified side, and any matching records from the other side. For example, a left join will return all records from the left side table, as well as any matching records from the right side table. The join is based on a common column between the tables.

If there are no matching records from the right table, the result will be a NULL value. Following the same logic, a right join will return all records from the right side table as well as any matching records from the left side table. If there are no matching records from the left side table, the result will be a NULL value.

Left Join Visualisation

Here is the syntax for a left/right join:

SELECT <alias1>.<column_name>, <alias2>.<column_name>
FROM <table1_name> <alias1>
LEFT JOIN <table2_name> <alias2>
ON <alias1>.<column_name> = <alias2>.<column_name>;
SELECT <alias1>.<column_name>, <alias2>.<column_name>
FROM <table1_name> <alias1>
RIGHT JOIN <table2_name> <alias2>
ON <alias1>.<column_name> = >, <alias2>.<column_name>;

Self Join

This type of join will join a table with itself, resulting in a combination of data from the same table. You can still use the keywords INNER JOIN or LEFT JOIN/RIGHT JOIN to do a self join. The only difference is that both the right hand and left hand tables will be the same. Due to this, the proper use of aliases is imperative! Self-join is frequently used with hierarchical data, for example when you have a table with employee and manager information.

Self Join Visualisation

Here is an example syntax for a self join:

SELECT <alias1>.<column_name>, <alias2>.<column_name>
FROM <table1_name> <alias1>
LEFT JOIN <table1_name> <alias2>
ON <alias1>.<column_name> = <alias2>.<column_name>;

Now, let’s use what we learnt in an example.

Ready? Set? INNER JOIN.

As usual, our example will focus on a business database we have been building throughout the journey. You can refer to Part 2 to see how we started off.

Currently, our database has 4 tables: customer, item, orders, and supply. Let’s imagine we want to find out more details about the customers who have placed orders with the business.

SELECT o.C_ID, o.Sales_Amount, c.First_Name, c.Last_Name, c.Email
FROM orders AS o
INNER JOIN customer AS c
ON o.C_ID = c.C_ID;

This will give us a result set that shows us the order ID and the Sales Amount (from the Orders table) combined with the customer first name, last name, and email address (from the Customer table).

We can use this syntax to order the results in a particular way. For example, we may want to have the customers last name in an alphabetical order.

SELECT o.C_ID, o.SalesAmount, c.C_Name, c.Surname, c.Email_Address
FROM orders AS o
INNER JOIN customer AS c
ON o.C_ID = c.C_ID
ORDER BY c.Surname ASC;

Or we may want to have the highest sales amount first.

SELECT o.C_ID, o.SalesAmount, c.C_Name, c.Surname, c.Email_Address
FROM orders AS o
INNER JOIN customer AS c
ON o.C_ID = c.C_ID
ORDER BY c.Surname ASC;

We may also need to know how much stock we currently have for each item. Remember, the item name is stored in the Item table, while the quantity is stored in the Supply table. We can combine them to give us the result in a single set.

SELECT i.I_Name, s.Quantity
FROM item AS i
LEFT JOIN supply AS s on i.I_ID = s.I_ID;

Like previously shown, we can also add other keywords such as WHERE to gather specific information which we may need. For example, we may need to find out the name and quantity of an item which is in London.

SELECT i.I_Name, s.Quantity
FROM item AS i
LEFT JOIN supply AS s on i.I_ID = s.I_ID
WHERE s.Location = 'London';

Now, to demonstrate a self-join, we will create a new table and populate it with some employee information.

CREATE TABLE employees
(E_ID char(2) PRIMARY KEY,
First_Name varchar(50),
Last_Name varchar(50),
Role_Title varchar(50),
Manager_ID char(2),
Salary DECIMAL(8,2));
INSERT INTO employees
VALUES
('E1', 'James', 'Taylor', 'Senior Sales', 'E3', 35000),
('E2', 'Linda', 'Evans', 'Sales', 'E1', 25000),
('E3', 'Jennifer', 'Thomas', 'Manager', '', 45000),
('E4', 'Daniel', 'Jones', 'Sales', 'E1', 25000);

As you can see, most employees have a manager ID, which refers to the same Employee ID. As this is a small and simple table, it is easy to see that Lindas and Daniels manager is James, and James’ manager is Jennifer. However, what if you had a company with 600 employees and various managerial roles? You can use a self-join to help you gather this information.

SELECT
e1.First_Name,
e1.Last_Name,
e1.Role_Title,
e2.First_Name,
e2.Last_Name,
e2.Role_Title
FROM employees as e1
LEFT JOIN employees as e2
ON e1.Manager_ID = e2.E_ID;

As we have used a LEFT JOIN, we have information for all the employees. Jennifer has no manager, therefore there are NULL values next to the manager columns. This gives us the correct result set, however notice anything that could be an issue?

We are using the same table, so the original column names could make it more difficult to understand the result. We can use aliases and the CONCAT function for better readability of the results.

SELECT
CONCAT(e1.First_Name, " ", e1.Last_Name) AS 'Employee Name',
e1.Role_Title AS 'Employee Role Title',
CONCAT(e2.First_Name," ", e2.Last_Name) AS 'Manager Name',
e2.Role_Title AS 'Manager Role Title'
FROM employees as e1
LEFT JOIN employees as e2
ON e2.E_ID = e1.Manager_ID;

Changing the query slightly can have a much better impact on the result of the query. Remember, the columns in the database remain unchanged, but a colleague who looks at this result will be able to understand it much better and draw quicker insights.

Congratulations! You are one step closer to advancing your SQL skills. You can make these queries as easy or as complex as you need to. The important thing to remember is to understand how two tables are related and use aliases correctly to allow you to query the database successfully.

Keep your eyes peeled for the next sequel of my learning journey :)

Go from SELECT * to interview-worthy project. Get our free 5-page guide.

--

--