Making the Most of Your Data: An Introduction to SQL Joins

Dave Russell
FlexMR Dev Blog
Published in
6 min readJun 30, 2020

Joins are the key feature of relational databases. They allow you to tap into information across tables by exploiting their relationships. They’re a way of returning a new set of data made up of columns from multiple tables (or just one in the case of a self join).

We use joins extensively at FlexMR, for example, our authorisation system, our algorithms for matching participants to research tasks, for extracting data for use in our analysis tools, and several of our GDPR compliance processes all involve intricate joins.

Using object relational mapping (ORM) tools can negate a lot of the need to write joins by hand, but there are times where the ORM doesn’t directly support what you’re trying to do; or, it does support it but does so in a sub-optimal way.

The ability to write your own Structured Query Language (SQL) queries is a valuable skill for software developers that work with relational databases.

Following Along

The example queries in this article make use of the excellent Chinook database. This is a ready-made schema and dataset that can be imported into a range of different databases.

You can import this into the database of your choice, and experiment with these queries yourself. Using the SQLite version with DB Browser for SQLite is a quick way to get going, however SQLite does not support RIGHT JOIN.

Joining the Tables

Joins are about combining columns based on matching data between tables. For example, if you wanted to retrieve a list of Album rows with each Track you would join on the AlbumId column that is in both tables.

The type of join to use depends on which table’s data is most relevant to you, and how you want the results to be returned.

Note: the terms ‘Left’ and ‘Right’ refer to the tables in your queries. You SELECT FROM the ‘Left’ table, and JOIN the ‘Right’ table.

INNER JOIN

An ‘INNER JOIN’ will return rows from both the Left and Right tables, but only when there is a match between them. They are only used when you need data from both tables as no data will be returned unless there is related data between them.

In this example, we want to see a set made up of Album rows and their corresponding Track rows.

This returns:

Notice that there is a returned row for each Track row. If there were no Track records for a given AlbumId then that Album would not appear in these results. There must be a match between the two tables for an INNER JOIN to return anything.

LEFT (OUTER) JOIN

One quick note: there is no difference between a LEFT JOIN and a LEFT OUTER JOIN. You can use either interchangeably. If you’re mixing different joins in a single query you might find it helps readability to include OUTER , but it is not required.

A LEFT JOIN will return all records from the Left table, regardless of if there are any matching rows in the Right table. Any columns you SELECT from the Right table that don’t have any data will be returned as NULL.

The Playlist table makes for a good example. There is a one-to-many relationship between Playlist and PlaylistTrack , although not all of the Playlist rows have corresponding PlaylistTrack rows.

In this query, we SELECT every column from both tables.

This returns 8719 rows, of which the first 20 look like this:

As we are more interested in the Playlist records, we can filter this down to returning just one row per Playlist by adding a GROUP BY.

Which returns each Playlist along with the first matched PlaylistTrack record.

Note the NULL results for the two ‘Movies’ and ‘Audiobooks’ rows.

This is good data but, with a few tweaks, we can turn it into great information. In this next query, we’re going to:

  • Show the name of the first track in the playlist with an additional join against the Track table.
  • Show how many tracks are in each playlist by using COUNT
  • Not show any of the Id columns by tweaking the initial SELECT

Which returns the following:

RIGHT (OUTER) JOIN

A RIGHT JOIN will return all rows from the Right table. As with a LEFT JOIN any missing data is returned as NULL.

In practice, this type of join is rarely used. A RIGHT JOIN can almost always be replaced with a LEFT JOIN by reversing the order of the tables. For example,

SELECT * FROM Playlist LEFT JOIN PlaylistTrack

Is functionally the same as:

SELECT * FROM PlaylistTrack RIGHT JOIN Playlist

To keep your SQL portable between databases, it is recommended to use LEFT JOIN rather than RIGHT JOIN.

Self JOIN

A Self JOIN is used when you want to create a data set comprised of data from more than one row on a single table.

When using a self join, you must set and use an alias for each reference to the table.

The Employee table lists each employee and has the key ReportsTo to reference the row of their boss. By using a self join we can list each employee along with the name of their boss.

This returns the following:

Unfortunately, ‘Andrew Adams’ is missing from the data. This is because he does not have a boss (he’s the General Manager) and so he was excluded from the results of the INNER JOIN.

We can change that INNER JOIN to a LEFT JOIN to ensure he is included. To avoid a NULL value in the ‘Boss’ column, we can use a COALESCE to provide a more appropriate value. To make the data easier to read, we can order the results by the ReportsTo column.

Self joins can be used together with joins to other tables. This next query expands on the previous one by joining to the Customer table to get the number of customers each employee has. As it’s an INNER JOIN , only employees with customers will be returned. Swapping this for a LEFT JOIN would ensure that all Employee rows are returned, with 0 as their ‘Number of Customers’ value.

Knowing how and when to use LEFT, INNER, and Self Joins grants you a lot of power to exploit the data in your database. There are other join types to explore, such as CROSS and FULL joins; but they have much narrower use-cases and aren’t always supported across different databases.

With a bit of practice, joining tables together to find the data that you want becomes intuitive. Once you’ve mastered joins, you can combine them with other database features and functions to extract really interesting information from your data.

--

--

Dave Russell
FlexMR Dev Blog

Possibly the nation’s favourite Rails developer. Diabetic AF, Mental Health advocate, 100% Capricorn, ISTJ. Enjoys Git a bit too much. Instagram is life.