Day 32 (week 7) — Inner and Outer Joins

Today will be focused on performing inner and outer joins in SQL. First, a join is a way of combing data from two tables that have a relationship to one another. To set up our examples, I will be using a users table and a posts table. Here are the SQL definitions.

create table users (
id SERIAL PRIMARY KEY,
name VARCHAR(30));
create table posts (
id SERIAL PRIMARY KEY,
title VARCHAR(30),
body VARCHAR(100),
userid int references users(id));

Each user gets their own unique id and a name. Each post gets a unique id, a title, a body, and an id that references back to the users table. The relationship is that one user can have multiple posts, but one post can have at most one user as its author.

After inserting four users, we have:

 id | name
----+-------
1 | eric
2 | bob
3 | david
4 | chris
(4 rows)

And we can author some posts.

 id | title  |  body  | userid
----+--------+--------+--------
1 | post 1 | body 1 | 1
2 | post 2 | body 2 | 2
3 | post 3 | body 3 | 2

In the posts table, you can see that bob has authored two posts, eric has authored one post, and david and chris have authored 0 posts.

Let’s use the one-to-many relationship we have created between users and posts to do some queries.

First, let’s blend the data from the two tables together. We might like to see all the post titles and bodies along with their authors.

select p.title, p.body, u.name from posts as p, users as u where p.userid = u.id;
 title  |  body  | name
--------+--------+------
post 1 | body 1 | eric
post 2 | body 2 | bob
post 3 | body 3 | bob

This is an inner join. We are restricting the data from both tables to show us only the individual posts where the id field matches up.

So what does an outer join look like? An outer join will allow us to include more data in the result.

select p.title, p.body, u.name from users u left join posts p on p.userid = u.id;
 title  |  body  | name
--------+--------+-------
post 1 | body 1 | eric
post 2 | body 2 | bob
post 3 | body 3 | bob
[null] | [null] | chris
[null] | [null] | david

Which shows the same three posts authored by eric and bob, but also shows us that there are some users (chris and david) who have not authored any posts. This is reflected by the [null] entries under title and body.

What I think matters most here is the order. The left join will return all records from the left table (users) and the matched records from the right table (posts). In this case, because the table on the left is users, we start with *all* of the users and then look for matches in the right table.

If we flip the order, we get a different result which should make sense.

select p.title, p.body, u.name from posts p left join users u on p.userid = u.id;
 title  |  body  | name
--------+--------+------
post 1 | body 1 | eric
post 2 | body 2 | bob
post 3 | body 3 | bob

Here, we start with the left table (posts) and then find matching records in the users table. Because there are only three posts with authors of bob and eric, we never see the extra data from the users table because there is no matching record in posts with those user ids.

Last, it is fairly common (I think) for developers to just always use left outer joins even though left and right outer joins do the same thing but in a different order.

I found this graphic online that helps to visualize what’s happening with inner joins and outer joins.

In words, you can think of it this way:

An inner join will return records that have matching values in both tables.

A left outer join will return all records from the left table and the matched records from the right table.

A right outer join will return all records from the right table and the matched records from the left table.

A full outer join will return all records when there is a match from either table.