Joining Constellations with SQL

A tutorial on intermediate SQL Queries

Karen Warmbein
The Startup
5 min readMay 29, 2020

--

Image credit: CESAR @ ESA

In the mystical moist night-air, and from time to time, Look’d up in perfect silence at the stars.” — Walt Whitman

When was the last time you looked up at the twilight sky? Ever count the stars as they become visible? One, two, six… suddenly there are too many to count. It’s easy to start looking for patterns in the stars. You might have found common asterisms like the Big and Little Dippers or the huge Summer Triangle. Maybe you’ve recognized some constellations like Cassiopeia, which appears like a giant ‘W’ in the sky.

I could write a lot about the constellations and objects that you can see in the night sky, but I digress. This post is about SQL queries — specifically different JOIN statements. Let’s begin.

The Data Tables

Instead of querying one table, we are going to be using two tables — circ_consts and sum_consts. The table circ_consts has information for five constellations that are in the circumpolar region of the sky. It has two columns:

  • const_name — the name of the circumpolar constellation
  • area — which represents the area of the sky that the boundaries of the constellations enclose, in degrees

Using a SELECT query, we can view the table:

The circ_consts table.

The other table, sum_consts, has information for six constellations that can be seen after sunset in the (Nothern Hemisphere’s) summer. It also has two columns:

  • const_name — the name of the summer constellation
  • main_stars — how many main stars make up the constellation. These are (usually) the brightest stars in the constellation and are connected in images of the stick figures.

Again, using a SELECT query, we can quickly view the table:

The sum_consts table.

According to W3Schools, a JOIN clause is used to combine rows from two (or more tables), based on a related column between them. Looking at the two tables each has a column namedconst_name, and there are some names that are the same. This is the related column between the tables. I want to note that in practice, tables that we want to combine DO NOT NECESSARILY have the same column names. In fact, they rarely do. In this case, we need to rely on a map called a database table schema. If that isn’t available, look for a data dictionary or someone with experience working in the database.

Full Outer Join

Consider the following. Say we want to join both tables, and all the data in them, into one table. We use a FULL OUTER JOIN for this.

SELECT *
FROM circ_consts AS c
FULL OUTER JOIN sum_consts AS s ON c.const_name = s.const_name;

Result:

A full outer join.

See the blank spaces in the image above? The blank spaces represent NULL values. A NULL value is different from a zero value or a field that contains spaces. Here, it literally means that the field was missing when the two tables were combined.

A lot of folks learn SQL JOIN statement’s with Venn Diagrams. So, for historical purposes, I will add Venn Diagrams. First, look at a diagram with the entire sets — both tables. The circle on the left represents circ_consts (the five circumpolar constellations) and the circle on the right represents the table with six summer constellations, sum_consts. The two constellations in the middle are shared constellations, that is, they are both circumpolar constellations and constellations you can view in the summer months.

The Venn Diagram for both tables.

This is the Venn Diagram for the FULL OUTER JOIN. Notice how both circles are shaded?

The Venn Diagram for the Full Outer Join.

Inner Join

Next up: the Inner Join. We want to address those two constellations that appear in both tables — Cassiopeia and Ursa Major. Here is the diagram. We want the information and data about the constellations highlighted in the middle.

The Venn Diagram for the Inner Join.

The SQL query is:

SELECT *
FROM circ_consts AS c
INNER JOIN sum_consts AS s ON c.const_name = s.const_name;

Result:

An inner join on the two tables.

The join merged two tables with common constellation names. Notice that we have repeating information const_name? We can refine the query and eliminate the duplicate information by specifying which columns we want in this table view.

SELECT c.const_name, area, main_stars
FROM circ_consts AS c
INNER JOIN sum_consts AS s ON c.const_name = s.const_name;

Result:

A different view of the same information.

Left Outer Join

The last join I want to introduce to you is the LEFT OUTER JOIN. This join returns all the rows of the table on the left side of the join — also referred to as the left table — and matching rows for the table on the right side of the join. In terms of our Venn Diagrams, this is the area of the left circle (including the middle common area).

So, what should we expect from this query? Three things:

  • All the information from the left table, circ_consts, which includes information from the const_name and area.
  • The common constellations will have three points of data: area, main_stars, and const_name.
  • The rows for which there is no matching row on the right side will contain NULL. These are the constellations in circ_consts.

Let’s see this in action.

SELECT c.const_name, area, main_stars
FROM circ_consts AS c
LEFT OUTER JOIN sum_consts AS s ON c.const_name = s.const_name;

Result:

A left outer join.

Again, the blank spaces are NULL values.

Full outer join, inner join, and left outer join — I hope I’ve demystified these concepts or, at least, added some examples to your toolset.

For your reference, I have a series of basic SQL tutorials. Take a look!

As always, ask questions and post comments below. Let’s chat!

--

--